Re: Changed the behavior of DataSet.print()
Resolved in https://issues.apache.org/jira/browse/FLINK-2070. I'll update the documentation.

On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen se...@apache.org wrote:
I'll prepare a fix...

On Wed, Jun 3, 2015 at 10:24 PM, Stephan Ewen se...@apache.org wrote:
+1 for printOnTaskManager(prefix)
+1 for deprecating the print(prefix) method.

On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek aljos...@apache.org wrote:
By the way, we should also rename the corresponding Streaming API method accordingly.

On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels m...@apache.org wrote:
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas ktzou...@apache.org wrote:
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann trohrm...@apache.org wrote:
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske fhue...@gmail.com wrote:
+1 for writeToWorkerStdOut(prefix)

On Jun 2, 2015 11:42, Aljoscha Krettek aljos...@apache.org wrote:
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger rmetz...@apache.org wrote:
I would like to reach consensus on this before the 0.9 release. So far we have the following ideas:
- writeToWorkerStdOut(prefix)
- printOnTaskManager(prefix) (+1)
- logOnTaskManager(prefix)
I'm against logOnTM because we are not logging the output, we are writing or printing it.
*I would vote for deprecating print(prefix) and adding writeToWorkerStdOut(prefix)*

On Thu, May 28, 2015 at 5:00 PM, Chiwan Park chiwanp...@icloud.com wrote:
I agree that avoiding a name which starts with “print” is better.
Regards, Chiwan Park

On May 28, 2015, at 11:35 PM, Maximilian Michels m...@apache.org wrote:
+1 for printOnTaskManager()

On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Thanks for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;)
Cheers, Sebastian

-----Original Message-----
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Thursday, 28 May 2015 14:34
To: dev@flink.apache.org
Subject: Re: Changed the behavior of DataSet.print()

As I said, the common "print" prefix might suggest eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.

2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org:
Actually, there is a method print(String prefix) which still goes to the stdout of the machines where the job is executed. Let's give that one the name printOnTaskManager(), and then we should have it...

On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote:
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution.

2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org:
Okay, you are right, "local" is actually confusing. I'm against introducing "worker" as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()?

On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote:
+1 for both. printLocal() might not be the best name, because "local" is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users are not confused by eager and lazy execution?

2015-05-28 13:44 GMT+02:00 Robert Metzger rmetz...@apache.org:
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior.

On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Hi everyone, I am a bit worried about that recent change of the print() method. I can understand
Re: Changed the behavior of DataSet.print()
+1 for printOnTaskManager(prefix)
+1 for deprecating the print(prefix) method.
Re: Changed the behavior of DataSet.print()
+1 for printOnTaskManager(prefix)
Re: Changed the behavior of DataSet.print()
+1 for printOnTaskManager(prefix)
RE: Changed the behavior of DataSet.print()
Hi everyone,

I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the TaskManagers is cumbersome (although, for local debugging, the old print() was fine). However, a major problem I see with the new print() is that you can now only have one print() per plan, as the plan is executed directly as soon as print() is invoked. If you regard print() as a debugging tool, this is a severe restriction.

I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter, a legacyPrint() method, or anything else. As I assume print() is used very frequently, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions.

What do you think?

Cheers,
Sebastian

-----Original Message-----
From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: Tuesday, 26 May 2015 11:12
To: dev@flink.apache.org
Subject: Re: Changed the behavior of DataSet.print()

I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:
Hi all!

We merged a patch yesterday that changed the API behavior of the DataSet.print() function. print() now prints to stdout on the client process, rather than on the TaskManager processes as before. This is much nicer for debugging and exploring data sets.

One implication of this is that print() is now an eager method (like collect() or count()). That means that calling print() immediately triggers the execution, and no env.execute() is required anymore.

Greetings,
Stephan
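The trade-off Sebastian describes can be sketched with a small toy model in plain Python. This is not Flink's API: ToyEnvironment, ToyDataSet, print_lazy and print_eager are hypothetical names, and output is captured in a list instead of stdout so it can be inspected. print_lazy only registers a sink that runs on the next execute(), roughly like the old print(); print_eager runs a job right away and hands the records to the client, roughly like the new print(). Two eager prints therefore cost two job runs, while two lazy sinks share one.

```python
"""Toy model (not Flink's API) of lazy sinks versus an eager print()."""
from typing import Callable, List


class ToyEnvironment:
    """Hypothetical stand-in for an execution environment; it only counts jobs."""

    def __init__(self) -> None:
        self.pending_sinks: List[Callable[[], None]] = []  # registered, not yet run
        self.jobs_run = 0

    def run_job(self, sinks: List[Callable[[], None]]) -> None:
        # One "job" executes every sink it was given.
        self.jobs_run += 1
        for sink in sinks:
            sink()

    def execute(self) -> None:
        # Old style: all lazily registered sinks run together in a single job.
        sinks, self.pending_sinks = self.pending_sinks, []
        self.run_job(sinks)


class ToyDataSet:
    def __init__(self, env: ToyEnvironment, compute: Callable[[], list]) -> None:
        self.env = env
        self.compute = compute  # produces the records when a job runs

    def map(self, fn: Callable) -> "ToyDataSet":
        parent = self.compute
        return ToyDataSet(self.env, lambda: [fn(x) for x in parent()])

    def print_lazy(self, out: List[str]) -> None:
        # Like the old print(): only adds a sink; nothing happens until execute().
        self.env.pending_sinks.append(lambda: out.extend(str(x) for x in self.compute()))

    def print_eager(self, out: List[str]) -> None:
        # Like the new print(): immediately runs a job and "collects" the
        # records to the client, where they are printed.
        collected: List[str] = []
        self.env.run_job([lambda: collected.extend(str(x) for x in self.compute())])
        out.extend(collected)


if __name__ == "__main__":
    lazy_env = ToyEnvironment()
    data = ToyDataSet(lazy_env, lambda: [1, 2, 3]).map(lambda x: x * 10)
    lazy_out: List[str] = []
    data.print_lazy(lazy_out)
    data.print_lazy(lazy_out)
    print(lazy_out, lazy_env.jobs_run)  # prints: [] 0
    lazy_env.execute()
    print(lazy_env.jobs_run)  # prints: 1 (both sinks ran in one job)

    eager_env = ToyEnvironment()
    data = ToyDataSet(eager_env, lambda: [1, 2, 3]).map(lambda x: x * 10)
    eager_out: List[str] = []
    data.print_eager(eager_out)
    data.print_eager(eager_out)
    print(eager_env.jobs_run)  # prints: 2 (one job per eager print)
```

Counting jobs rather than records keeps the sketch focused on the one property under discussion: when execution is triggered.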
Re: Changed the behavior of DataSet.print()
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior.
Re: Changed the behavior of DataSet.print()
Okay, you are right, "local" is actually confusing. I'm against introducing "worker" as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()?
Re: Changed the behavior of DataSet.print()
Actually, there is a method print(String prefix) which still goes to the stdout of the machines where the job is executed (the TaskManagers). Let's give that one the name printOnTaskManager(), and then we should have it...
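Stephan's distinction between print() and print(String prefix) is mostly about where the output appears. The toy model below is plain Python, not Flink: ToyCluster, ToyDataSet, print_on_task_manager and the "prefix> record" line format are all hypothetical. It ignores execution timing and only tracks output location: print_on_task_manager(prefix) appends each parallel partition's records to that TaskManager's own stdout, print() gathers everything on the client, and print(prefix) survives as a deprecated alias that warns and delegates, roughly the rename-and-deprecate plan proposed in this thread.

```python
"""Toy model (not Flink's API) of where output ends up: client vs. TaskManagers."""
import warnings
from typing import Dict, List, Optional


class ToyCluster:
    """Hypothetical cluster: the client and each TaskManager have their own stdout."""

    def __init__(self, num_task_managers: int) -> None:
        self.client_stdout: List[str] = []
        self.task_manager_stdout: Dict[int, List[str]] = {
            tm: [] for tm in range(num_task_managers)
        }


class ToyDataSet:
    def __init__(self, cluster: ToyCluster, records: list) -> None:
        self.cluster = cluster
        self.records = records

    def _partitions(self) -> Dict[int, list]:
        # Round-robin the records over the TaskManagers as a stand-in for
        # parallel execution.
        n = len(self.cluster.task_manager_stdout)
        parts: Dict[int, list] = {tm: [] for tm in range(n)}
        for i, record in enumerate(self.records):
            parts[i % n].append(record)
        return parts

    def print_on_task_manager(self, prefix: str) -> None:
        # Old behavior under a clearer name: every TaskManager writes its own
        # share of the records to its own stdout; the client sees nothing.
        # The "prefix> record" format is an assumption made for this sketch.
        for tm, part in self._partitions().items():
            self.cluster.task_manager_stdout[tm].extend(f"{prefix}> {r}" for r in part)

    def print(self, prefix: Optional[str] = None) -> None:
        if prefix is not None:
            # Deprecated variant: keep it working, but point users to the new name.
            warnings.warn(
                "print(prefix) is deprecated; use print_on_task_manager(prefix)",
                DeprecationWarning,
                stacklevel=2,
            )
            self.print_on_task_manager(prefix)
            return
        # New behavior: gather all records and print them on the client.
        gathered = [r for part in self._partitions().values() for r in part]
        self.cluster.client_stdout.extend(str(r) for r in gathered)


if __name__ == "__main__":
    cluster = ToyCluster(num_task_managers=2)
    data = ToyDataSet(cluster, ["a", "b", "c"])
    data.print_on_task_manager("dbg")
    print(cluster.client_stdout)        # prints: []
    print(cluster.task_manager_stdout)  # prints: {0: ['dbg> a', 'dbg> c'], 1: ['dbg> b']}
    data.print()
    print(cluster.client_stdout)        # prints: ['a', 'c', 'b']
```

The client-side order in the toy follows the partitions, not the input, which is one reason a debugging print should not be relied on for ordering.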
Re: Changed the behavior of DataSet.print()
+1 for both. printLocal() might not be the best name, because "local" is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users are not confused by eager and lazy execution?
Re: Changed the behavior of DataSet.print()
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution.
Re: Changed the behavior of DataSet.print()
As I said, the common print prefix might indicate eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.
2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org:
Actually, there is a method print(String prefix) which still goes to the sysout of where the job is executed. Let's give that one the name printOnTaskManager() and then we should have it...
On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote:
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution.
2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org:
Okay, you are right, local is actually confusing. I'm against introducing worker as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()?
On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote:
+1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users are not confused about eager and lazy execution?
2015-05-28 13:44 GMT+02:00 Robert Metzger rmetz...@apache.org:
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior.
On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Hi everyone, I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the taskmanagers is cumbersome (although, for local debugging, the old print() was fine). However, a major problem I see with the new print() is that now you can only have one print() per plan, as the plan is directly executed as soon as print() is invoked. If you regard print() as a debugging means, this is a severe restriction. I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter, a legacyPrint() method or anything else. As I assume print() to be very frequently used, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions. What do you think? Cheers, Sebastian
-----Original Message-----
From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Tuesday, May 26, 2015 11:12 To: dev@flink.apache.org Subject: Re: Changed the behavior of DataSet.print()
I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092
On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:
Hi all! We merged a patch yesterday that changed the API behavior of the DataSet.print() function. print() now prints to stdout on the client process, rather than on the TaskManager process as it did before. This is much nicer for debugging and exploring data sets. One implication of this is that print() is now an eager method (like collect() or count()). That means that calling print() immediately triggers the execution, and no env.execute() is required anymore. Greetings, Stephan
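Sebastian's concern in the thread (each eager print() runs the plan on its own) can be made concrete with a small, self-contained Java toy. This is a hypothetical model, not Flink code: the classes `ToyEnv` and `OnePrintPerPlanDemo` and the methods `eagerPrint()`, `lazySink()` and `execute()` are invented for illustration, and a `List<String>` stands in for stdout. It assumes that an eager call always starts its own job, and that a lazy `execute()` runs every registered sink in one job that reads the shared source once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical toy model of eager vs. lazy sinks (NOT Flink's API).
// A List<String> stands in for stdout so the output can be inspected.
class ToyEnv {
    private final List<Integer> input;
    private final List<Consumer<List<Integer>>> pendingSinks = new ArrayList<>();
    int sourceReads = 0; // how often the (possibly expensive) source was read
    int jobs = 0;        // how many jobs were started

    ToyEnv(List<Integer> input) {
        this.input = input;
    }

    private List<Integer> readSource() {
        sourceReads++;
        return new ArrayList<>(input);
    }

    // Eager, like the new print(): every call starts its own job and re-reads the source.
    void eagerPrint(Function<Integer, Integer> transform, List<String> stdout) {
        jobs++;
        for (Integer x : readSource()) {
            stdout.add(String.valueOf(transform.apply(x)));
        }
    }

    // Lazy, like a regular sink: only records what should happen later.
    void lazySink(Function<Integer, Integer> transform, List<String> stdout) {
        pendingSinks.add(data -> {
            for (Integer x : data) {
                stdout.add(String.valueOf(transform.apply(x)));
            }
        });
    }

    // Runs all recorded sinks as ONE job that reads the shared source once.
    void execute() {
        if (pendingSinks.isEmpty()) {
            // Assumption: modeled on the idea that a job needs at least one new sink.
            throw new IllegalStateException("no sinks registered since the last execution");
        }
        jobs++;
        List<Integer> data = readSource();
        for (Consumer<List<Integer>> sink : pendingSinks) {
            sink.accept(data);
        }
        pendingSinks.clear();
    }
}

class OnePrintPerPlanDemo {
    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3);

        ToyEnv eager = new ToyEnv(input);
        List<String> eagerOut = new ArrayList<>();
        eager.eagerPrint(x -> x * 2, eagerOut);  // job 1
        eager.eagerPrint(x -> x + 10, eagerOut); // job 2 reads the source again
        System.out.println("eager: jobs=" + eager.jobs + ", sourceReads=" + eager.sourceReads);

        ToyEnv lazy = new ToyEnv(input);
        List<String> lazyOut = new ArrayList<>();
        lazy.lazySink(x -> x * 2, lazyOut);      // nothing runs yet
        lazy.lazySink(x -> x + 10, lazyOut);
        lazy.execute();                          // both sinks in a single job
        System.out.println("lazy: jobs=" + lazy.jobs + ", sourceReads=" + lazy.sourceReads);
    }
}
```

Running `OnePrintPerPlanDemo` prints `eager: jobs=2, sourceReads=2` and `lazy: jobs=1, sourceReads=1`. In a real program the repeated work would be the entire upstream dataflow rather than a three-element list, which is why several eager prints in one debugging session can be costly.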
RE: Changed the behavior of DataSet.print()
Thanks for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;) Cheers, Sebastian
-----Original Message-----
From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Thursday, May 28, 2015 14:34 To: dev@flink.apache.org Subject: Re: Changed the behavior of DataSet.print()
As I said, the common print prefix might indicate eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.
2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org:
Actually, there is a method print(String prefix) which still goes to the sysout of where the job is executed. Let's give that one the name printOnTaskManager() and then we should have it...
On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote:
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution.
2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org:
Okay, you are right, local is actually confusing. I'm against introducing worker as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()?
On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote:
+1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users are not confused about eager and lazy execution?
2015-05-28 13:44 GMT+02:00 Robert Metzger rmetz...@apache.org:
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior.
On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Hi everyone, I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the taskmanagers is cumbersome (although, for local debugging, the old print() was fine). However, a major problem I see with the new print() is that now you can only have one print() per plan, as the plan is directly executed as soon as print() is invoked. If you regard print() as a debugging means, this is a severe restriction. I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter, a legacyPrint() method or anything else. As I assume print() to be very frequently used, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions. What do you think? Cheers, Sebastian
-----Original Message-----
From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Tuesday, May 26, 2015 11:12 To: dev@flink.apache.org Subject: Re: Changed the behavior of DataSet.print()
I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092
On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:
Hi all! We merged a patch yesterday that changed the API behavior of the DataSet.print() function. print() now prints to stdout on the client process, rather than on the TaskManager process as it did before. This is much nicer for debugging and exploring data sets. One implication of this is that print() is now an eager method (like collect() or count()). That means that calling print() immediately triggers the execution, and no env.execute() is required anymore. Greetings, Stephan
Re: Changed the behavior of DataSet.print()
+1 for printOnTaskManager()
On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Thanks for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;) Cheers, Sebastian
-----Original Message-----
From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Thursday, May 28, 2015 14:34 To: dev@flink.apache.org Subject: Re: Changed the behavior of DataSet.print()
As I said, the common print prefix might indicate eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO.
2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org:
Actually, there is a method print(String prefix) which still goes to the sysout of where the job is executed. Let's give that one the name printOnTaskManager() and then we should have it...
On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote:
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution.
2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org:
Okay, you are right, local is actually confusing. I'm against introducing worker as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()?
On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote:
+1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users are not confused about eager and lazy execution?
2015-05-28 13:44 GMT+02:00 Robert Metzger rmetz...@apache.org:
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior.
On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:
Hi everyone, I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the taskmanagers is cumbersome (although, for local debugging, the old print() was fine). However, a major problem I see with the new print() is that now you can only have one print() per plan, as the plan is directly executed as soon as print() is invoked. If you regard print() as a debugging means, this is a severe restriction. I see use cases for both print() implementations, but I would at least provide some kind of backwards compatibility, be it a parameter, a legacyPrint() method or anything else. As I assume print() to be very frequently used, a lot of existing programs would benefit from this and might otherwise not be directly portable to newer Flink versions. What do you think? Cheers, Sebastian
-----Original Message-----
From: Robert Metzger [mailto:rmetz...@apache.org] Sent: Tuesday, May 26, 2015 11:12 To: dev@flink.apache.org Subject: Re: Changed the behavior of DataSet.print()
I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092
On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:
Hi all! We merged a patch yesterday that changed the API behavior of the DataSet.print() function. print() now prints to stdout on the client process, rather than on the TaskManager process as it did before. This is much nicer for debugging and exploring data sets. One implication of this is that print() is now an eager method (like collect() or count()). That means that calling print() immediately triggers the execution, and no env.execute() is required anymore. Greetings, Stephan
Re: Changed the behavior of DataSet.print()
I've filed a JIRA to update the documentation: https://issues.apache.org/jira/browse/FLINK-2092
On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:
Hi all! We merged a patch yesterday that changed the API behavior of the DataSet.print() function. print() now prints to stdout on the client process, rather than on the TaskManager process as it did before. This is much nicer for debugging and exploring data sets. One implication of this is that print() is now an eager method (like collect() or count()). That means that calling print() immediately triggers the execution, and no env.execute() is required anymore. Greetings, Stephan
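Stephan's announcement changes where the output ends up as well as when it is produced. The Java toy below sketches that difference under stated assumptions; it is not Flink's implementation, and `ToyCluster`, `WhereOutputGoesDemo`, `printOnClient()`, `printOnWorkers()` and the `prefix + "> "` output format are all hypothetical. Records are assumed to be spread round-robin over a fixed number of workers, each with its own stdout buffer. The eager client print runs a job immediately and gathers every record on the client, much as the new print() does. The lazy prefixed print mirrors the old print(prefix), the method the thread proposes to rename printOnTaskManager(prefix): it only registers a sink, and each worker writes its own partition once execute() runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of WHERE output lands (NOT Flink code):
// eager client-side printing vs. lazy per-worker printing with a prefix.
class ToyCluster {
    final List<String> clientStdout = new ArrayList<>();
    final List<List<String>> workerStdout = new ArrayList<>();
    private final List<Runnable> pendingSinks = new ArrayList<>();
    int jobs = 0;

    ToyCluster(int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        for (int i = 0; i < workers; i++) {
            workerStdout.add(new ArrayList<>());
        }
    }

    // Assumed round-robin partitioning of the records over the workers.
    private List<List<Integer>> partition(List<Integer> data) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < workerStdout.size(); i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < data.size(); i++) {
            parts.get(i % parts.size()).add(data.get(i));
        }
        return parts;
    }

    // Eager: run a job now and ship every partition back to the client's stdout.
    void printOnClient(List<Integer> data) {
        jobs++;
        for (List<Integer> part : partition(data)) {
            for (Integer x : part) {
                clientStdout.add(String.valueOf(x));
            }
        }
    }

    // Lazy: register a sink; each worker later writes its own partition locally.
    void printOnWorkers(List<Integer> data, String prefix) {
        pendingSinks.add(() -> {
            List<List<Integer>> parts = partition(data);
            for (int w = 0; w < parts.size(); w++) {
                for (Integer x : parts.get(w)) {
                    workerStdout.get(w).add(prefix + "> " + x); // hypothetical format
                }
            }
        });
    }

    // Runs all registered lazy sinks as one job.
    void execute() {
        jobs++;
        pendingSinks.forEach(Runnable::run);
        pendingSinks.clear();
    }
}

class WhereOutputGoesDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4);

        ToyCluster eager = new ToyCluster(2);
        eager.printOnClient(data);            // job runs right away
        System.out.println("client: " + eager.clientStdout);

        ToyCluster lazy = new ToyCluster(2);
        lazy.printOnWorkers(data, "dbg");     // nothing written yet
        lazy.execute();                       // now each worker prints its partition
        System.out.println("workers: " + lazy.workerStdout);
    }
}
```

The demo prints `client: [1, 3, 2, 4]` and `workers: [[dbg> 1, dbg> 3], [dbg> 2, dbg> 4]]`. The client sees every record in one place (in partition order, not input order), whereas the lazy variant leaves the client empty and scatters the output over the workers, which is the trade-off the naming discussion tries to make visible.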