Re: Changed the behavior of DataSet.print()

2015-06-04 Thread Robert Metzger
Resolved in https://issues.apache.org/jira/browse/FLINK-2070.

I'll update the documentation.

On Thu, Jun 4, 2015 at 12:22 AM, Stephan Ewen se...@apache.org wrote:

 I'll prepare a fix...


Re: Changed the behavior of DataSet.print()

2015-06-03 Thread Stephan Ewen
+1 for printOnTaskManager(prefix)

+1 for deprecating the print(prefix) method.
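
A minimal sketch of what such a deprecation could look like in plain Java. This is not the actual change made for FLINK-2070: SketchDataSet and its registeredSinks field are hypothetical stand-ins, and the only assumption is that the old name keeps working by delegating to the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in class for illustration; this is NOT Flink's DataSet.
final class SketchDataSet {
    // Records which sinks were registered so the behavior can be inspected.
    final List<String> registeredSinks = new ArrayList<>();

    /** Registers a sink that would write to the worker's stdout, tagged with the prefix. */
    void printOnTaskManager(String prefix) {
        registeredSinks.add("stdout-sink:" + prefix);
    }

    /**
     * Old name, kept so existing programs still compile.
     *
     * @deprecated use {@link #printOnTaskManager(String)} instead.
     */
    @Deprecated
    void print(String prefix) {
        // Delegate so both names behave identically during the transition.
        printOnTaskManager(prefix);
    }
}
```

Callers of the old name would get a compiler warning but no change in behavior, which keeps existing programs portable while the new name is adopted.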

On Tue, Jun 2, 2015 at 5:24 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 By the way, we also should rename the corresponding Streaming API
 method accordingly.

 On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels m...@apache.org wrote:
  +1 for printOnTaskManager(prefix)
 

Re: Changed the behavior of DataSet.print()

2015-06-02 Thread Kostas Tzoumas
+1 for printOnTaskManager(prefix)


Re: Changed the behavior of DataSet.print()

2015-06-02 Thread Till Rohrmann
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske fhue...@gmail.com wrote:

 +1 for writeToWorkerStdOut(prefix)
 On Jun 2, 2015 11:42, Aljoscha Krettek aljos...@apache.org wrote:

  +1 for printOnTaskManager(prefix)
 
  On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger rmetz...@apache.org
  wrote:
   I would like to reach consensus on this before the 0.9 release.
  
   So far we have the following ideas:
  
   writeToWorkerStdOut(prefix)
   printOnTaskManager(prefix) (+1)
   logOnTaskManager(prefix)
  
   I'm against logOnTM because we are not logging the output, we are
   writing or printing it.
  
  
   *I would vote for deprecating print(prefix) and adding
   writeToWorkerStdOut(prefix)*
  
  
  
   On Thu, May 28, 2015 at 5:00 PM, Chiwan Park chiwanp...@icloud.com wrote:
  
   I agree that avoiding a name that starts with “print” is better.
  
   Regards,
   Chiwan Park
  
    On May 28, 2015, at 11:35 PM, Maximilian Michels m...@apache.org wrote:

    +1 for printOnTaskManager()

    On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote:

     Thanks, for your quick responses!

     I also think that renaming the old print method should do the trick. As a
     contribution to your brainstorming for a name, I propose logOnTaskManager() ;)

     Cheers,
     Sebastian


RE: Changed the behavior of DataSet.print()

2015-05-28 Thread Kruse, Sebastian
Hi everyone,

I am a bit worried about the recent change to the print() method. I can 
understand the rationale that obtaining the stdout from all the TaskManagers is 
cumbersome (although for local debugging the old print() was fine). 
However, a major problem I see with the new print() is that you can now only 
have one print() per plan, as the plan is directly executed as soon as print() 
is invoked. If you regard print() as a debugging tool, this is a severe 
restriction.
I see use cases for both print() implementations, but I would at least provide 
some kind of backwards compatibility, be it a parameter, a legacyPrint() 
method, or anything else. As I assume print() is very frequently used, a lot 
of existing programs would benefit from this and might otherwise not be 
directly portable to newer Flink versions. What do you think?

Cheers,
Sebastian 

-Original Message-
From: Robert Metzger [mailto:rmetz...@apache.org] 
Sent: Dienstag, 26. Mai 2015 11:12
To: dev@flink.apache.org
Subject: Re: Changed the behavior of DataSet.print()

I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:

 Hi all!

 We merged a patch yesterday that changed the API behavior of the 
 DataSet.print() function.

 print() now prints to stdout on the client process, rather than the 
 TaskManager process, as before. This is much nicer for debugging and 
 exploring data sets.

 One implication of this is that print() is now an eager method (like
 collect() or count()). That means that calling print() immediately
 triggers the execution, and no env.execute() is required any more.

 Greetings,
 Stephan
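
To make the eager-versus-lazy difference discussed above concrete, here is a minimal toy model in plain Java. It does not use Flink at all: ToyEnv, ToyDataSet, lazyPrint() and eagerPrint() are hypothetical names invented for illustration, and output goes into a list rather than stdout. The model only counts how many executions get triggered, under the assumption that a lazy sink waits for execute() while an eager call runs its own execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-ins that only count executions; this is NOT the Flink API.
final class ToyEnv {
    private final List<Runnable> pendingSinks = new ArrayList<>();
    int executions = 0;

    ToyDataSet fromElements(String... items) {
        return new ToyDataSet(this, Arrays.asList(items));
    }

    void addSink(Runnable sink) {
        pendingSinks.add(sink);
    }

    // Lazy sinks run only here, all within a single execution.
    void execute() {
        executions++;
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }

    // Eager operations trigger their own execution immediately.
    void runNow(Runnable job) {
        executions++;
        job.run();
    }
}

final class ToyDataSet {
    private final ToyEnv env;
    private final List<String> items;

    ToyDataSet(ToyEnv env, List<String> items) {
        this.env = env;
        this.items = items;
    }

    // Old-style print: only registers a sink; output appears after env.execute().
    void lazyPrint(List<String> out) {
        env.addSink(() -> out.addAll(items));
    }

    // New-style print: triggers an execution right away.
    void eagerPrint(List<String> out) {
        env.runNow(() -> out.addAll(items));
    }
}

final class EagerVsLazyDemo {
    // Two lazy prints share one execution.
    static int lazyExecutions() {
        ToyEnv env = new ToyEnv();
        List<String> out = new ArrayList<>();
        env.fromElements("a", "b").lazyPrint(out);
        env.fromElements("c").lazyPrint(out);
        env.execute();
        return env.executions;
    }

    // Two eager prints trigger two separate executions.
    static int eagerExecutions() {
        ToyEnv env = new ToyEnv();
        List<String> out = new ArrayList<>();
        env.fromElements("a", "b").eagerPrint(out);
        env.fromElements("c").eagerPrint(out);
        return env.executions;
    }
}
```

In this model, two lazy prints cost one execution, while two eager prints cost two. That is the restriction raised in this thread: with an eager print(), each call runs the plan on its own.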




Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Robert Metzger
Hi Sebastian,

thank you for the feedback. I agree that both variants have a right to
exist.

I would vote for adding another method to the DataSet called printLocal()
that has the old behavior.

 



Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Robert Metzger
Okay, you are right, local is actually confusing.
I'm against introducing worker as a term in the API. It's still called
TaskManager. Maybe printOnTaskManager()?




Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Stephan Ewen
Actually, there is a method print(String prefix) which still goes to the
sysout of where the job is executed.

Let's give that one the name printOnTaskManager() and then we should have
it...
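
As a rough illustration of "where the output goes", the following toy model gives the client and each worker its own stdout buffer. It is plain Java with no Flink involved; ToyCluster, printOnClient() and printOnWorkers() are hypothetical names, and the "prefix> element" line format is an assumption made for the sketch, not a statement about Flink's actual output.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each "process" has its own stdout buffer. This is NOT Flink code.
final class ToyCluster {
    final List<String> clientStdout = new ArrayList<>();
    final List<List<String>> workerStdout = new ArrayList<>();
    private final List<List<String>> partitions;

    // One partition of the data set per worker.
    ToyCluster(List<List<String>> partitions) {
        this.partitions = partitions;
        for (int i = 0; i < partitions.size(); i++) {
            workerStdout.add(new ArrayList<>());
        }
    }

    // Like the new print(): all results are gathered and written on the client.
    void printOnClient() {
        for (List<String> partition : partitions) {
            clientStdout.addAll(partition);
        }
    }

    // Like print(prefix) as described above: each worker writes its own
    // partition to its local stdout, tagged with the prefix.
    void printOnWorkers(String prefix) {
        for (int i = 0; i < partitions.size(); i++) {
            for (String item : partitions.get(i)) {
                workerStdout.get(i).add(prefix + "> " + item);
            }
        }
    }
}
```

With several workers, the output of printOnWorkers() ends up scattered across the per-worker buffers, which is why collecting it by hand is cumbersome, whereas printOnClient() yields everything in one place.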




Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
+1 for both.

printLocal() might not be the best name, because local is not well
defined and could also be understood as the local machine of the user.
How about naming the method something completely different (writeToWorkerStdOut()?)
to make sure users do not confuse eager and lazy execution?





Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
I would avoid calling it printXYZ, since print()'s behavior changed to
eager execution.

2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org:

 Okay, you are right, local is actually confusing.
 I'm against introducing worker as a term in the API. Its still called
 TaskManager. Maybe printOnTaskManager() ?

 On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote:

  +1 for both.
 
  printLocal() might not be the best name, because local is not well
  defined and could also be understood as the local machine of the user.
  How about naming the method completely different (writeToWorkerStdOut()?)
  to make sure users are not confused with eager and lazy execution?
 
 
  2015-05-28 13:44 GMT+02:00 Robert Metzger rmetz...@apache.org:
 
   Hi Sebastian,
  
   thank you for the feedback. I agree that both variants have a right to
   exist.
  
   I would vote for adding another method to the DataSet called
  printLocal()
   that has the old behavior.
  
   On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian 
  sebastian.kr...@hpi.de
   wrote:
  
Hi everyone,
   
I am a bit worried about the recent change to the print() method. I can
understand the rationale that obtaining the stdout from all the
TaskManagers is cumbersome (although, for local debugging, the old print()
was fine).
However, a major problem I see with the new print() is that you can now
have only one print() per plan, as the plan is executed as soon as print()
is invoked. If you regard print() as a debugging tool, this is a severe
restriction.
I see use cases for both print() implementations, but I would at least
provide some kind of backwards compatibility, be it a parameter, a
legacyPrint() method, or anything else. As I assume print() is used very
frequently, a lot of existing programs would benefit from this and might
otherwise not be directly portable to newer Flink versions. What do you
think?
   
Cheers,
Sebastian
   
-Original Message-
From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: Dienstag, 26. Mai 2015 11:12
To: dev@flink.apache.org
Subject: Re: Changed the behavior of DataSet.print()
   
I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092
   
On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:

 Hi all!

 We merged a patch yesterday that changed the API behavior of the
 DataSet.print() function.

 print() now prints to stdout on the client process, rather than the
 TaskManager process, as before. This is much nicer for debugging and
 exploring data sets.

 One implication of this is that print() is now an eager method (like
 collect() or count()). That means that calling print() immediately
 triggers the execution, and no env.execute() is required any more.

 Greetings,
 Stephan


   
  
 



Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make
the difference in the behavior very clear, IMO.

2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org:

 Actually, there is a method print(String prefix) which still writes to the
 stdout of the process where the job is executed.

 Let's give that one the name printOnTaskManager() and then we should have
 it...




RE: Changed the behavior of DataSet.print()

2015-05-28 Thread Kruse, Sebastian
Thanks for your quick responses!

I also think that renaming the old print method should do the trick. As a 
contribution to your brainstorming for a name, I propose logOnTaskManager() ;)

Cheers,
Sebastian

-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com] 
Sent: Donnerstag, 28. Mai 2015 14:34
To: dev@flink.apache.org
Subject: Re: Changed the behavior of DataSet.print()

As I said, the common print prefix might indicate eager execution.

I know that writeToTaskManagerStdOut() is quite bulky, but we should make the 
difference in the behavior very clear, IMO.




Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Maximilian Michels
+1 for printOnTaskManager()

On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian sebastian.kr...@hpi.de
wrote:

 Thanks for your quick responses!

 I also think that renaming the old print method should do the trick. As a
 contribution to your brainstorming for a name, I propose logOnTaskManager() ;)

 Cheers,
 Sebastian




Re: Changed the behavior of DataSet.print()

2015-05-26 Thread Robert Metzger
I've filed a JIRA to update the documentation:
https://issues.apache.org/jira/browse/FLINK-2092

On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen se...@apache.org wrote:

 Hi all!

 We merged a patch yesterday that changed the API behavior of the
 DataSet.print() function.

 print() now prints to stdout on the client process, rather than the
 TaskManager process, as before. This is much nicer for debugging and
 exploring data sets.

 One implication of this is that print() is now an eager method (like
 collect() or count()). That means that calling print() immediately
 triggers the execution, and no env.execute() is required any more.

 Greetings,
 Stephan
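
For readers following the thread, the difference between a lazy sink and an
eager print() can be illustrated with a small, self-contained Java sketch.
It deliberately does not use Flink: ToyEnv, ToyDataSet, workerOut, clientOut
and executions are hypothetical names invented for this illustration, and the
way the eager call also runs every sink registered so far is an assumption of
the toy model, not a statement about Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyEnv and ToyDataSet are hypothetical, not Flink classes.
final class ToyEnv {
    private final List<Runnable> pendingSinks = new ArrayList<>();
    final List<String> workerOut = new ArrayList<>(); // stands in for TaskManager stdout
    final List<String> clientOut = new ArrayList<>(); // stands in for client stdout
    int executions = 0;                               // how often a plan was run

    <T> ToyDataSet<T> fromElements(List<T> data) {
        return new ToyDataSet<>(this, data);
    }

    void addSink(Runnable sink) {
        pendingSinks.add(sink);
    }

    // Runs every sink registered so far, then forgets them.
    void execute() {
        executions++;
        List<Runnable> toRun = new ArrayList<>(pendingSinks);
        pendingSinks.clear();
        toRun.forEach(Runnable::run);
    }
}

final class ToyDataSet<T> {
    private final ToyEnv env;
    private final List<T> data;

    ToyDataSet(ToyEnv env, List<T> data) {
        this.env = env;
        this.data = data;
    }

    // Lazy: registers a sink; nothing is written until env.execute() runs.
    void printOnTaskManager(String prefix) {
        env.addSink(() -> data.forEach(x -> env.workerOut.add(prefix + "> " + x)));
    }

    // Eager: registers a client-side sink and triggers execution right away
    // (assumption of this toy model: all pending sinks run with it).
    void print() {
        env.addSink(() -> data.forEach(x -> env.clientOut.add(String.valueOf(x))));
        env.execute();
    }
}

final class PrintDemo {
    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet<Integer> ds = env.fromElements(Arrays.asList(1, 2, 3));

        ds.printOnTaskManager("debug");              // lazy: only registered
        System.out.println(env.workerOut.isEmpty()); // nothing written yet
        ds.print();                                  // eager: the plan runs now
        System.out.println(env.clientOut);
        System.out.println(env.workerOut);
    }
}
```

In this model every print() call counts as one execution, which mirrors the
concern raised above that an eager print() splits a program into one run per
call, while the lazy variant lets several outputs share a single execute().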