Re: About Operator and OperatorBase

2015-04-22 Thread Kostas Tzoumas
I think Stephan meant Meteor, an old API when Flink was Stratosphere. This
was never part of the code that made it to Apache.

Not sure if we want to remove the common API, as it provides a dataflow
abstraction that is higher level than the JobGraph. Admittedly, I don't
have a better argument other than it might be useful some day and happy
to change my opinion :-)

Kostas

On Tue, Apr 21, 2015 at 5:49 PM, Henry Saputra henry.sapu...@gmail.com
wrote:

 Thanks for the explanation, Stephan. I always wonder why the extra
 common APIs exist.

 Then I think this should be high priority if we want to remove the
 common API to reduce the unnecessary layer and dead code. As Ufuk
 mentioned before, better doing it now before more stuff build on top
 of Flink.

 So removing old Record API [1] and the tests depending on them is step
 one of the process, but what is JSON API?

 - Henry

 [1] https://issues.apache.org/jira/browse/FLINK-1681

 On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen se...@apache.org wrote:
  Originally, we had multiple Apis with different data models: the current
  Java API, the record api, a JSON API. The common API was the data model
  agnostic set of operators on which they built.
 
  It has become redundant when we saw how well things can be built in top
 of
  the Java API, using the TypeInformation. Now, Scala, Python, Dataflow,
 all
  build on top of the Java API.



Re: About Operator and OperatorBase

2015-04-22 Thread Stephan Ewen
I think the main difference to the Java API is right now that the JavaAPI
is fluent and the common API is compositional. Otherwise there is not much
in the common API any more.

Not sure whether this warrants an extra API...



On Wed, Apr 22, 2015 at 3:52 PM, Kostas Tzoumas ktzou...@apache.org wrote:

 I think Stephan meant Meteor, an old API when Flink was Stratosphere. This
 was never part of the code that made it to Apache.

 Not sure if we want to remove the common API, as it provides a dataflow
 abstraction that is higher level than the JobGraph. Admittedly, I don't
 have a better argument other than it might be useful some day and happy
 to change my opinion :-)

 Kostas

 On Tue, Apr 21, 2015 at 5:49 PM, Henry Saputra henry.sapu...@gmail.com
 wrote:

  Thanks for the explanation, Stephan. I always wonder why the extra
  common APIs exist.
 
  Then I think this should be high priority if we want to remove the
  common API to reduce the unnecessary layer and dead code. As Ufuk
  mentioned before, better doing it now before more stuff build on top
  of Flink.
 
  So removing old Record API [1] and the tests depending on them is step
  one of the process, but what is JSON API?
 
  - Henry
 
  [1] https://issues.apache.org/jira/browse/FLINK-1681
 
  On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen se...@apache.org wrote:
   Originally, we had multiple Apis with different data models: the
 current
   Java API, the record api, a JSON API. The common API was the data model
   agnostic set of operators on which they built.
  
   It has become redundant when we saw how well things can be built in top
  of
   the Java API, using the TypeInformation. Now, Scala, Python, Dataflow,
  all
   build on top of the Java API.
 



Re: About Operator and OperatorBase

2015-04-21 Thread Stephan Ewen
Originally, we had multiple Apis with different data models: the current
Java API, the record api, a JSON API. The common API was the data model
agnostic set of operators on which they built.

It has become redundant when we saw how well things can be built in top of
the Java API, using the TypeInformation. Now, Scala, Python, Dataflow, all
build on top of the Java API.


Re: About Operator and OperatorBase

2015-04-21 Thread Henry Saputra
Thanks for the explanation, Stephan. I always wonder why the extra
common APIs exist.

Then I think this should be high priority if we want to remove the
common API to reduce the unnecessary layer and dead code. As Ufuk
mentioned before, better doing it now before more stuff build on top
of Flink.

So removing old Record API [1] and the tests depending on them is step
one of the process, but what is JSON API?

- Henry

[1] https://issues.apache.org/jira/browse/FLINK-1681

On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen se...@apache.org wrote:
 Originally, we had multiple Apis with different data models: the current
 Java API, the record api, a JSON API. The common API was the data model
 agnostic set of operators on which they built.

 It has become redundant when we saw how well things can be built in top of
 the Java API, using the TypeInformation. Now, Scala, Python, Dataflow, all
 build on top of the Java API.


Re: About Operator and OperatorBase

2015-04-20 Thread Henry Saputra
Since we are talking about common API, what was the original intention
or design for this layer?

From the doc:

 The Common API operator exists only in order for the flink-java and
flink-scalapackages to not have a dependency on the optimizer.

Currently the Java API Operator is converted into common APIs via
recursive call from ExecutionEnvironement#createProgramPlan which
delegate it to OperatorTranslation.
Just wanted to know more about why the original approach is taking this route.

Thanks,

- Henry

On Fri, Apr 17, 2015 at 5:47 AM, Stephan Ewen se...@apache.org wrote:
 Okay, it seems the consensus forms around not breaking the API.

 When it comes to the *OperatorBase - should we rename them or simply get
 rid of them (remove the common API). If we want to remove them, a
 precondition is to remove the Record API, and for that, we should migrate
 the Record-API-based test cases.

 Stephan


 On Thu, Apr 16, 2015 at 6:32 PM, Maximilian Michels m...@apache.org wrote:

 +1 for keeping the API. Even though this will not change your initial
 concern much, Aljoscha :) I agree with you that it would be more consistent
 to call the result of an operator OperatorDataSet.

 On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske fhue...@gmail.com wrote:

  Renaming the core operators is fine with me, but I would not touch API
  facing classes.
  A big +1 for Timo's suggestion.
 
  2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org:
 
   I share Stephans opinion.
  
   By the way, we could also find a common name for operators with two
   inputs. Sometimes it's TwoInputXXX, DualInputXXX,
 BinaryInputXXX...
   pretty inconsistent.
  
  
   On 15.04.2015 17:48, Till Rohrmann wrote:
  
   I would also be in favour of making the distinction between the API
 and
   common API layer more clear by using different names. This will ease
 the
   understanding of the source code.
  
   In the wake of a possible renaming we could also get rid of the legacy
   code
   org.apache.flink.optimizer.dag.MatchNode and
   rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver
 to
   make the naming more consistent.
  
   On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:
  
On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:
  
I think we can rename the base operators.
  
   Renaming the subclass of DataSet would be extremely api breaking. I
   think
   that is not worth it.
  
   Oh, that's right. We return MapOperator for DataSet operations.
  Stephan's
   point makes sense.
  
  
  
 



Re: About Operator and OperatorBase

2015-04-17 Thread Stephan Ewen
Okay, it seems the consensus forms around not breaking the API.

When it comes to the *OperatorBase - should we rename them or simply get
rid of them (remove the common API). If we want to remove them, a
precondition is to remove the Record API, and for that, we should migrate
the Record-API-based test cases.

Stephan


On Thu, Apr 16, 2015 at 6:32 PM, Maximilian Michels m...@apache.org wrote:

 +1 for keeping the API. Even though this will not change your initial
 concern much, Aljoscha :) I agree with you that it would be more consistent
 to call the result of an operator OperatorDataSet.

 On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske fhue...@gmail.com wrote:

  Renaming the core operators is fine with me, but I would not touch API
  facing classes.
  A big +1 for Timo's suggestion.
 
  2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org:
 
   I share Stephans opinion.
  
   By the way, we could also find a common name for operators with two
   inputs. Sometimes it's TwoInputXXX, DualInputXXX,
 BinaryInputXXX...
   pretty inconsistent.
  
  
   On 15.04.2015 17:48, Till Rohrmann wrote:
  
   I would also be in favour of making the distinction between the API
 and
   common API layer more clear by using different names. This will ease
 the
   understanding of the source code.
  
   In the wake of a possible renaming we could also get rid of the legacy
   code
   org.apache.flink.optimizer.dag.MatchNode and
   rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver
 to
   make the naming more consistent.
  
   On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:
  
On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:
  
I think we can rename the base operators.
  
   Renaming the subclass of DataSet would be extremely api breaking. I
   think
   that is not worth it.
  
   Oh, that's right. We return MapOperator for DataSet operations.
  Stephan's
   point makes sense.
  
  
  
 



Re: About Operator and OperatorBase

2015-04-16 Thread Timo Walther

I share Stephans opinion.

By the way, we could also find a common name for operators with two 
inputs. Sometimes it's TwoInputXXX, DualInputXXX, 
BinaryInputXXX... pretty inconsistent.


On 15.04.2015 17:48, Till Rohrmann wrote:

I would also be in favour of making the distinction between the API and
common API layer more clear by using different names. This will ease the
understanding of the source code.

In the wake of a possible renaming we could also get rid of the legacy code
org.apache.flink.optimizer.dag.MatchNode and
rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
make the naming more consistent.

On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:


On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:


I think we can rename the base operators.

Renaming the subclass of DataSet would be extremely api breaking. I think
that is not worth it.

Oh, that's right. We return MapOperator for DataSet operations. Stephan's
point makes sense.




Re: About Operator and OperatorBase

2015-04-16 Thread Maximilian Michels
+1 for keeping the API. Even though this will not change your initial
concern much, Aljoscha :) I agree with you that it would be more consistent
to call the result of an operator OperatorDataSet.

On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske fhue...@gmail.com wrote:

 Renaming the core operators is fine with me, but I would not touch API
 facing classes.
 A big +1 for Timo's suggestion.

 2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org:

  I share Stephans opinion.
 
  By the way, we could also find a common name for operators with two
  inputs. Sometimes it's TwoInputXXX, DualInputXXX, BinaryInputXXX...
  pretty inconsistent.
 
 
  On 15.04.2015 17:48, Till Rohrmann wrote:
 
  I would also be in favour of making the distinction between the API and
  common API layer more clear by using different names. This will ease the
  understanding of the source code.
 
  In the wake of a possible renaming we could also get rid of the legacy
  code
  org.apache.flink.optimizer.dag.MatchNode and
  rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
  make the naming more consistent.
 
  On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:
 
   On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:
 
   I think we can rename the base operators.
 
  Renaming the subclass of DataSet would be extremely api breaking. I
  think
  that is not worth it.
 
  Oh, that's right. We return MapOperator for DataSet operations.
 Stephan's
  point makes sense.
 
 
 



Re: About Operator and OperatorBase

2015-04-16 Thread Fabian Hueske
Renaming the core operators is fine with me, but I would not touch API
facing classes.
A big +1 for Timo's suggestion.

2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org:

 I share Stephans opinion.

 By the way, we could also find a common name for operators with two
 inputs. Sometimes it's TwoInputXXX, DualInputXXX, BinaryInputXXX...
 pretty inconsistent.


 On 15.04.2015 17:48, Till Rohrmann wrote:

 I would also be in favour of making the distinction between the API and
 common API layer more clear by using different names. This will ease the
 understanding of the source code.

 In the wake of a possible renaming we could also get rid of the legacy
 code
 org.apache.flink.optimizer.dag.MatchNode and
 rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
 make the naming more consistent.

 On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:

  On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:

  I think we can rename the base operators.

 Renaming the subclass of DataSet would be extremely api breaking. I
 think
 that is not worth it.

 Oh, that's right. We return MapOperator for DataSet operations. Stephan's
 point makes sense.





Re: About Operator and OperatorBase

2015-04-15 Thread Ufuk Celebi

On 15 Apr 2015, at 14:11, Aljoscha Krettek aljos...@apache.org wrote:

 This is just a personal annoyance and I think we are to advanced to
 change this now, but here goes: Could we rename the classes in the
 API, so that for example MapOperator becomes MapDataSet or MapSet, and
 the actual operators in the common package become MapOperator instead
 of MapOperatorBase.
 
 My thinking is that two things called MapOperator are very confusing.
 If we want to change this we should do it fast.

+1

Realistically speaking, no one except the core committers are working on the 
internals currently. Better change it now (or never :D).

Re: About Operator and OperatorBase

2015-04-15 Thread Stephan Ewen
I think we can rename the base operators.

Renaming the subclass of DataSet would be extremely api breaking. I think
that is not worth it.
Am 15.04.2015 14:11 schrieb Aljoscha Krettek aljos...@apache.org:

 This is just a personal annoyance and I think we are to advanced to
 change this now, but here goes: Could we rename the classes in the
 API, so that for example MapOperator becomes MapDataSet or MapSet, and
 the actual operators in the common package become MapOperator instead
 of MapOperatorBase.

 My thinking is that two things called MapOperator are very confusing.
 If we want to change this we should do it fast.

 Cheers,
 Aljoscha



Re: About Operator and OperatorBase

2015-04-15 Thread Till Rohrmann
I would also be in favour of making the distinction between the API and
common API layer more clear by using different names. This will ease the
understanding of the source code.

In the wake of a possible renaming we could also get rid of the legacy code
org.apache.flink.optimizer.dag.MatchNode and
rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
make the naming more consistent.

On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:


 On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:

  I think we can rename the base operators.
 
  Renaming the subclass of DataSet would be extremely api breaking. I think
  that is not worth it.

 Oh, that's right. We return MapOperator for DataSet operations. Stephan's
 point makes sense.