Re: About Operator and OperatorBase
I think Stephan meant Meteor, an old API from when Flink was still Stratosphere. It was never part of the code that made it to Apache.
I am not sure we want to remove the common API, as it provides a dataflow abstraction that is higher level than the JobGraph. Admittedly, I don't have a better argument than that it might be useful some day, and I am happy to change my opinion :-)
Kostas
On Tue, Apr 21, 2015 at 5:49 PM, Henry Saputra henry.sapu...@gmail.com wrote:
Thanks for the explanation, Stephan. I always wondered why the extra common API exists. If we want to remove the common API to get rid of an unnecessary layer and dead code, I think this should be high priority. As Ufuk mentioned before, it is better to do it now, before more stuff builds on top of Flink. So removing the old Record API [1] and the tests that depend on it is step one of the process, but what is the JSON API?
- Henry
[1] https://issues.apache.org/jira/browse/FLINK-1681
Re: About Operator and OperatorBase
I think the main difference right now is that the Java API is fluent while the common API is compositional. Otherwise there is not much left in the common API. I am not sure whether this warrants an extra API...
On Wed, Apr 22, 2015 at 3:52 PM, Kostas Tzoumas ktzou...@apache.org wrote:
I think Stephan meant Meteor, an old API from when Flink was still Stratosphere. It was never part of the code that made it to Apache.
I am not sure we want to remove the common API, as it provides a dataflow abstraction that is higher level than the JobGraph. Admittedly, I don't have a better argument than that it might be useful some day, and I am happy to change my opinion :-)
Kostas
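To make the fluent versus compositional distinction concrete, here is a minimal sketch. All types in it (`FluentSet`, `Node`, `SourceNode`, `MapNode`) are hypothetical stand-ins invented for illustration; they are not Flink classes and do not reflect Flink's actual signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are not Flink classes.
final class ApiStyles {

    // Fluent style: every call returns an object on which the next call is chained.
    static final class FluentSet<T> {
        private final List<T> data;

        FluentSet(List<T> data) { this.data = data; }

        <R> FluentSet<R> map(Function<T, R> fn) {
            List<R> out = new ArrayList<>();
            for (T t : data) { out.add(fn.apply(t)); }
            return new FluentSet<R>(out);
        }

        List<T> collect() { return data; }
    }

    // Compositional style: operator nodes are created on their own and wired
    // together explicitly by handing the input node to the constructor.
    interface Node<T> { List<T> evaluate(); }

    static final class SourceNode<T> implements Node<T> {
        private final List<T> data;
        SourceNode(List<T> data) { this.data = data; }
        public List<T> evaluate() { return data; }
    }

    static final class MapNode<I, O> implements Node<O> {
        private final Node<I> input;
        private final Function<I, O> fn;

        MapNode(Node<I> input, Function<I, O> fn) { this.input = input; this.fn = fn; }

        public List<O> evaluate() {
            List<O> out = new ArrayList<>();
            for (I i : input.evaluate()) { out.add(fn.apply(i)); }
            return out;
        }
    }

    // The same two-step pipeline written in the fluent style.
    static List<Integer> fluentExample() {
        return new FluentSet<Integer>(Arrays.asList(1, 2, 3))
                .map(x -> x + 1)
                .map(x -> x * 2)
                .collect();
    }

    // The same two-step pipeline written in the compositional style.
    static List<Integer> compositionalExample() {
        Node<Integer> source = new SourceNode<Integer>(Arrays.asList(1, 2, 3));
        Node<Integer> plusOne = new MapNode<Integer, Integer>(source, x -> x + 1);
        Node<Integer> doubled = new MapNode<Integer, Integer>(plusOne, x -> x * 2);
        return doubled.evaluate();
    }
}
```

Both methods describe the same dataflow; the difference is only in how the program is written down, which is the point made in the message above.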
Re: About Operator and OperatorBase
Originally, we had multiple APIs with different data models: the current Java API, the Record API, and a JSON API. The common API was the data-model-agnostic set of operators on which they were built. It became redundant when we saw how well things can be built on top of the Java API using TypeInformation. Now Scala, Python, and Dataflow all build on top of the Java API.
Re: About Operator and OperatorBase
Thanks for the explanation, Stephan. I always wondered why the extra common API exists. If we want to remove the common API to get rid of an unnecessary layer and dead code, I think this should be high priority. As Ufuk mentioned before, it is better to do it now, before more stuff builds on top of Flink. So removing the old Record API [1] and the tests that depend on it is step one of the process, but what is the JSON API?
- Henry
[1] https://issues.apache.org/jira/browse/FLINK-1681
On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen se...@apache.org wrote:
Originally, we had multiple APIs with different data models: the current Java API, the Record API, and a JSON API. The common API was the data-model-agnostic set of operators on which they were built. It became redundant when we saw how well things can be built on top of the Java API using TypeInformation. Now Scala, Python, and Dataflow all build on top of the Java API.
Re: About Operator and OperatorBase
Since we are talking about the common API, what was the original intention or design for this layer? From the doc: "The Common API operator exists only in order for the flink-java and flink-scala packages to not have a dependency on the optimizer."
Currently the Java API operators are converted into common API operators via a recursive call from ExecutionEnvironment#createProgramPlan, which delegates to OperatorTranslation. I just wanted to know more about why the original approach took this route.
Thanks,
- Henry
On Fri, Apr 17, 2015 at 5:47 AM, Stephan Ewen se...@apache.org wrote:
Okay, it seems the consensus is forming around not breaking the API. As for the *OperatorBase classes: should we rename them or simply get rid of them (remove the common API)? If we want to remove them, a precondition is to remove the Record API, and for that we should migrate the Record-API-based test cases.
Stephan
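The recursive conversion described at the top of this message can be pictured roughly as follows. This is only a sketch under stated assumptions: `ApiOp`, `BaseOp` and `translate` are hypothetical names, the walk from the sink back to its inputs is an assumed traversal order, and nothing here reproduces the real createProgramPlan or OperatorTranslation code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a recursive API-to-common-layer translation.
// None of these types are Flink classes; the names are made up for illustration.
final class TranslationSketch {

    // User-facing operator: knows its name and its (optional) input.
    static final class ApiOp {
        final String name;
        final ApiOp input; // null for a source

        ApiOp(String name, ApiOp input) { this.name = name; this.input = input; }
    }

    // Lower-level operator of the kind a planner or optimizer would consume.
    static final class BaseOp {
        final String name;
        final BaseOp input; // null for a source

        BaseOp(String name, BaseOp input) { this.name = name; this.input = input; }

        // Renders the chain from the source down to this operator.
        String describe() {
            return input == null ? name : input.describe() + " -> " + name;
        }
    }

    // Cache so that an operator reached more than once is translated only once.
    private final Map<ApiOp, BaseOp> translated = new HashMap<>();

    // Translates the input first (recursively), then the operator itself.
    BaseOp translate(ApiOp op) {
        BaseOp cached = translated.get(op);
        if (cached != null) {
            return cached;
        }
        BaseOp input = op.input == null ? null : translate(op.input);
        BaseOp result = new BaseOp(op.name + "Base", input);
        translated.put(op, result);
        return result;
    }

    static String example() {
        ApiOp source = new ApiOp("Source", null);
        ApiOp map = new ApiOp("Map", source);
        ApiOp sink = new ApiOp("Sink", map);
        return new TranslationSketch().translate(sink).describe();
    }
}
```

In this toy version the user-facing layer never needs to know anything about the consumer of `BaseOp`, which is one way to read the dependency argument quoted from the doc.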
Re: About Operator and OperatorBase
Okay, it seems the consensus is forming around not breaking the API. As for the *OperatorBase classes: should we rename them or simply get rid of them (remove the common API)? If we want to remove them, a precondition is to remove the Record API, and for that we should migrate the Record-API-based test cases.
Stephan
On Thu, Apr 16, 2015 at 6:32 PM, Maximilian Michels m...@apache.org wrote:
+1 for keeping the API. Even though this will not change your initial concern much, Aljoscha :) I agree with you that it would be more consistent to call the result of an operator OperatorDataSet.
Re: About Operator and OperatorBase
I share Stephan's opinion. By the way, we could also find a common name for operators with two inputs. Sometimes it's TwoInputXXX, DualInputXXX, BinaryInputXXX... pretty inconsistent.
On 15.04.2015 17:48, Till Rohrmann wrote:
I would also be in favour of making the distinction between the API and the common API layer clearer by using different names. This will make the source code easier to understand. In the wake of a possible renaming we could also get rid of the legacy code org.apache.flink.optimizer.dag.MatchNode and rename org.apache.flink.runtime.operators.MatchDriver to JoinDriver to make the naming more consistent.
Re: About Operator and OperatorBase
+1 for keeping the API. Even though this will not change your initial concern much, Aljoscha :) I agree with you that it would be more consistent to call the result of an operator OperatorDataSet.
On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske fhue...@gmail.com wrote:
Renaming the core operators is fine with me, but I would not touch API-facing classes. A big +1 for Timo's suggestion.
Re: About Operator and OperatorBase
Renaming the core operators is fine with me, but I would not touch API-facing classes. A big +1 for Timo's suggestion.
2015-04-16 6:30 GMT-05:00 Timo Walther twal...@apache.org:
I share Stephan's opinion. By the way, we could also find a common name for operators with two inputs. Sometimes it's TwoInputXXX, DualInputXXX, BinaryInputXXX... pretty inconsistent.
Re: About Operator and OperatorBase
On 15 Apr 2015, at 14:11, Aljoscha Krettek aljos...@apache.org wrote:
This is just a personal annoyance and I think we are too advanced to change this now, but here goes: Could we rename the classes in the API, so that for example MapOperator becomes MapDataSet or MapSet, and the actual operators in the common package become MapOperator instead of MapOperatorBase? My thinking is that two things called MapOperator are very confusing. If we want to change this we should do it fast.
+1
Realistically speaking, no one except the core committers is working on the internals currently. Better change it now (or never :D).
Re: About Operator and OperatorBase
I think we can rename the base operators. Renaming the subclasses of DataSet would be extremely API-breaking. I think that is not worth it.
On 15.04.2015 14:11, Aljoscha Krettek aljos...@apache.org wrote:
This is just a personal annoyance and I think we are too advanced to change this now, but here goes: Could we rename the classes in the API, so that for example MapOperator becomes MapDataSet or MapSet, and the actual operators in the common package become MapOperator instead of MapOperatorBase? My thinking is that two things called MapOperator are very confusing. If we want to change this we should do it fast.
Cheers, Aljoscha
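A small sketch of why renaming a DataSet subclass breaks user code, while renaming an internal base operator does not. The `DataSet` and `MapOperator` types below are heavily simplified, hypothetical versions that only mimic the shape discussed in this thread (a `map` call that returns the operator subclass); they are not the real Flink classes.

```java
import java.util.function.Function;

// Hypothetical, simplified types; they only mimic the shape discussed in the thread.
final class RenameSketch {

    static class DataSet<T> {
        final T value;

        DataSet(T value) { this.value = value; }

        // The return type is the operator subclass, so its name is part of the public API.
        <R> MapOperator<T, R> map(Function<T, R> fn) {
            return new MapOperator<T, R>(fn.apply(value));
        }
    }

    static class MapOperator<IN, OUT> extends DataSet<OUT> {
        String name = "Map";

        MapOperator(OUT value) { super(value); }

        // Operator-specific setting that is only reachable through the subclass type.
        MapOperator<IN, OUT> name(String newName) {
            this.name = newName;
            return this;
        }
    }

    static String example() {
        DataSet<Integer> input = new DataSet<Integer>(20);

        // Refers only to the base type: compiles no matter what the subclass is called.
        DataSet<Integer> viaBaseType = input.map(x -> x + 1);

        // Names the subclass explicitly: renaming MapOperator would break this line.
        MapOperator<Integer, Integer> viaSubclass = input.map(x -> x + 1).name("increment");

        return viaSubclass.name + ":" + viaBaseType.value + ":" + viaSubclass.value;
    }
}
```

Any user program that declares a variable of the subclass type, as in the second assignment, would stop compiling after a rename, whereas a class that users never name can be renamed freely.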
Re: About Operator and OperatorBase
I would also be in favour of making the distinction between the API and the common API layer clearer by using different names. This will make the source code easier to understand. In the wake of a possible renaming we could also get rid of the legacy code org.apache.flink.optimizer.dag.MatchNode and rename org.apache.flink.runtime.operators.MatchDriver to JoinDriver to make the naming more consistent.
On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi u...@apache.org wrote:
On 15 Apr 2015, at 15:01, Stephan Ewen se...@apache.org wrote:
I think we can rename the base operators. Renaming the subclasses of DataSet would be extremely API-breaking. I think that is not worth it.
Oh, that's right. We return MapOperator for DataSet operations. Stephan's point makes sense.