[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333991#comment-17333991 ]

Flink Jira Bot commented on FLINK-11818:

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.

> Provide pipe transformation function for DataSet API
>
> Key: FLINK-11818
> URL: https://issues.apache.org/jira/browse/FLINK-11818
> Project: Flink
> Issue Type: Improvement
> Components: API / DataSet
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We have some business requirements that require the data handled by Flink to
> interact with external programs (such as Python/Perl/shell scripts).
> The existing DataSet API has no such function. It can be implemented with
> the map function, but that is not concise. It would be helpful if we could
> provide a pipe[1] function like Spark.
> [1]: https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323344#comment-17323344 ]

Flink Jira Bot commented on FLINK-11818:

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792455#comment-16792455 ]

vinoyang commented on FLINK-11818:

Thanks, [~fhueske]. Sounds good to me. I will start on this feature and add the API to DataSetUtils. As for performance, I think we should not worry about it too much. Users should know that this API will slow their jobs down.
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792451#comment-16792451 ]

Fabian Hueske commented on FLINK-11818:

I can see that such a function is valuable. However, I also think that starting external processes is performance sensitive and can also depend on the scheduling / availability of software. Hence, I would not make it a first-class API, i.e., I would add it to DataSetUtils instead. When the feature is stable, we can check if the function is popular enough to move it to DataSet.
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789279#comment-16789279 ]

vinoyang commented on FLINK-11818:

That seems reasonable. It is also not a standard transformation function provided by a traditional RDBMS. So, [~fhueske], do you think it's valuable? If yes, I can try to implement this function in {{DataSetUtils}}.
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789267#comment-16789267 ]

Fabian Hueske commented on FLINK-11818:

Hi, I would not add it to the core API, i.e., the DataSet class. We try to keep this API rather lean. It might be a fit for DataSetUtils.
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787763#comment-16787763 ]

vinoyang commented on FLINK-11818:

Hi [~hequn8128],

In fact, my idea is not much different from the current implementation in Spark.

1) We can provide multiple overloaded methods called pipe for the DataSet object, e.g., {{pipe(String cmd)/pipe(String cmd, Map env)...}}. Flink feeds the records to the external program and takes the output of the external program as a new DataSet. [1] [2]
2) I think its semantics are similar to Spark's.

[1]: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala]
[2]: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala]

What do you think? cc [~fhueske] [~till.rohrmann]
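To make the proposed semantics concrete, here is a minimal sketch in plain Java of what a pipe over one partition of records might do: start the external command, write each record as a line to its stdin, and collect each line of its stdout as an output record. This is not Flink API and not the implementation from the pull request. The class name `PipeSketch`, the method signature (a command given as a list of arguments instead of a single string), and the line-per-record encoding are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, for illustration only; not part of Flink. */
public class PipeSketch {

    /**
     * Feeds each record as one line to the stdin of an external command and
     * returns the lines the command writes to stdout.
     */
    public static List<String> pipe(
            Iterable<String> records, List<String> cmd, Map<String, String> env)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(cmd);
        builder.environment().putAll(env);
        // Forward the command's stderr to this JVM's stderr so it cannot block.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Write the input on a separate thread: writing everything before
        // reading could deadlock once the OS pipe buffers fill up.
        Thread writer = new Thread(() -> {
            try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                    process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String record : records) {
                    stdin.write(record);
                    stdin.newLine();
                }
            } catch (IOException e) {
                // The command may exit before consuming all input; the exit
                // code check below reports real failures.
            }
        });
        writer.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader stdout = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                output.add(line);
            }
        }

        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + cmd + " exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Unix-like system where the `tr` utility is available.
        List<String> result =
                pipe(List.of("flink", "spark"), List.of("tr", "a-z", "A-Z"), Map.of());
        System.out.println(result);
    }
}
```

In a Flink job, logic like this would presumably run once per partition (for example inside a mapPartition function) so that one external process serves many records. Starting a process per record would be far more expensive.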
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785163#comment-16785163 ]

Hequn Cheng commented on FLINK-11818:

[~yanghua] It's an interesting feature! Do you have any ideas on how to implement it? For example:
- What would the API look like in Flink?
- What are the semantics of the API, and how does it differ from the pipe method of Spark?
[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784151#comment-16784151 ]

vinoyang commented on FLINK-11818:

[~fhueske] and [~till.rohrmann] what do you think about this issue?