[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2021-04-27 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333991#comment-17333991
 ] 

Flink Jira Bot commented on FLINK-11818:


This issue was marked "stale-assigned" and has not received an update in 7 
days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> Provide pipe transformation function for DataSet API
> 
>
> Key: FLINK-11818
> URL: https://issues.apache.org/jira/browse/FLINK-11818
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataSet
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have some business requirements that require the data handled by Flink to 
> interact with some external programs (such as Python/Perl/shell scripts). 
> There is no such function in the existing DataSet API. Although it can be 
> implemented with a map function, that approach is not concise. It would be 
> helpful if we could provide a pipe[1] function like Spark's.
> [1]: 
> https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations
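The description notes that piping can already be done by hand with a map-style function. The sketch below shows roughly what that hand-written version involves, using only the JDK. It assumes one external process per partition (as Spark's pipe does), one element per line on the child's stdin, and one output element per stdout line. The class name `PipePartition` and its `pipe` method are hypothetical, not part of Flink.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pipes one partition through an external command. */
public final class PipePartition {

    public static List<String> pipe(Iterable<String> partition, List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Forward the child's stderr to this JVM's stderr (i.e. the worker logs).
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        // Feed stdin from a separate thread. Writing all input first and reading
        // stdout afterwards can deadlock once the child fills its stdout pipe.
        Thread writer = new Thread(() -> {
            try (BufferedWriter in = new BufferedWriter(
                    new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String element : partition) {
                    in.write(element);
                    in.write('\n'); // one element per line, independent of platform
                }
            } catch (IOException e) {
                // The child closed its stdin early (e.g. `head -n 1`); stop writing.
            }
        }, "pipe-stdin-writer");
        writer.start();

        // Every line the child prints becomes one output element.
        List<String> output = new ArrayList<>();
        try (BufferedReader out = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = out.readLine()) != null) {
                output.add(line);
            }
        }
        writer.join();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Unix-like host with `tr` on the PATH.
        System.out.println(pipe(List.of("flink", "spark"), List.of("tr", "a-z", "A-Z")));
    }
}
```

In a Flink job, this body could be called from a `MapPartitionFunction`'s `mapPartition(Iterable<T>, Collector<O>)` method, emitting each returned line through the `Collector`. Collecting into a list keeps the sketch short; a real implementation would more likely emit lines as they arrive.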



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2021-04-16 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323344#comment-17323344
 ] 

Flink Jira Bot commented on FLINK-11818:


This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.



[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-14 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792455#comment-16792455
 ] 

vinoyang commented on FLINK-11818:
--

Thanks, [~fhueske]. Sounds good to me. I will start working on this feature and 
add the API to DataSetUtils. Regarding performance, I don't think we need to 
worry about it too much; users should be aware that this API can slow down their jobs.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-14 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792451#comment-16792451
 ] 

Fabian Hueske commented on FLINK-11818:
---

I can see that such a function is valuable. However, I also think that starting 
external processes is performance-sensitive and can also depend on scheduling 
and on the availability of the required software. Hence, I would not make it a 
first-class API; instead, I would add it to DataSetUtils.

Once the feature is stable, we can check whether the function is popular enough 
to move it to DataSet.
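The point about software availability can be made concrete: if the external program is not installed on the machine where a task runs, the process fails to start. A minimal sketch, assuming a hypothetical helper named `ExternalProgram`, that turns the JDK's `IOException` into an error naming the command and host:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.List;

/** Hypothetical helper: start an external program or fail with a clear message. */
public final class ExternalProgram {

    public static Process startOrExplain(List<String> command) {
        try {
            return new ProcessBuilder(command).start();
        } catch (IOException e) {
            // ProcessBuilder.start() throws IOException when the executable cannot be found.
            String host;
            try {
                host = InetAddress.getLocalHost().getHostName();
            } catch (IOException hostError) {
                host = "<unknown host>";
            }
            throw new IllegalStateException(
                    "Could not start " + command + " on " + host
                            + "; is the program installed on every TaskManager host?", e);
        }
    }

    public static void main(String[] args) {
        try {
            startOrExplain(List.of("no-such-program-flink-11818"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```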



[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-11 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789279#comment-16789279
 ] 

vinoyang commented on FLINK-11818:
--

That seems reasonable. This is also not a standard transformation provided by 
traditional RDBMSs. So, [~fhueske], do you think it's valuable? If so, I can try 
to implement this function in {{DataSetUtils}}.



[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-11 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789267#comment-16789267
 ] 

Fabian Hueske commented on FLINK-11818:
---

Hi,

I would not add it to the core API, i.e., the DataSet class. 
We try to keep this API rather lean.
It might be a fit for DataSetUtils.



[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-08 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787763#comment-16787763
 ] 

vinoyang commented on FLINK-11818:
--

Hi [~hequn8128], in fact, my idea is not much different from Spark's current 
implementation.

1) We can provide multiple overloaded {{pipe}} methods on the DataSet object, 
e.g. {{pipe(String cmd)}}, {{pipe(String cmd, Map env)}}, ... Flink feeds the 
data into the external program and returns the program's output as a new 
DataSet. [1] [2]

2) I think its semantics are similar to Spark's.

 

[1]: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala]

[2]: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala]
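To make the proposed overloads more concrete, here is a sketch of how {{pipe(String cmd)}} and {{pipe(String cmd, Map env)}} could be reduced to one process configuration. As far as I can tell, Spark's String variants split the command on whitespace (`PipedRDD.tokenize`) instead of invoking a shell, and Spark also offers a Seq-based variant; the sketch mirrors that. The class `PipeCommands` and the `builderFor` methods are hypothetical, and `Map env` is assumed to mean `Map<String, String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

/** Hypothetical sketch: map the proposed pipe overloads onto a ProcessBuilder. */
public final class PipeCommands {

    /** Counterpart of pipe(String cmd): no extra environment variables. */
    public static ProcessBuilder builderFor(String cmd) {
        return builderFor(cmd, Map.of());
    }

    /** Counterpart of pipe(String cmd, Map env): tokenize, then add env entries. */
    public static ProcessBuilder builderFor(String cmd, Map<String, String> env) {
        return builderFor(tokenize(cmd), env);
    }

    /** Argv-style variant: no tokenizing, so arguments may contain spaces. */
    public static ProcessBuilder builderFor(List<String> command, Map<String, String> env) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of the parent's environment; env entries
        // are added on top of it (and override variables with the same name).
        pb.environment().putAll(env);
        return pb;
    }

    /** Whitespace split; quotes, globs and $VARS are passed through literally. */
    static List<String> tokenize(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(cmd);
        while (tok.hasMoreTokens()) {
            tokens.add(tok.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = builderFor("grep -v  error", Map.of("LC_ALL", "C"));
        System.out.println(pb.command());                   // [grep, -v, error]
        System.out.println(pb.environment().get("LC_ALL")); // C
    }
}
```

Because no shell is involved, users who need pipes or redirection inside the command would have to pass something like `sh -c "..."` through the argv-style variant.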

 

What do you think? cc [~fhueske] [~till.rohrmann]

 



[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-05 Thread Hequn Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785163#comment-16785163
 ] 

Hequn Cheng commented on FLINK-11818:
-

[~yanghua] It's an interesting feature! Do you have any ideas on how to implement 
it? For example:
- What would the API look like in Flink?
- What are the semantics of the API, and how do they differ from Spark's pipe 
method?




[jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API

2019-03-04 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784151#comment-16784151
 ] 

vinoyang commented on FLINK-11818:
--

[~fhueske] and [~till.rohrmann], what do you think about this issue?
