[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-05-03 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-98457842 OK, I moved it to contrib.streaming. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-05-03 Thread mbalassi
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-98522518 Looks good, merging.

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-05-02 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-98402525 Thanks! I added the integration test. @gyfora, I can't decide where this should be placed. Originally, collect() was a method of DataStream, and then putting it in

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-30 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-97956542 I think this looks good now! I think this needs a test (integration test), otherwise it will probably get broken by some change soon. Starting a

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-28 Thread gyfora
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-97051927 I tested it from the IDE and by submitting remotely to the local cluster; it seems to work properly. Later today I will try running it on EC2. It is actually a very neat

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-28 Thread gyfora
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-97125728 I tried it on EC2, and it worked properly when I submitted the job from the command line. I cannot submit to EC2 using the remote environment, but that's probably a firewall

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-26 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-96373660 I did the small change suggested by Gyula.

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-25 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-96220876 I made the modification to use NetUtils.findConnectingAddress. I tested it in both local and remote environments. I also tested the scenario that you mentioned where I

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-23 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95523580 OK, I see your point. I am thinking about using NetUtils.findConnectingAddress to determine which interface is used for the communication with the cluster. For this, I

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-23 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95528015 `NetUtils.findConnectingAddress` is a useful util, when you know that the endpoint is up already. If you can make the assumption that the master is running already
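
The idea discussed here, finding out which local interface is used to reach an endpoint that is already up, can be illustrated with plain java.net sockets. This is only a sketch under stated assumptions: the class `ConnectingAddressSketch` and the method `localAddressTowards` are hypothetical names, and the code is not the implementation of Flink's `NetUtils.findConnectingAddress`. It connects to a reachable endpoint and reads back the local address the operating system chose for that connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of Flink: shows the general technique only.
final class ConnectingAddressSketch {

    // Connects to an endpoint that must already be listening and returns the
    // local address the OS selected for that connection.
    static InetAddress localAddressTowards(InetSocketAddress target, int timeoutMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMillis);
            return socket.getLocalAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        // A loopback server stands in for the already-running master (assumption).
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            System.out.println(localAddressTowards(target, 1000));
        }
    }
}
```

The sketch fails with an exception if nothing is listening at the target, which matches the caveat above that the approach presupposes a running endpoint.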

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-21 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-94911176 This looks much better. Not being densely integrated into the DataStream makes it easier to maintain. The `InetAddress.getLocalHost().getHostAddress()`

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-21 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-94904085 I updated the pull request as per the above points.

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-19 Thread ggevay
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-94310035 Thank you for your comments. I am very sorry for not replying earlier, but I was extremely busy this week with other things. I will try to address the issues that you

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-17 Thread mbalassi
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-93988704 @ggevay this PR has been a bit quiet lately, what do you think about the comments?

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-17 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-93991530 What about reworking this as a library function and adding it to flink-contrib? Make this a special streaming sink and a local util that is used in the program that

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-08 Thread mbalassi
Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/581#discussion_r28028612 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/datastream/DataStream.java --- @@ -1165,6 +1168,7

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-08 Thread mbalassi
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-91082306 Thanks for picking up the issue, @ggevay. I would like to add to the list Stephan mentioned: I personally prefer the name collect for the method, it can still

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-08 Thread ggevay
GitHub user ggevay opened a pull request: https://github.com/apache/flink/pull/581 [FLINK-1670] Made DataStream iterable I created a DataStreamIterator class, and added an iterator() method to DataStream, which returns an instance of it. The iterator creates a TCP server on which
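
The general shape of an iterator fed by a TCP server can be sketched as follows. This is a minimal illustration, not the code from the pull request: `SocketLineIterator` and `sendLines` are hypothetical names, elements are plain text lines rather than serialized records, and the sender is a local stand-in for whatever sink writes to the socket. The iterator opens a server socket on a free port, accepts one connection lazily, and yields one element per received line until the sender closes the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: an iterator backed by a single TCP connection.
final class SocketLineIterator implements Iterator<String>, AutoCloseable {
    private final ServerSocket server;
    private BufferedReader reader; // opened lazily when the first element is requested
    private String next;
    private boolean finished;

    SocketLineIterator() throws IOException {
        // Port 0 lets the OS pick a free port; loopback keeps the sketch local.
        server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
    }

    int getPort() {
        return server.getLocalPort();
    }

    @Override
    public boolean hasNext() {
        if (next == null && !finished) {
            try {
                if (reader == null) {
                    Socket client = server.accept(); // blocks until a sender connects
                    reader = new BufferedReader(new InputStreamReader(
                            client.getInputStream(), StandardCharsets.UTF_8));
                }
                next = reader.readLine(); // null once the sender closes the stream
                finished = (next == null);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }

    @Override
    public void close() throws IOException {
        if (reader != null) {
            reader.close(); // also closes the accepted socket
        }
        server.close();
    }

    // Stand-in for the sending side: writes each element as one line, then disconnects.
    static void sendLines(int port, String... lines) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (SocketLineIterator it = new SocketLineIterator()) {
            sendLines(it.getPort(), "first", "second");
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

In a real deployment the sender runs on a different machine than the iterator, so the address the server binds to and advertises matters; the sketch sidesteps that by using loopback only.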

[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable

2015-04-08 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-90987769 It is an interesting idea to collect back a data stream. This solution here has, however, quite a few limitations and implications (I assume it was only locally