Re: [PROPOSAL] Structure the Flink Open Source Development

2016-06-08 Thread Aljoscha Krettek
I think it would make sense to also move "State Backends" out from "Runtime". This is also quite complex on its own. I would of course volunteer for this and I think Stephan, who is the current proposal for "Runtime", would also be good. On Wed, 8 Jun 2016 at 19:22 Stephan Ewen

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Alexander Alexandrov
> As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which sub-tasks are deployed to which TMs. As the broadcast variables are realized as additionally attached "broadcast channels", I am assuming that the same behavior will

[jira] [Created] (FLINK-4034) Dependency convergence on com.101tec:zkclient and com.esotericsoftware.kryo:kryo

2016-06-08 Thread Vladislav Pernin (JIRA)
Vladislav Pernin created FLINK-4034: --- Summary: Dependency convergence on com.101tec:zkclient and com.esotericsoftware.kryo:kryo Key: FLINK-4034 URL: https://issues.apache.org/jira/browse/FLINK-4034

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-06-08 Thread Stephan Ewen
I am adding a dedicated component for "Checkpointing". It would include the checkpoint coordinator, barriers, threads, state handles and recovery. I think that part is big and complex enough to warrant its own shepherd. I would volunteer for that and be happy to also have a second shepherd. On

[jira] [Created] (FLINK-4033) Missing Scala example snippets for the Kinesis Connector documentation

2016-06-08 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-4033: -- Summary: Missing Scala example snippets for the Kinesis Connector documentation Key: FLINK-4033 URL: https://issues.apache.org/jira/browse/FLINK-4033

AW: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
Hi Till, thanks for the fast answer. I'll think about a concrete way of implementing and open a JIRA. Best Andreas From: Till Rohrmann Sent: Wednesday, 8 June 2016 15:53 To: dev@flink.apache.org Subject: Re: Broadcast data

[jira] [Created] (FLINK-4032) Replace all usage of Guava Preconditions

2016-06-08 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-4032: --- Summary: Replace all usage of Guava Preconditions Key: FLINK-4032 URL: https://issues.apache.org/jira/browse/FLINK-4032 Project: Flink Issue Type:

Re: Future of Python support

2016-06-08 Thread Chesnay Schepler
Hello Julius, I don't think there is any real roadmap for the Python API, regardless of batch or streaming. Off the top of my head I can think of the following issue: The batch Python API makes heavy use of MapPartitions to transfer data in batches, I'm not sure how well this could be done

Future of Python support

2016-06-08 Thread Julius Neuffer
Hi, I am interested in using Flink as part of a research project. We normally use Python as a programming language. The Python support for the Batch API is already quite good. But I couldn't find any information on the future roadmap regarding Python support in Flink. Are there plans to add

Re: Broadcast data sent increases with # slots per TM

2016-06-08 Thread Till Rohrmann
Hi Andreas, your observation is correct. The data is sent to each slot and the receiving TM only materializes one copy of the data. The rest of the data is discarded. As far as I know, the reason why the broadcast variables are implemented that way is that the senders would have to know which
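
The behaviour described in this reply (one copy shipped per slot, one copy materialized per TM, the rest discarded) can be illustrated with a small back-of-the-envelope model. This is only a sketch in plain Python, not Flink code: the function name `broadcast_volume`, its parameters, and the example sizes are hypothetical, and the model assumes a single sender and ignores any difference between local and remote transfers.

```python
def broadcast_volume(size_mb: float, task_managers: int, slots_per_tm: int) -> dict:
    """Toy model of the thread's description (assumption: one sender).

    One copy of the broadcast data is shipped to every slot, but each
    TaskManager keeps only a single copy and discards the others.
    """
    if task_managers < 1 or slots_per_tm < 1:
        raise ValueError("task_managers and slots_per_tm must be >= 1")
    sent = size_mb * task_managers * slots_per_tm  # one copy per slot
    materialized = size_mb * task_managers         # one copy per TM
    return {
        "sent_mb": sent,
        "materialized_mb": materialized,
        "discarded_mb": sent - materialized,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB broadcast set, 4 TaskManagers.
    for slots in (1, 2, 4):
        print(slots, broadcast_volume(100.0, task_managers=4, slots_per_tm=slots))
```

Under these assumptions the materialized volume stays constant while the shipped volume grows linearly with the number of slots per TM, which matches the trend reported in the original message.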

[jira] [Created] (FLINK-4030) ScalaShellITCase

2016-06-08 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-4030: - Summary: ScalaShellITCase Key: FLINK-4030 URL: https://issues.apache.org/jira/browse/FLINK-4030 Project: Flink Issue Type: Bug

Broadcast data sent increases with # slots per TM

2016-06-08 Thread Kunft, Andreas
Hi, we experience an unexpected increase of data sent over the network for broadcasts with an increasing number of slots per TaskManager. We provided a benchmark [1]. It not only increases the size of data sent over the network but also hurts performance as seen in the preliminary results

[jira] [Created] (FLINK-4029) Multi-field "sum" function just like "keyBy"

2016-06-08 Thread Rami (JIRA)
Rami created FLINK-4029: --- Summary: Multi-field "sum" function just like "keyBy" Key: FLINK-4029 URL: https://issues.apache.org/jira/browse/FLINK-4029 Project: Flink Issue Type: Improvement

Re: DataStream split/select behaviour

2016-06-08 Thread Till Rohrmann
Hi, the directed output via the split and select methods is indeed only available in the DataStream API. Thus, in order to achieve the same with the DataSet API, you would have to apply multiple filters, as you've already written. The result of the select call will only be sent to the same task
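
The "multiple filters" workaround mentioned in this reply can be sketched as follows. This is plain Python over an in-memory list, not the Flink DataSet API: the helper `split_with_filters` and the selector names are hypothetical and only illustrate the idea of deriving each named output by applying one filter per output to the same input.

```python
from typing import Callable, Dict, Iterable, List


def split_with_filters(
    records: Iterable[int],
    selectors: Dict[str, Callable[[int], bool]],
) -> Dict[str, List[int]]:
    """Emulate split/select by running one filter per named output.

    Every filter scans the full input, so a record may end up in several
    outputs (or in none) depending on the predicates.
    """
    data = list(records)  # materialize once so each filter sees all records
    return {name: [r for r in data if keep(r)] for name, keep in selectors.items()}


if __name__ == "__main__":
    outputs = split_with_filters(
        range(10),
        {"even": lambda x: x % 2 == 0, "large": lambda x: x >= 7},
    )
    print(outputs["even"])
    print(outputs["large"])
```

Note that each filter passes over the whole input, so the cost grows with the number of outputs; that is the trade-off of this approach compared with a single directed-output operator.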