Re: Parallel computation of windows in Flink

2019-06-10 Thread Mike Kaplinskiy
At a high level, the pipeline looks something like this: pipeline .apply("read kafka", KafkaIO.readBytes().updateConsumerProperties({"auto.offset.reset": "earliest"})) .apply("xform", MapElements.via(...)) .apply("window", Window.into(FixedWindows.of(Duration.standardDays(1)))

Re: Parallel computation of windows in Flink

2019-06-10 Thread Ankur Goenka
Hi Mike, This can be because of the partitioning logic of the data. If possible, can you share your pipeline code at a high level. On Mon, Jun 10, 2019 at 12:58 PM Mike Kaplinskiy wrote: > > Ladder . The smart, modern way to insure your life. > > > On Mon, Jun 10, 2019

Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-10 Thread Chad Dombrova
> > > @Chad Thanks for the feedback. I agree that we can improve our release > notes. The particular issue you were looking for was part of the detailed > list [1] linked in the blog post: > https://jira.apache.org/jira/browse/BEAM-7029 Just to be clear, I had no idea about the feature ahead of

Re: Python sdk performance

2019-06-10 Thread Maximilian Michels
Hi Mingliang, You can increase the parallelism of the Python SDK Harness via the pipeline option   --experimental worker_threads= Note that the workers are Python threads which suffer from the Global Interpreter Lock. We currently do not use real processes, e.g. via multiprocessing. There

Re: Parallel computation of windows in Flink

2019-06-10 Thread Maximilian Michels
Hi Mike, If you set the number of shards to 1, you should get one shard per window; unless you have "ignore windows" set to true. > (The way I'm checking this is via the Flink UI) I'm curious, how do you check this via the Flink UI? Cheers, Max On 09.06.19 22:33, Mike Kaplinskiy wrote: > Hi

Re: [ANNOUNCE] Apache Beam 2.13.0 released!

2019-06-10 Thread Maximilian Michels
Thanks for managing the release, Ankur! @Chad Thanks for the feedback. I agree that we can improve our release notes. The particular issue you were looking for was part of the detailed list [1] linked in the blog post: https://jira.apache.org/jira/browse/BEAM-7029 Cheers, Max [1]