Re: Controlling parallelism of a ParDo Transform while writing to DB

2018-05-16 Thread Raghu Angadi
If you want tight control on parallelism, you can 'reshuffle' the elements into a fixed number of shards. See https://stackoverflow.com/questions/46116443/dataflow-streaming-job-not-scaleing-past-1-worker On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <harshvardhan.ag...@gmail.com> wrote: >
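For illustration, a minimal sketch of that resharding pattern with the Beam Java SDK (not from the thread itself; the element type, variable names, and shard count are illustrative, and for an unbounded input the GroupByKey still needs a suitable windowing or triggering strategy):

    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    final int numShards = 10;  // illustrative fixed shard count

    // Assign each element a random key in [0, numShards) and group by that
    // key; a downstream ParDo then sees at most numShards parallel groups.
    PCollection<KV<Integer, Iterable<String>>> sharded =
        rows  // hypothetical PCollection<String> destined for the DB
            .apply("AssignShard", ParDo.of(new DoFn<String, KV<Integer, String>>() {
              @ProcessElement
              public void process(ProcessContext c) {
                c.output(KV.of(ThreadLocalRandom.current().nextInt(numShards),
                    c.element()));
              }
            }))
            .apply("GroupIntoShards", GroupByKey.<Integer, String>create());

A ParDo applied to 'sharded' writes at most numShards groups in parallel per window, which bounds the concurrent connections to the DB.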

Re: Normal Spark Streaming vs Streaming on Beam with Spark Runner

2018-05-16 Thread chandan prakash
Thanks Ismaël. Your answers were quite useful for a novice user. I guess this answer will help many like me. *Regarding your answer to point 2:* *"Checkpointing is supported, Kafka offset management (if I understand what you mean) is managed by the KafkaIO connector + the runner"* Beam provides
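As a point of reference, a minimal KafkaIO read sketch (not from the original thread; broker, topic, and deserializers are illustrative, and it assumes a KafkaIO version that supports commitOffsetsInFinalize) where offset tracking lives in the connector and runner rather than in user code:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;

    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    PCollection<KafkaRecord<String, String>> records =
        pipeline.apply(
            KafkaIO.<String, String>read()
                .withBootstrapServers("broker:9092")  // illustrative address
                .withTopic("events")                  // illustrative topic
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                // Commit consumed offsets back to Kafka when the runner
                // finalizes a checkpoint, so offset management is handled
                // by KafkaIO + the runner.
                .commitOffsetsInFinalize());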

Re: Controlling parallelism of a ParDo Transform while writing to DB

2018-05-16 Thread Chamikara Jayalath
The exact mechanism for controlling parallelism is runner specific. It looks like Flink allows users to specify the amount of parallelism (per job) using the following option. https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L65
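Concretely, that option can be set in code or on the command line; a minimal sketch, assuming the Flink runner is on the classpath:

    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    // Job-wide degree of parallelism used by the Flink runner; equivalent
    // to passing --parallelism=4 on the command line.
    options.setParallelism(4);
    Pipeline pipeline = Pipeline.create(options);

Note this sets the parallelism for the whole job, not for an individual step.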

Re: Controlling parallelism of a ParDo Transform while writing to DB

2018-05-16 Thread Harshvardhan Agrawal
How do we control the parallelism of a particular step then? Is there a recommended approach to solve this problem? On Wed, May 16, 2018 at 20:45 Chamikara Jayalath wrote: > I don't think this can be specified through the Beam API but the Flink runner > might have additional configurations that I'm not awar

Re: Controlling parallelism of a ParDo Transform while writing to DB

2018-05-16 Thread Chamikara Jayalath
I don't think this can be specified through the Beam API but the Flink runner might have additional configurations that I'm not aware of. Also, many runners fuse steps to improve execution performance, so simply specifying the parallelism of a single step will not work. Thanks, Cham On Tue, May 15, 2
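One common way to force a fusion break around a particular step is a reshuffle; a minimal sketch (the collections and WriteToDbFn are hypothetical, and note that Reshuffle is marked deprecated in the SDK even though it is widely used for exactly this):

    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.Reshuffle;

    // Reshuffle materializes and redistributes the elements, so the runner
    // cannot fuse the stages before and after it into a single step.
    input
        .apply("BreakFusion", Reshuffle.viaRandomKey())
        .apply("WriteToDb", ParDo.of(new WriteToDbFn()));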

Re: Create SideInputs for PTranform using lookups for each window of data

2018-05-16 Thread Lukasz Cwik
Makes sense. On Tue, May 15, 2018 at 7:22 PM Harshvardhan Agrawal <harshvardhan.ag...@gmail.com> wrote: > It’s more like I have a window and I create a side input for that window. > Once I am done with processing the window I want to discard that side input > and create new ones for subsequent
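A rough sketch of that per-window side-input pattern (element types, the enrichment logic, and the window size are illustrative): window the lookup PCollection the same way as the main input, and the runner supplies each main-input window with the side-input value for the matching window, so each window effectively gets its own side input:

    import java.util.Map;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.joda.time.Duration;

    PCollectionView<Map<String, String>> lookupView =
        lookups  // hypothetical PCollection<KV<String, String>> of reference data
            .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
            .apply(View.asMap());

    PCollection<String> enriched =
        events  // hypothetical PCollection<String>, windowed the same way
            .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
            .apply(ParDo.of(new DoFn<String, String>() {
              @ProcessElement
              public void process(ProcessContext c) {
                // The side input seen here is the one for this element's window.
                Map<String, String> lookup = c.sideInput(lookupView);
                c.output(c.element() + "|" + lookup.getOrDefault(c.element(), ""));
              }
            }).withSideInputs(lookupView));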

Re: Normal Spark Streaming vs Streaming on Beam with Spark Runner

2018-05-16 Thread Ismaël Mejía
Hello, Answers to the questions inline: > 1. Are there any limitations in terms of implementation, functionality or performance if we want to run streaming on Beam with the Spark runner vs streaming on Spark Streaming directly? At this moment the Spark runner does not support some parts of the B
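For reference, a minimal sketch of wiring a pipeline to the Spark runner in streaming mode (option values and the checkpoint path are illustrative, assuming the Spark runner module is on the classpath):

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.runners.spark.SparkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);
    options.setStreaming(true);
    // Micro-batch interval of the underlying Spark Streaming context.
    options.setBatchIntervalMillis(1000L);
    options.setCheckpointDir("/tmp/beam-spark-checkpoint");  // illustrative path
    Pipeline pipeline = Pipeline.create(options);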