If you want tight control over parallelism, you can 'reshuffle' the elements into a fixed number of shards. See https://stackoverflow.com/questions/46116443/dataflow-streaming-job-not-scaleing-past-1-worker
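The core of the fixed-shards trick is to attach a random key in [0, N) to each element and then group by that key, so at most N bundles (and hence at most N writers) exist downstream. Here is a minimal plain-Python sketch of that idea; it does not use Beam itself, and the helper names (`assign_shard`, `group_by_shard`) are illustrative, not Beam APIs:

```python
import random
from collections import defaultdict

NUM_SHARDS = 10  # cap downstream parallelism (e.g. DB writers) at 10

def assign_shard(element, num_shards=NUM_SHARDS):
    """Attach a random shard key in [0, num_shards) to the element,
    mimicking the key-then-GroupByKey pattern used to reshard."""
    return (random.randrange(num_shards), element)

def group_by_shard(keyed_elements):
    """Simulate GroupByKey: all elements sharing a key land in the
    same group, i.e. would be handled by the same writer."""
    shards = defaultdict(list)
    for key, value in keyed_elements:
        shards[key].append(value)
    return shards

elements = range(1000)
shards = group_by_shard(assign_shard(e) for e in elements)

# No more than NUM_SHARDS groups can exist, regardless of how many
# workers ran the upstream (parallelism-40) aggregation steps.
assert len(shards) <= NUM_SHARDS
assert sum(len(v) for v in shards.values()) == 1000
```

In an actual Beam pipeline you would express the same shape with a keying transform followed by a group-by-key before the DB write, so the upstream aggregations keep their high parallelism while the write stage is bounded by the shard count.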
On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <harshvardhan.ag...@gmail.com> wrote:

> Hi Guys,
>
> I am currently in the process of developing a pipeline using Apache Beam
> with Flink as the execution engine. As part of the process I read data
> from Kafka and perform a bunch of transformations that involve joins,
> aggregations, as well as lookups to an external DB.
>
> The idea is that we want higher parallelism in Flink when we are
> performing the aggregations, but eventually coalesce the data and have a
> smaller number of processes writing to the DB so that the target DB can
> handle it (for example, say I want a parallelism of 40 for aggregations
> but only 10 when writing to the target DB).
>
> Is there any way we could do that in Beam?
>
> Regards,
> Harsh
>
> --
> *Regards,*
> *Harshvardhan Agrawal*
> *267.991.6618 | LinkedIn <https://www.linkedin.com/in/harshvardhanagr/>*