If you want tight control over parallelism, you can 'reshuffle' the elements
into a fixed number of shards. See
https://stackoverflow.com/questions/46116443/dataflow-streaming-job-not-scaleing-past-1-worker
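The usual pattern behind this is to attach a shard key in [0, N) to each element and then group by that key, so at most N bundles carry the downstream work (e.g. the DB write). Here is a minimal plain-Python sketch of that sharding idea; the `reshard` helper is hypothetical and for illustration only, not part of the Beam API (in Beam you would key with `random.randrange(N)` and apply a `GroupByKey`):

```python
import random
from collections import defaultdict

def reshard(elements, num_shards):
    """Group elements into at most num_shards buckets via random shard keys.

    This mimics the keyed-grouping trick: each element gets a random key in
    [0, num_shards), and grouping by that key caps downstream parallelism.
    """
    shards = defaultdict(list)
    for element in elements:
        shards[random.randrange(num_shards)].append(element)
    return dict(shards)

# Example: 100 elements collapsed into at most 10 shards; no data is lost.
shards = reshard(range(100), 10)
assert len(shards) <= 10
assert sum(len(v) for v in shards.values()) == 100
```

The random key spreads elements roughly evenly across shards; after the grouping, each shard's values can be written to the DB by a single process.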

On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:

> Hi Guys,
>
> I am currently in the process of developing a pipeline using Apache Beam
> with Flink as an execution engine. As a part of the process I read data
> from Kafka and perform a bunch of transformations that involve joins,
> aggregations as well as lookups to an external DB.
>
> The idea is that we want to have higher parallelism with Flink when we are
> performing the aggregations, but eventually coalesce the data and have a
> smaller number of processes writing to the DB so that the target DB can
> handle the load (for example, a parallelism of 40 for aggregations but
> only 10 when writing to the target DB).
>
> Is there any way we could do that in Beam?
>
> Regards,
>
> Harsh
> --
>
> Regards,
> Harshvardhan Agrawal
> 267.991.6618 | LinkedIn <https://www.linkedin.com/in/harshvardhanagr/>
>
