Re: Controlling parallelism of a ParDo Transform while writing to DB

Chamikara Jayalath Wed, 16 May 2018 17:46:06 -0700

I don't think this can be specified through the Beam API, but the Flink
runner might have additional configuration options that I'm not aware of.
Also, many runners fuse adjacent steps to improve execution performance, so
simply specifying the parallelism of a single step will not work: fused
steps are executed together as one unit.
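
One workaround that is sometimes used is to key the records by a small,
fixed number of shards and group on that key before the write, since a
grouping step generally prevents the steps on either side of it from being
fused. Below is a minimal sketch of that idea in plain Python (standard
library only, not the Beam API). The names NUM_WRITE_SHARDS, shard_for,
group_by_shard, write_group and write_all are hypothetical, and the dict
`sink` only stands in for the target DB. In a real pipeline the keying and
grouping would be done with Beam transforms such as GroupByKey, and whether
the runner then really limits the writers to the number of shards is
something you would need to verify on Flink.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap on concurrent DB writers (the "10" from the question).
NUM_WRITE_SHARDS = 10


def shard_for(record_id, num_shards=NUM_WRITE_SHARDS):
    """Map a record id to a stable shard number in [0, num_shards)."""
    return zlib.crc32(record_id.encode("utf-8")) % num_shards


def group_by_shard(records, num_shards=NUM_WRITE_SHARDS):
    """Stand-in for a key-then-group step: yields at most num_shards groups."""
    groups = defaultdict(list)
    for record_id, value in records:
        groups[shard_for(record_id, num_shards)].append((record_id, value))
    return dict(groups)


def write_group(shard, rows, sink):
    """Hypothetical DB write: one batch per shard. `sink` stands in for the DB."""
    sink[shard] = list(rows)
    return len(rows)


def write_all(records, num_shards=NUM_WRITE_SHARDS):
    """Group records into shards, then write with at most num_shards writers."""
    groups = group_by_shard(records, num_shards)
    sink = {}
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        futures = [
            pool.submit(write_group, shard, rows, sink)
            for shard, rows in groups.items()
        ]
        written = sum(f.result() for f in futures)
    return sink, written
```

The upstream aggregation can run as wide as it likes here; only the number
of distinct shard keys bounds how many write batches can be in flight at
once.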


Thanks,
Cham

On Tue, May 15, 2018 at 11:21 AM Harshvardhan Agrawal <
[email protected]> wrote:

> Hi Guys,
>
> I am currently in the process of developing a pipeline using Apache Beam
> with Flink as an execution engine. As a part of the process I read data
> from Kafka and perform a bunch of transformations that involve joins,
> aggregations as well as lookups to an external DB.
>
> The idea is that we want higher parallelism in Flink while we are
> performing the aggregations, but we eventually want to coalesce the data
> and have fewer processes writing to the DB so that the target DB can
> handle the load (for example, say I want a parallelism of 40 for the
> aggregations but only 10 when writing to the target DB).
>
> Is there any way we could do that in Beam?
>
> Regards,
>
> Harsh
> --
>
> *Regards, Harshvardhan Agrawal*
> *267.991.6618 | LinkedIn <https://www.linkedin.com/in/harshvardhanagr/>*
>
