I suspect this is due to Fusion in the steps between the FanOut and the
DoFn.

Don't have the link to hand but look to the docs for 'breaking fusion' .
Essentially via a Reshuffle or a sideinput.

Cheers

Reza

On Tue, 31 Dec 2019, 14:26 André Rocha Silva, <
[email protected]> wrote:

> Hi!
>
> I have a cloud dataflow job that is not scaling.
>
> The job sequence is the following:
> 1 -  [io] Read from a file in the bucket (1 element out)
> 2 - [ParDo] With the file information, get a query from a database (10,000
> elements out)
> 3 - [ParDo] Works with the elements
>
> But when I read from a file that already contains the same database query
> result it scales to 60+ workers:
> 1 -  [io] Read from a file in the bucket (10,000 elements out)
> 2 - [ParDo] Works with the elements
>
> Do I have to develop an I/O connector for the apache beam to know how many
> elements its dealing with?
>
> Best regards
> André Rocha Silva
>
>
>

Reply via email to