I suspect this is due to Fusion in the steps between the FanOut and the DoFn.
Don't have the link to hand but look to the docs for 'breaking fusion' . Essentially via a Reshuffle or a sideinput. Cheers Reza On Tue, 31 Dec 2019, 14:26 André Rocha Silva, < [email protected]> wrote: > Hi! > > I have a cloud dataflow job that is not scaling. > > The job sequence is the following: > 1 - [io] Read from a file in the bucket (1 element out) > 2 - [ParDo] With the file information, get a query from a database (10,000 > elements out) > 3 - [ParDo] Works with the elements > > But when I read from a file that already contains the same database query > result it scales to 60+ workers: > 1 - [io] Read from a file in the bucket (10,000 elements out) > 2 - [ParDo] Works with the elements > > Do I have to develop an I/O connector for the apache beam to know how many > elements its dealing with? > > Best regards > André Rocha Silva > > >
