Hi Pablo,

Did you find out anything? Any suggestions we can try?

Many thanks,
David

On Wed, Mar 24, 2021 at 5:19 PM David Sánchez <[email protected]> wrote:

> Hi Pablo,
>
> This is the input data we are testing:
>
> Elements added: 38,792,932
> Estimated size: 3.14 GB
>
> On Wed, Mar 24, 2021 at 5:09 PM Pablo Estrada <[email protected]> wrote:
>
>> Hi David,
>> Thanks for sharing. I've been investigating something like this recently.
>> What's the size of your data?
>> Best
>> -P.
>>
>> On Wed, Mar 24, 2021, 7:52 AM David Sánchez <[email protected]> wrote:
>>
>>> Hi folks!
>>>
>>> I'm testing the Dataflow v2 runner in a batch pipeline (Apache Beam
>>> Python 3.7 SDK 2.27.0) that reads many millions of rows from BigQuery and
>>> writes to PubSub and BigQuery using the flag "--experiments=use_runner_v2".
>>>
>>> The same job used to scale up immediately to over 50 workers, but in v2
>>> it never scales up further than 5-6 workers, so it's much slower. I can
>>> see, however, that the total vCPU and memory are about half of what they
>>> were before, which is promising. Any clue about why the scaling is
>>> behaving differently?
>>>
>>> Many thanks
