Re: Dataflow v2 runner scaling behaviour

David Sánchez Wed, 05 May 2021 03:52:37 -0700

Hi Pablo,

Did you find out anything? Any suggestion we can try?


Many thanks,
David

On Wed, Mar 24, 2021 at 5:19 PM David Sánchez <[email protected]> wrote:

> Hi Pablo,
>
> This is the input data we are testing
>
> Elements added: 38,792,932
> Estimated size: 3.14 GB
>
> On Wed, Mar 24, 2021 at 5:09 PM Pablo Estrada <[email protected]> wrote:
>
>> Hi David,
>> Thanks for sharing. I've been investigating something like this recently.
>> What's the size of your data?
>> Best
>> -P.
>>
>> On Wed, Mar 24, 2021, 7:52 AM David Sánchez <[email protected]> wrote:
>>
>>> Hi folks!
>>>
>>> I'm testing the Dataflow v2 runner in a batch pipeline (Apache Beam
>>> Python 3.7 SDK 2.27.0) that reads many millions of rows from BigQuery and
>>> writes to PubSub and BigQuery using the flag "--experiments=use_runner_v2".
>>>
>>> The same job used to scale up immediately to over 50 workers, but in v2
>>> it never scales up further than 5-6 workers, so it's much slower. I can
>>> see, however, that the total vCPU and memory are about half of what they
>>> were before, which is promising. Any clue as to why the scaling is behaving
>>> differently?
>>>
>>> Many thanks
>>>
>>
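
For anyone trying to reproduce the comparison, here is a minimal sketch of how the launch arguments for the two runs might be assembled. It is not the original poster's configuration: the project, region, and bucket values are placeholders, the helper name dataflow_args is hypothetical, and setting the autoscaling algorithm and worker cap explicitly is only an assumption about what is worth pinning down so that both runs share the same ceiling.

```python
def dataflow_args(project, region, temp_location,
                  max_num_workers=50, use_runner_v2=True):
    """Build a Dataflow argument list. All values passed in are placeholders."""
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Assumption: pin autoscaling explicitly so v1 and v2 runs are comparable.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        f"--max_num_workers={max_num_workers}",
    ]
    if use_runner_v2:
        # The flag quoted in the thread.
        args.append("--experiments=use_runner_v2")
    return args


# Hypothetical values, for illustration only.
args = dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp")

# With Apache Beam installed, the list can be handed to the pipeline:
# from apache_beam.options.pipeline_options import PipelineOptions
# options = PipelineOptions(args)
```

Running the same job once with use_runner_v2=True and once with False, with everything else unchanged, should make it easier to tell whether the difference in scaling comes from the runner or from the options.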
