Hi Juan,

Thanks for replying.  I believe I am using the correct configuration.

I have posted more details, with a code snippet and the Dataflow job
template configuration, in a Stack Overflow post:
https://stackoverflow.com/q/55242684/11226631

Thanks.
- Maulik

On Tue, Mar 19, 2019 at 2:53 PM Juan Carlos Garcia <[email protected]>
wrote:

> Hi Maulik,
>
> Have you submitted your job with the correct configuration to enable
> autoscaling?
>
> --autoscalingAlgorithm=
> --maxWorkers=
>
> I am on my phone right now and can't tell if the flag names are 100%
> correct.
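>
> In case it helps, the submission would look roughly like the following
> (flag names from memory, so please double-check them against
> DataflowPipelineOptions; the max worker count is just an example):
>
>   --runner=DataflowRunner \
>   --autoscalingAlgorithm=THROUGHPUT_BASED \
>   --maxNumWorkers=10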
>
>
> Maulik Gandhi <[email protected]> wrote on Tue, Mar 19, 2019, 18:13:
>
>>
>> Hi Beam Community,
>>
>> I am working on a Beam processing pipeline that reads data from both an
>> unbounded and a bounded source, and I want to leverage Beam state management
>> in my pipeline.  To put data into Beam state, I have to transform the
>> data into key-value pairs (e.g. KV<String, Object>).  Because I am reading
>> data from unbounded and bounded sources, I am forced to apply windowing and
>> triggering before grouping data by key.  I have chosen to use GlobalWindows().
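>>
>> A simplified sketch of that part of the pipeline (illustrative only: the
>> element type, input name, trigger, and firing delay are placeholders, not
>> the actual job code):
>>
>> import org.apache.beam.sdk.transforms.GroupByKey;
>> import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
>> import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
>> import org.apache.beam.sdk.transforms.windowing.Repeatedly;
>> import org.apache.beam.sdk.transforms.windowing.Window;
>> import org.apache.beam.sdk.values.KV;
>> import org.apache.beam.sdk.values.PCollection;
>> import org.joda.time.Duration;
>>
>> // "records" is the merged bounded + unbounded input, already keyed as
>> // KV<String, MyObject>; MyObject is a placeholder element type.
>> PCollection<KV<String, MyObject>> windowed = records.apply("GlobalWindow",
>>     Window.<KV<String, MyObject>>into(new GlobalWindows())
>>         .triggering(Repeatedly.forever(
>>             AfterProcessingTime.pastFirstElementInPane()
>>                 .plusDelayOf(Duration.standardMinutes(1))))
>>         .withAllowedLateness(Duration.ZERO)
>>         .discardingFiredPanes());
>>
>> // The keyed, triggered collection can then be grouped (or fed to a
>> // stateful DoFn) per key.
>> PCollection<KV<String, Iterable<MyObject>>> grouped =
>>     windowed.apply(GroupByKey.<String, MyObject>create());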
>>
>> I am able to kick off the Dataflow job that runs my Beam pipeline.  I have
>> noticed that Dataflow uses only 1 worker node to perform the work and does
>> not scale the job out to more worker nodes, thus not leveraging the benefit
>> of distributed processing.
>>
>> I have posted the question on Stack Overflow:
>> https://stackoverflow.com/questions/55242684/join-bounded-and-non-bounded-source-data-flow-job-not-scaling
>> but I am also reaching out on the mailing list to get some help, or to
>> learn what I am missing.
>>
>> Any help would be appreciated.
>>
>> Thanks.
>> - Maulik
>>
>
