Re: Understanding GenerateSequence and SideInputs

Lukasz Cwik Thu, 24 May 2018 14:23:00 -0700

I should have clarified that the precision guarantee I was talking about
was about timing.
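
To make the timing point concrete, here is a minimal sketch in plain Java. It
is not Beam code and not how any runner is implemented: the class RateSketch,
the method emissionTimes and the poll times are all hypothetical, chosen only
to model "at most one element per period" when the runner decides when the
source gets to run. Element i becomes eligible at i * period, and it is
emitted at the first poll at or after that moment, so it can be late but
never early.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a rate-limited source; names are illustrative only.
final class RateSketch {

    /**
     * Returns the emission time (ms since start) of every element produced
     * when the source is polled at the given, increasing, times.
     * Assumption: element i becomes eligible at i * periodMillis, and each
     * poll emits every element that is eligible but not yet emitted.
     */
    static List<Long> emissionTimes(long periodMillis, long[] pollTimesMillis) {
        List<Long> emitted = new ArrayList<>();
        long next = 0; // index of the next element to emit
        for (long poll : pollTimesMillis) {
            // Catch up on everything that became eligible before this poll.
            while (next * periodMillis <= poll) {
                emitted.add(poll);
                next++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        long period = 5 * 60 * 1000L; // one element every 5 minutes
        // Made-up schedule: the third poll is slightly late, the fourth very late.
        long[] polls = {0L, 300_000L, 601_500L, 1_250_000L};
        List<Long> times = emissionTimes(period, polls);
        for (int i = 0; i < times.size(); i++) {
            long eligible = i * period;
            System.out.println("element " + i
                + " eligible=" + eligible
                + " emitted=" + times.get(i)
                + " late by " + (times.get(i) - eligible) + " ms");
        }
    }
}
```

In this model the lateness is always zero or positive, and a poll that arrives
very late emits several elements at once, which is one way a 5 minute cadence
can look inexact from the outside.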


On Thu, May 24, 2018 at 2:21 PM Lukasz Cwik <lc...@google.com> wrote:

> The runner is responsible for scheduling the work anywhere it chooses. It
> can be the same node all the time or different nodes.
>
> There is no precision guarantee on the upper bound (only on the lower
> bound): the withRate method states that it will "generate at most a given
> number of elements per a given period". This is because a DoFn can't
> control whether and when the runner decides to schedule the work. A runner
> will attempt to honor any processing commitments that it knows about, such
> as timers, but if the runner has too much work and too few resources, it
> may fall behind or decide to group small work units into larger work units
> for performance reasons.
>
>
>
> On Thu, May 24, 2018 at 1:11 PM Carlos Alonso <car...@mrcalonso.com>
> wrote:
>
>> Hi everyone!!
>>
>> I'm building a pipeline to store streaming data into BQ, and I'm using the
>> "Slowly changing lookup cache" pattern described here:
>> https://cloud.google.com/blog/big-data/2017/06/guide-to-common-cloud-dataflow-use-case-patterns-part-1
>> to hold and refresh the table schemas (as they may change from time to
>> time).
>>
>> Now I'd like to understand how that is scheduled on a distributed system.
>> Who is running that code? One random node? One node but always the same?
>> All nodes?
>>
>> Also, what are the GenerateSequence guarantees in terms of precision? I
>> have it configured to generate 1 element every 5 minutes, and most of the
>> time it is exact, but sometimes it isn't... Is that expected?
>>
>> Regards
>>
>
