I should have clarified that the precision guarantee I was talking about was timing.
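
To make the timing point concrete, here is a minimal Python sketch of the behaviour described in the quoted reply below. It is not Beam code and not how GenerateSequence is implemented: the function `simulate_emissions`, its parameters, and the uniform random delay are all hypothetical, chosen only to model a source whose elements can be emitted late but never early.

```python
import random


def simulate_emissions(num_elements, period_s, max_runner_delay_s, seed=0):
    """Return a list of (earliest_allowed, actual) emission times in seconds.

    Hypothetical model, not Beam code: element i may not be emitted before
    i * period_s, and the "runner" adds a non-negative scheduling delay.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same delays
    times = []
    for i in range(num_elements):
        earliest = i * period_s                      # the only guaranteed bound
        delay = rng.uniform(0, max_runner_delay_s)   # assumed runner lag, never negative
        times.append((earliest, earliest + delay))
    return times


# One element every 5 minutes, with an assumed runner lag of up to 20 seconds.
emissions = simulate_emissions(num_elements=5, period_s=300, max_runner_delay_s=20)
for earliest, actual in emissions:
    print(f"earliest={earliest:>5}s actual={actual:8.2f}s late_by={actual - earliest:6.2f}s")
```

In this model the number of elements emitted by any given time never runs ahead of the schedule, which is the "at most" part; nothing bounds how late an individual element may arrive.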
On Thu, May 24, 2018 at 2:21 PM Lukasz Cwik <lc...@google.com> wrote:

> The runner is responsible for scheduling the work anywhere it chooses. It
> can be the same node all the time or different nodes.
>
> There is no precision guarantee on the upper bound (only the lower bound),
> the withRate method states that it will "generate at most a given number
> of elements per a given period". This is because a DoFn can't control
> whether and when the runner decides to schedule the work. A runner will
> attempt to honor any processing commitments that it knows about such as
> timers but if the runner has too much work and too few resources it may
> fall behind or decide to group small work units into larger work units for
> performance reasons.
>
> On Thu, May 24, 2018 at 1:11 PM Carlos Alonso <car...@mrcalonso.com>
> wrote:
>
>> Hi everyone!!
>>
>> I'm building a pipeline to store streaming data into BQ and I'm using the
>> pattern: Slowly changing lookup cache described here:
>> https://cloud.google.com/blog/big-data/2017/06/guide-to-common-cloud-dataflow-use-case-patterns-part-1
>> to hold and refresh the table schemas (as they may change from time to time).
>>
>> Now I'd like to understand how that is scheduled on a distributed system.
>> Who is running that code? One random node? One node but always the same?
>> All nodes?
>>
>> Also, what are the GenerateSequence guarantees in terms of precision? I
>> have it configured to generate 1 element every 5 minutes and most of the
>> time it works exact, but sometimes it doesn't... Is that expected?
>>
>> Regards