Thanks for the clarifications, Robert.

On Tue, Nov 16, 2021 at 3:27 PM Robert Bradshaw <[email protected]> wrote:

> On Tue, Nov 16, 2021 at 3:00 PM gaurav mishra
> <[email protected]> wrote:
> >
> > Hi,
> > I have a pipeline which looks like this -
> > Input -> Convert_to_KV_pairsDoFn -> SomeStatefulDofn -> output
> > As you can see there is no explicit "shuffle" transform here
> >
> > My understanding and observation so far has been that the
> > SomeStatefulDofn will never be executed in parallel on two workers for any
> > given key. Is my understanding correct?
>
> That is correct.
>
> > If yes, then a second question - is there an implicit groupByKey kind of
> > step introduced by Dataflow here to ensure that all messages with the same
> > key go to the same worker?
>
> Exactly. (It doesn't technically group things, in the sense that it
> doesn't wait for all values per key-window to be available as a
> barrier, but it shuffles all values for a given key to the same worker,
> the same as a GBK would.)
>
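
To check my mental model, here is a toy sketch in plain Python. It is not
Beam code and not how Dataflow implements the shuffle; NUM_WORKERS,
worker_for, Worker and run are all hypothetical names made up for the
illustration. It only shows the two properties described above: every value
for a key lands on the same worker, and each element is processed as it
arrives, without waiting for the rest of that key's values.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 3  # hypothetical worker count, chosen only for the illustration


def worker_for(key: str) -> int:
    """Pick a worker deterministically from the key (stand-in for the shuffle)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_WORKERS


class Worker:
    """Toy worker holding per-key state, loosely like a stateful DoFn."""

    def __init__(self, worker_id: int) -> None:
        self.worker_id = worker_id
        self.state = defaultdict(int)  # running sum per key

    def process(self, key: str, value: int):
        # Update state and emit right away: no barrier, no waiting for
        # the remaining values of this key.
        self.state[key] += value
        return (key, self.state[key], self.worker_id)


def run(elements):
    """Route each (key, value) pair to its worker in arrival order."""
    workers = [Worker(i) for i in range(NUM_WORKERS)]
    return [workers[worker_for(key)].process(key, value) for key, value in elements]


outputs = run([("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)])
for key, running_sum, worker_id in outputs:
    print(key, running_sum, "worker", worker_id)
```

Each key always maps to one worker, so its running sum is never updated from
two places at once, and an output is produced per input element rather than
once per key.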
