Thanks for the clarifications Robert. On Tue, Nov 16, 2021 at 3:27 PM Robert Bradshaw <[email protected]> wrote:
> On Tue, Nov 16, 2021 at 3:00 PM gaurav mishra > <[email protected]> wrote: > > > > Hi, > > I have a pipeline which looks like this - > > Input -> Convert_to_KV_pairsDoFn -> SomeStatefulDofn -> output > > As you can see there is no explicit "shuffle" transform here > > > > My understanding and observation so far has been that the > SomeStatefulDofn will never be executed in parallel on two workers for any > given key. Is my understanding correct? > > That is correct. > > > If yes, then second question - is there an implicit groupByKey kind of > step introduced by dataflow here to ensure that all msges with same key > goes to same worker? > > Exactly. (It doesn't technically group things, in the sense that it > doesn't wait for all values per key-window to be available as a > barrier, but it shuffles all keys to the same worker same as a GBK.) >
