Hi Hemali,

AFAIK you are correct: all elements with the same key will be processed by the same instance of the stateful DoFn (same machine, same thread). Strictly speaking, that holds per key and window: for a PCollection that has a window applied, all elements with the same key+window combination will be processed by the same DoFn instance. Keep in mind that this inherently limits the runner's ability to parallelize the stateful DoFn, which might cause a processing bottleneck, depending on the cardinality of the keys.
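To make the parallelism point concrete, here is a toy model in plain Python. It is not Beam code and not how any runner is implemented. The function `run_stateful`, the first-seen routing scheme, and the worker numbering are all assumptions made for illustration. It only shows that when state is scoped per key+window, the number of workers that can be busy is bounded by the number of distinct key+window pairs.

```python
from collections import defaultdict


def run_stateful(elements, num_workers):
    """Toy model of per key+window state; hypothetical, not a Beam API.

    Each (key, window) pair gets one state cell and is pinned to one worker.
    """
    state = defaultdict(int)  # one counter per (key, window)
    worker_of = {}            # worker that owns each (key, window)
    outputs = []
    for key, window, _value in elements:
        cell = (key, window)
        # Assumed routing: pin a new cell to the next worker, round-robin.
        worker = worker_of.setdefault(cell, len(worker_of) % num_workers)
        state[cell] += 1
        outputs.append((key, window, state[cell], worker))
    return outputs, worker_of


# Three elements, two distinct keys, all in one (global) window.
elements = [("a", "global", 1), ("b", "global", 2), ("a", "global", 3)]
outputs, worker_of = run_stateful(elements, num_workers=8)
busy_workers = set(worker_of.values())
print(outputs)
print(len(busy_workers), "of 8 workers used")
```

With two distinct keys in the global window, only two of the eight modelled workers ever receive elements, which is the bottleneck risk with low-cardinality keys.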
Regards,
Amit.

On Wed, Mar 31, 2021 at 8:33 PM Hemali Sutaria <[email protected]> wrote:

> My understanding is: Stateful transformations are thread safe. In case of
> global window + stateful transformation, Beam makes sure that all values
> for that key must be processed on the same machine, in fact on the same
> thread. Only if you have a session/time window, you need to add groupbykey.
> Is it correct?
>
> Thanks,
> Hemali Sutaria
>
> On Wed, Mar 31, 2021 at 10:23 AM Kenneth Knowles <[email protected]> wrote:
>
>> On Wed, Mar 31, 2021 at 10:20 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> On Wed, Mar 31, 2021 at 10:19 AM Hemali Sutaria <[email protected]> wrote:
>>>
>>>> I have a global window with per-key-and-window stateful processing
>>>> dataflow job. Do I need groupbykey in my transform? Thank you
>>
>> No you do not need a GroupByKey. When you use a stateful DoFn the Beam
>> runner will partition the data automatically by key and window.
>>
>> Kenn
>>
>>>> https://cloud.google.com/blog/products/gcp/writing-dataflow-pipelines-with-scalability-in-mind
>>>>
>>>> https://beam.apache.org/documentation/programming-guide/#transforms
>>>>
>>>> https://beam.apache.org/blog/timely-processing/
>>>>
>>>> Thanks,
>>>> Hemali Sutaria
