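Yes, you should use BatchElements. GroupIntoBatches is implemented as a stateful DoFn, and stateful DoFns are not yet supported for Python on Dataflow, which is why you see that error. (The difference between the two is that GroupIntoBatches can batch across bundles, which can be important for streaming; BatchElements only batches within a bundle.)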
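
To make the bundle distinction concrete, here is a rough pure-Python model of the two behaviours. It is only a sketch and does not use Beam: the function names batch_within_bundles and batch_across_bundles and the sample bundles are made up for illustration. The real BatchElements also tunes its batch size between min_batch_size and max_batch_size, and the real GroupIntoBatches works per key and per window, neither of which is modelled here.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_within_bundles(bundles: Iterable[List[T]], max_batch_size: int) -> Iterator[List[T]]:
    """Hypothetical model of per-bundle batching (BatchElements-like).

    The buffer is flushed at the end of every bundle, so small bundles
    produce small batches.
    """
    for bundle in bundles:
        batch: List[T] = []
        for element in bundle:
            batch.append(element)
            if len(batch) == max_batch_size:
                yield batch
                batch = []
        if batch:  # end of bundle: emit whatever is still buffered
            yield batch


def batch_across_bundles(bundles: Iterable[List[T]], max_batch_size: int) -> Iterator[List[T]]:
    """Hypothetical model of stateful batching (GroupIntoBatches-like).

    The buffer survives bundle boundaries, which is what requires state.
    """
    batch: List[T] = []
    for bundle in bundles:
        for element in bundle:
            batch.append(element)
            if len(batch) == max_batch_size:
                yield batch
                batch = []
    if batch:  # end of input: emit the remainder
        yield batch


# Three bundles of uneven size, as a streaming runner might produce.
bundles = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
per_bundle = list(batch_within_bundles(bundles, max_batch_size=4))
cross_bundle = list(batch_across_bundles(bundles, max_batch_size=4))
print(per_bundle)
print(cross_bundle)
```

In batch pipelines bundles are usually large, so the per-bundle flush rarely matters and beam.BatchElements(max_batch_size=100) should give mostly full batches. In streaming, bundles can be very small, which is where the cross-bundle behaviour pays off.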

On Wed, Feb 5, 2020 at 7:53 AM Alan Krumholz <[email protected]> wrote:
>
> OK, seems like beam.BatchElements(max_batch_size=x) will do the trick for me
> and runs fine in DataFlow!
>
> On Wed, Feb 5, 2020 at 7:38 AM Alan Krumholz <[email protected]>
> wrote:
>>
>> Actually beam.GroupIntoBatches() gives me the same error as
>> beam.util.GroupIntoBatches() :(
>> back to square one.
>>
>> Any other ideas?
>>
>> Thank you!
>>
>>
>> On Wed, Feb 5, 2020 at 7:32 AM Alan Krumholz <[email protected]>
>> wrote:
>>>
>>> Never mind there seems to be a beam.GroupIntoBatches() that I should have
>>> originally used instead of beam.util.GroupIntoBatches()....
>>>
>>> On Wed, Feb 5, 2020 at 7:19 AM Alan Krumholz <[email protected]>
>>> wrote:
>>>>
>>>> Hello, I'm having issues running beam.util.GroupIntoBatches() in DataFlow.
>>>>
>>>> I get the following error message:
>>>>
>>>>> Exception: Requested execution of a stateful DoFn, but no user state
>>>>> context is available. This likely means that the current runner does not
>>>>> support the execution of stateful DoFns
>>>>
>>>>
>>>> Seems to be related to:
>>>> https://stackoverflow.com/questions/56403572/no-userstate-context-is-available-google-cloud-dataflow
>>>>
>>>> Is there another way I can achieve the same using other beam function?
>>>>
>>>> I basically want to batch rows into groups of 100 as it is a lot faster to
>>>> transform all at once than doing it 1 by 1.
>>>>
>>>> I also was planning to use this function for a custom snowflake sink (so I
>>>> could insert many rows at once)
>>>>
>>>> I'm sure there must be another way to do this in DataFlow but not sure how?
>>>>
>>>> Thanks so much!
