I asked the user to check if it was just the GBK or the entire Reshuffle,
and they confirmed it was the entire Reshuffle. Also, their pipeline
ultimately did not produce all of the expected output. I'm still
asking the user for more info to make sure this isn't a bug on the Dataflow

Reshuffle should be emitting exactly the same number of elements that it
gets. The GBK inside Reshuffle may output slightly fewer due to key
collisions, but the ExpandIterable step should take care of this. Do we
have counts for that output? (I will say that seems to be an
extraordinarily high number
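
To make the counting argument above concrete, here is a rough plain-Python sketch. It is not Beam code and not how Dataflow actually implements Reshuffle; it only assumes that elements are paired with a random key, grouped by that key, and then expanded again. The names `simulated_reshuffle`, `num_keys`, and `seed` are hypothetical and exist only for this illustration.

```python
import random
from collections import defaultdict


def simulated_reshuffle(elements, num_keys=8, seed=0):
    """Toy model: random key -> group by key -> expand each group.

    Returns (number_of_groups, expanded_elements).
    """
    rng = random.Random(seed)

    # Step 1: pair every element with a random key (assumed behaviour).
    keyed = [(rng.randrange(num_keys), element) for element in elements]

    # Step 2: stand-in for the GBK. Elements whose keys collide land in
    # the same group, so there are fewer groups than input elements.
    groups = defaultdict(list)
    for key, element in keyed:
        groups[key].append(element)

    # Step 3: stand-in for ExpandIterable. Flattening every group
    # restores one output per input element.
    expanded = [element for values in groups.values() for element in values]
    return len(groups), expanded


elements = list(range(1000))
num_groups, output = simulated_reshuffle(elements)
print("groups after GBK:", num_groups)
print("elements after expand:", len(output))
```

In this model the group count after the grouping step can be far below the input count, but the count after the expand step always equals the input count. An expanded count that is lower than the input count would therefore point to elements actually being lost.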
Hi dev list,
While answering Stack Overflow questions I stumbled onto this:
https://stackoverflow.com/questions/62017572/beam-java-dataflow-bigquery-streaming-insert-groupbykey-reducing-elements
The user's pipeline seems to have a Reshuffle outputting fewer elements than
it received, inside a