In my experience ordering is not guaranteed, you may need apply a transformation that sort the elements and then dispatch them sorted out.
Or uses the Sorter extension for this: https://github.com/apache/beam/tree/master/sdks/java/extensions/sorter Steve Niemitz <[email protected]> schrieb am Di., 12. Feb. 2019, 16:31: > Hi everyone, I have some questions I want to ask about how windowing, > triggering, and panes work together, and how to ensure correctness > throughout a pipeline. > > Lets assume I have a very simple streaming pipeline that looks like: > Source -> CombineByKey (Sum) -> Sink > > Given fixed windows of 1 hour, early firings every minute, and > accumulating panes, this is pretty straight forward. However, this can get > more complicated if we add steps after the CombineByKey, for instance > (using the same windowing strategy): > > Say I want to buffer the results of the CombineByKey into batches of N > elements. I can do this with the built-in GroupIntoBatches [1] transform, > now my pipeline looks like: > > Source -> CombineByKey (Sum) -> GroupIntoBatches -> Sink > > *This leads to my main question:* > Is ordering preserved somehow here? ie: is it possible that the result > from early firing F+1 now comes BEFORE the firing F (because it was > re-ordered in the GroupIntoBatches). This would mean that the sink then > gets F+1 before F, which means my resulting store has incorrect data > (possibly forever if F+1 was the final firing). > > If ordering is not preserved, it seems as if I'd need to introduce my own > ordering back in after GroupIntoBatches. GIB is an example here, but I > imagine this could happen with any GBK type operation. > > Am I thinking about this the correct way? Thanks! > > [1] > https://beam.apache.org/releases/javadoc/2.10.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html >
