Also to clarify here (I re-read this and realized it could be slightly
unclear).  My question is only about in-order delivery of panes.  ie: will
pane P always be delivered before P+1.

I realize the use of "in-order" before could be confusing, I don't care
about the ordering of the elements per-se, just the ordering of the pane
delivery.

I want to make sure that given a GBK that produces 3 panes (P0, P1, P2) for
a key, a downstream PCollection could never see P0, P2, P1.  OR at least,
the final firing is always guaranteed to be delivered after all
early-firings (eg we could have P0, P2, P1, but then always PLast).

On Tue, Feb 12, 2019 at 11:48 AM Steve Niemitz <sniem...@apache.org> wrote:

> Are you also saying also that even in the first example (Source ->
> CombineByKey (Sum) -> Sink) there's no guarantee that events would be
> delivered in-order from the Combine -> Sink transforms?  This seems like a
> pretty big "got-cha" for correctness if you ever use accumulating
> triggering.
>
> I'd also like to point out I'm not talking about a global ordering across
> the entire PCollection, I'm talking about within the same key after a GBK
> transform.
>
> On Tue, Feb 12, 2019 at 11:35 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> Due to the nature of distributed processing, order is not preserved. You
>> can, however, inspect the PaneInfo to determine if an element was early,
>> on-time, or late and act accordingly.
>>
>> On Tue, Feb 12, 2019 at 5:15 PM Juan Carlos Garcia <jcgarc...@gmail.com>
>> wrote:
>>
>>> In my experience ordering is not guaranteed, you may need apply a
>>> transformation that sort the elements and then dispatch them sorted out.
>>>
>>> Or uses the Sorter extension for this:
>>>
>>> https://github.com/apache/beam/tree/master/sdks/java/extensions/sorter
>>>
>>> Steve Niemitz <sniem...@apache.org> schrieb am Di., 12. Feb. 2019,
>>> 16:31:
>>>
>>>> Hi everyone, I have some questions I want to ask about how windowing,
>>>> triggering, and panes work together, and how to ensure correctness
>>>> throughout a pipeline.
>>>>
>>>> Lets assume I have a very simple streaming pipeline that looks like:
>>>> Source -> CombineByKey (Sum) -> Sink
>>>>
>>>> Given fixed windows of 1 hour, early firings every minute, and
>>>> accumulating panes, this is pretty straight forward.  However, this can get
>>>> more complicated if we add steps after the CombineByKey, for instance
>>>> (using the same windowing strategy):
>>>>
>>>> Say I want to buffer the results of the CombineByKey into batches of N
>>>> elements.  I can do this with the built-in GroupIntoBatches [1] transform,
>>>> now my pipeline looks like:
>>>>
>>>> Source -> CombineByKey (Sum) -> GroupIntoBatches -> Sink
>>>>
>>>> *This leads to my main question:*
>>>> Is ordering preserved somehow here?  ie: is it possible that the result
>>>> from early firing F+1 now comes BEFORE the firing F (because it was
>>>> re-ordered in the GroupIntoBatches).  This would mean that the sink then
>>>> gets F+1 before F, which means my resulting store has incorrect data
>>>> (possibly forever if F+1 was the final firing).
>>>>
>>>> If ordering is not preserved, it seems as if I'd need to introduce my
>>>> own ordering back in after GroupIntoBatches.  GIB is an example here, but I
>>>> imagine this could happen with any GBK type operation.
>>>>
>>>> Am I thinking about this the correct way?  Thanks!
>>>>
>>>> [1]
>>>> https://beam.apache.org/releases/javadoc/2.10.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html
>>>>
>>>

Reply via email to