Hi Sergii,

Window operator won't affect/adjust the output watermark, it just
propagated as-is which is said in the document.
I think the mistake here is you are using the wrong event-time of the
window, actually, you should use TUMBLE_ROWTIME(...) as event_time [1].
The event-time of the window should be the maximal timestamp of the window,
e.g. a window of [10:00, 11:00), the event-time of this window should be
10:59.999,
not the start time. Because it indicates when this event happens, a window
happens when the window is closed (the max timestamp).
That's how TUMBLE_ROWTIME calculated in Flink SQL.

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/queries.html#selecting-group-window-start-and-end-timestamps


On Mon, 22 Jun 2020 at 04:37, Sergii Mikhtoniuk <mikhton...@gmail.com>
wrote:

> Greetings,
>
> When playing around with the following simple event-time stream
> aggregation:
>
>       SELECT
>         TUMBLE_START(event_time, INTERVAL '1' DAY) as event_time,
>         ...
>       FROM input
>       GROUP BY TUMBLE(event_time, INTERVAL '1' DAY), symbol
>
> ...to my surprise I found out that the tumbling window operator has no
> effect on the watermarks of the resulting append stream - the watermarks of
> the input stream are propagated as-is.
>
> This seems to be a documented behavior
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#interaction-of-watermarks-and-windows
>  but
> it's still very counter-intuitive to me and I couldn't find any explanation
> of it.
>
> My understanding of the watermarking is that MOST data is expected to
> arrive with event time below the stream's watermark. Late events are either
> discarded or should be handled as exceptional cases, e.g. via "allowed
> lateness".
>
> So in my aggregation above I was expecting the result watermark to be
> offset by ~1 day from the input and be emitted only after a tumbling window
> closes. Instead, with input watermarks propagated as-is ALL events in the
> resulting stream end up being late in relation to the current watermark...
> Doesn't this behavior ruin the composition, as downstream operators will be
> discarding all late data?
>
> I'd greatly appreciate if someone could shed light on this design decision.
>
> Thanks,
> Sergii
>

Reply via email to