A while back I wrote this slightly more elaborate extractor that will advance
the watermark independently after the stream is idle for a while:
This depends on the requirements of your application.
With the usual watermark generation strategies, which are purely
data-driven, a stream that produces no data does not advance its
watermarks.
Not advancing the watermarks means that the program cannot make progress.
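
To make the trade-off concrete, here is a minimal, framework-free sketch of a watermark generator that is data-driven while events arrive and falls back to the wall clock once the stream has been idle for a while. It is not the extractor mentioned in this thread and it does not use the Flink API. The class name IdleAwareWatermarks, its methods, and both parameters are hypothetical and chosen only for illustration. It also assumes that event time advances at roughly the same rate as the machine clock.

```java
/**
 * Hypothetical sketch (not Flink API): watermarks follow the greatest event
 * timestamp seen so far, minus a fixed out-of-orderness bound. Once no event
 * has arrived for longer than idleTimeoutMs, the watermark additionally
 * advances with the wall clock so that downstream operators can make progress.
 */
final class IdleAwareWatermarks {
    private final long maxOutOfOrdernessMs; // assumed bound on event disorder
    private final long idleTimeoutMs;       // idle time before the clock takes over

    private boolean seenEvent = false;
    private long maxEventTs = Long.MIN_VALUE;    // greatest event timestamp so far
    private long lastEventWallClockMs = 0L;      // machine time of the latest event
    private long lastWatermark = Long.MIN_VALUE; // keeps watermarks monotonic

    IdleAwareWatermarks(long maxOutOfOrdernessMs, long idleTimeoutMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Record an event; nowMs is the machine time at which it arrived. */
    void onEvent(long eventTimestampMs, long nowMs) {
        maxEventTs = Math.max(maxEventTs, eventTimestampMs);
        lastEventWallClockMs = nowMs;
        seenEvent = true;
    }

    /** Called periodically; returns a watermark that never decreases. */
    long currentWatermark(long nowMs) {
        if (!seenEvent) {
            return Long.MIN_VALUE; // nothing to base a watermark on yet
        }
        long candidate = maxEventTs - maxOutOfOrdernessMs; // data-driven part
        long idleForMs = nowMs - lastEventWallClockMs;
        if (idleForMs > idleTimeoutMs) {
            // Stream is idle: advance by the time spent beyond the timeout.
            candidate += idleForMs - idleTimeoutMs;
        }
        lastWatermark = Math.max(lastWatermark, candidate);
        return lastWatermark;
    }
}
```

Note the cost of this design: an event that arrives after an idle period with a timestamp below the clock-advanced watermark is late. Whether that is acceptable depends on the application. In a Flink job, logic of this kind would typically live in a periodic watermark assigner such as an AssignerWithPeriodicWatermarks implementation.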
This might also be
Hi Fabian,
I want to extract timestamps from my events. However, the event stream can
be sparse at times (e.g. 2 days without any events).
What's the best strategy to create watermarks if I want real-time
processing of the events which enter the stream?
Jayant Ameta
On Thu, Jan 11, 2018 at 4:53
Another thing to point out is that watermarks are usually data-driven,
i.e., they depend on the timestamps of the events and not on the clock of
the machine.
Otherwise, you might observe a lot of late data, i.e., events with
timestamps smaller than the last watermark.
If you assign timestamps and
Hi Jayant,
The difference is that the Watermarks from
BoundedOutOfOrdernessTimestampExtractor are based on the greatest timestamp
of all previous events. That is, if you do not receive new events, the
Watermark will not advance. In contrast, your custom implementation of