Hi there,
I'm feeding a Flink stream with events from a Kinesis stream and I'm
looking for some guidance on how to enable event time in the Flink stream.
I've read through the documentation and it seems like I want to add
events that carry watermark information to the Kinesis stream and
subsequently use AssignerWithPunctuatedWatermarks to read and extract
the watermark information to the Flink stream. However, since a Kinesis
stream potentially consists of multiple shards, which are similar
to Kafka partitions, using a single event to determine the watermark of
the Flink stream may affect the semantics of the system:
Kinesis guarantees the order within a single shard but not across the
entire stream. So if a single watermark event is added to the stream, it
ends up in a particular shard and this shard may be processed faster
than the others. Accordingly, when the event is read and used to determine
the watermark in the Flink stream, there may still be unprocessed events
in other shards with an event time that is lower than that of the
already processed watermark event.
Therefore, it seems like I should submit a watermark event to every
shard, keep track of the last watermark event for each shard, and use
the minimum time of those watermark events to determine the watermark
for the Flink stream.
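
To make the idea concrete, here is a minimal sketch of the per-shard
bookkeeping in plain Java. ShardWatermarkTracker and its methods are
hypothetical names of my own, not part of Flink or the Kinesis connector.
The sketch assumes that the set of shard IDs is known up front, that each
watermark event carries the ID of the shard it was read from, and that no
resharding happens. How the result would be wired into
AssignerWithPunctuatedWatermarks is left open.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks the latest watermark event seen per shard. */
public class ShardWatermarkTracker {
    /** Returned while at least one shard has not delivered a watermark event. */
    public static final long NO_WATERMARK = Long.MIN_VALUE;

    private final Map<String, Long> lastWatermarkPerShard = new HashMap<>();

    /** Assumption: all shard IDs are known in advance and do not change. */
    public ShardWatermarkTracker(Collection<String> shardIds) {
        for (String shardId : shardIds) {
            lastWatermarkPerShard.put(shardId, NO_WATERMARK);
        }
    }

    /** Records a watermark event for one shard and returns the new overall watermark. */
    public long onWatermarkEvent(String shardId, long timestamp) {
        if (!lastWatermarkPerShard.containsKey(shardId)) {
            throw new IllegalArgumentException("Unknown shard: " + shardId);
        }
        // Keep the per-shard value monotonic, even if an older event shows up late.
        lastWatermarkPerShard.merge(shardId, timestamp, Long::max);
        return currentWatermark();
    }

    /** The overall watermark is the minimum of the per-shard watermarks. */
    public long currentWatermark() {
        if (lastWatermarkPerShard.isEmpty()) {
            return NO_WATERMARK;
        }
        long min = Long.MAX_VALUE;
        for (long watermark : lastWatermarkPerShard.values()) {
            min = Math.min(min, watermark);
        }
        return min;
    }
}
```

With two shards, the overall watermark would stay at NO_WATERMARK until both
shards have delivered a watermark event, and afterwards it would only advance
as fast as the slowest shard.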
Am I overcomplicating this? Any guidance on how to implement
this correctly would be highly appreciated.
Thanks,
Steffen
- Submitting watermarks through a Kinesis stream Steffen Hausmann