If I understood correctly, different partitions of Kafka would be emitted by 
different source tasks with different watermark progress.  And the Flink 
framework would align the different watermarks to only output the smallest 
watermark among them, so the events from slow partitions would not be discarded 
because the downstream operator would only see the watermark based on the slow 
partition atm. You can refer to [1] for some details.

As for rewinding the offset of partition position, I guess it only happens in 
failure recovery case or you manually restart the job. Anyway all the topology 
tasks would be restarted and previous received watermarks are cleared.
So it would also not discard the events in this case.  Unless you can only 
rewind some source task to previous positions and keep other downstream tasks 
still running, it might have the issues you concern. But Flink can not support 
such operation/function atm. :) 

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/event_timestamps_watermarks.html

Best,
Zhijiang
------------------------------------------------------------------
From:邢瑞斌 <xingro...@gmail.com>
Send Time:2019 Dec. 25 (Wed.) 20:27
To:user-zh <user...@flink.apache.org>; user <user@flink.apache.org>
Subject:Rewind offset to a previous position and ensure certainty.

Hi,

I'm trying to use Kafka as an event store and I want to create several 
partitions to improve read/write throughput. Occasionally I need to rewind 
offset to a previous position for recomputing. Since order isn't guaranteed 
among partitions in Kafka, does this mean that Flink won't produce the same 
results as before when rewind even if it uses event time? For example, consumer 
for a partition progresses extremely fast and raises watermark, so events from 
other partitions are discarded. Is there any ways to prevent this from 
happening?

Thanks in advance!

Ruibin

Reply via email to