Hi,

For failures recovery with Kafka 0.9 it is not possible to avoid duplicated 
messages. Using Flink 1.4 (unreleased yet) combined with Kafka 0.11 it will be 
possible to achieve exactly-once end to end semantic when writing to Kafka. 
However this still a work in progress:

https://issues.apache.org/jira/browse/FLINK-6988 
<https://issues.apache.org/jira/browse/FLINK-6988>

However this is a superset of functionality that you are asking for. 
Exactly-once just for clean shutdowns is also on our “TODO” list (it 
would/could support Kafka 0.9), but it is not currently being actively 
developed.

Piotr Nowojski

> On Oct 2, 2017, at 3:35 PM, Antoine Philippot <antoine.philip...@teads.tv> 
> wrote:
> 
> Hi,
> 
> I'm working on a flink streaming app with a kafka09 to kafka09 use case which 
> handles around 100k messages per seconds.
> 
> To upgrade our application we used to run a flink cancel with savepoint 
> command followed by a flink run with the previous saved savepoint and the new 
> application fat jar as parameter. We notice that we can have more than 50k of 
> duplicated messages in the kafka sink wich is not idempotent.
> 
> This behaviour is actually problematic for this project and I try to find a 
> solution / workaround to avoid these duplicated messages.
> 
> The JobManager indicates clearly that the cancel call is triggered once the 
> savepoint is finished, but during the savepoint execution, kafka source 
> continue to poll new messages which will not be part of the savepoint and 
> will be replayed on the next application start.
> 
> I try to find a solution with the stop command line argument but the kafka 
> source doesn't implement StoppableFunction 
> (https://issues.apache.org/jira/browse/FLINK-3404 
> <https://issues.apache.org/jira/browse/FLINK-3404>) and the savepoint 
> generation is not available with stop in contrary to cancel.
> 
> Is there an other solution to not process duplicated messages for each 
> application upgrade or rescaling ?
> 
> If no, has someone planned to implement it? Otherwise, I can propose a pull 
> request after some architecture advices.
> 
> The final goal is to stop polling source and trigger a savepoint once polling 
> stopped.
> 
> Thanks

Reply via email to