Hi,
Yes, your analysis is correct: Flink does not retry individual elements but 
restores from the latest consistent checkpoint in case of failure. This 
also means that you can get different window results depending on which element 
arrives first, i.e. your output can end up with a different timestamp in that case.

One simple mitigation for the timestamp problem is to use the largest timestamp 
of the elements within a window instead of the timestamp of the first element. 
The largest timestamp is stable across restores even if the arrival order of 
the elements changes. You can still run into problems with late data and window 
triggering if you cannot guarantee that your watermark is 100% correct, though: 
upon restore, an element with an even larger timestamp might arrive late that 
was not considered during the first, failed processing attempt.
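
For illustration, here is a minimal sketch of that idea in the DataStream API. 
The names and the tuple layout (an input stream called "events" carrying 
Tuple3<key, increment, eventTimestamp>) are made up for the example, not taken 
from your job:

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

// Assumed input: events is a DataStream<Tuple3<String, Long, Long>>
// of (key, increment, eventTimestamp) -- illustrative only.
DataStream<Tuple3<String, Long, Long>> aggregated = events
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .fold(
        new Tuple3<>("", 0L, Long.MIN_VALUE),
        new FoldFunction<Tuple3<String, Long, Long>, Tuple3<String, Long, Long>>() {
            @Override
            public Tuple3<String, Long, Long> fold(
                    Tuple3<String, Long, Long> acc,
                    Tuple3<String, Long, Long> value) throws Exception {
                return new Tuple3<>(
                        value.f0,                     // key
                        acc.f1 + value.f1,            // running sum of increments
                        Math.max(acc.f2, value.f2));  // largest timestamp wins
            }
        });

Because the fold takes the maximum, a replayed window produces the same 
timestamp as the original run as long as the same elements end up in the 
window, regardless of the order in which they arrive.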

Best,
Aljoscha
> On 25. Apr 2017, at 19:54, Kamil Dziublinski <kamil.dziublin...@gmail.com> 
> wrote:
> 
> Hi guys,
> 
> I have a Flink streaming job that reads from Kafka, creates some statistics 
> increments and stores them in HBase (using normal puts).
> I'm using a fold function here with a window of a few seconds.
> 
> My tests showed me that restoring state with window functions does not work 
> exactly the way I expected.
> I thought that if my window function emits an aggregated object to a sink, 
> and that object fails in the sink, only this write to HBase would be 
> replayed. So even if it actually got written to HBase but Flink thought it 
> didn't (for instance during a network problem), I could be sure of idempotent 
> writes. I wanted to enforce that by using the timestamp of the first event in 
> that window for the aggregation.
> 
> Now correct me if I'm wrong, but it seems that in the case of failure (even 
> if it's in the sink) the whole flow is replayed from the last checkpoint, 
> which means that my window function might emit the aggregated object in a 
> different form, for instance containing not only the tuples that failed but 
> also other ones. That would break my idempotency here, and I might end up 
> with higher counters than I should have.
> 
> Do you have any suggestions on how to solve or work around this problem in Flink?
> 
> Thanks,
> Kamil.
