... of course, once the tuple is acked, it won't be replayed anymore :)

2016-01-21 9:25 GMT+03:00 Yury Ruchin <[email protected]>:

> Disclaimer: the below only applies to KafkaSpout class. It's not about
> TridentKafkaSpout which should go to greater length ensuring exactly-once
> semantics - I'm not yet familiar with its implementation details.
>
> There is nothing special about Kafka cluster in this case. As for
> KafkaSpout, it will remember tuples it fetched in an in-memory Map. After
> ack() is received for a tuple, it is removed from the remembed pending
> tuples. Every "stateUpdateIntervalMs" it commits to ZK the offset that it
> has received acks up to.
>
> Now, if spout crashes, it does not store offsets to ZK and everything
> that's pending (including tuples on wire) will be replayed when the spout
> is reinstated later. It will re-read latest saved offsets from ZK, re-fetch
> and re-emit lost messages. This leads to at-least-once semantics, since
> some messages that have not been acked might have been actually processed
> by downstream topology, with only ack() notification lost because of the
> crash.
>
> If KafkaSpout receives fail() from downstream topology, it will keep
> replaying the failed tuple until it falls behind the latest offset more
> than "maxOffsetBehind" messages.
>

Reply via email to