... of course, once the tuple is acked, it won't be replayed anymore :) 2016-01-21 9:25 GMT+03:00 Yury Ruchin <[email protected]>:
> Disclaimer: the below only applies to KafkaSpout class. It's not about > TridentKafkaSpout which should go to greater length ensuring exactly-once > semantics - I'm not yet familiar with its implementation details. > > There is nothing special about Kafka cluster in this case. As for > KafkaSpout, it will remember tuples it fetched in an in-memory Map. After > ack() is received for a tuple, it is removed from the remembed pending > tuples. Every "stateUpdateIntervalMs" it commits to ZK the offset that it > has received acks up to. > > Now, if spout crashes, it does not store offsets to ZK and everything > that's pending (including tuples on wire) will be replayed when the spout > is reinstated later. It will re-read latest saved offsets from ZK, re-fetch > and re-emit lost messages. This leads to at-least-once semantics, since > some messages that have not been acked might have been actually processed > by downstream topology, with only ack() notification lost because of the > crash. > > If KafkaSpout receives fail() from downstream topology, it will keep > replaying the failed tuple until it falls behind the latest offset more > than "maxOffsetBehind" messages. >
