We have a topology that has 4 instances of KafkaSpout running as part of the 
same consumer group. Currently, we are only running one worker process per 
topology, so all of the spout instances are in the same process. We have 
noticed that when a process crashes/is killed, the KafkaSpout picks up from the 
last committed offset when it restarts, not the most recent offset. This is 
supported when looking in the code: 
https://github.com/apache/storm/blob/v1.2.3/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L255
 If isOffsetCommittedByThisTopology is true, then the KafkaConsumer seeks to 
the last committed offset. To me, this is behaving like UNCOMMITTED_LATEST 
should. I guess the question is why does isOffsetCommittedByThisTopology return 
true if the topology just crashed. Shouldn't the new topology instance be 
treated separately from the old one?

Reply via email to