We have a topology that has 4 instances of KafkaSpout running as part of the same consumer group. Currently, we are only running one worker process per topology, so all of the spout instances are in the same process. We have noticed that when a process crashes/is killed, the KafkaSpout picks up from the last committed offset when it restarts, not the most recent offset. This is supported when looking in the code: https://github.com/apache/storm/blob/v1.2.3/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L255 If isOffsetCommittedByThisTopology is true, then the KafkaConsumer seeks to the last committed offset. To me, this is behaving like UNCOMMITTED_LATEST should. I guess the question is why does isOffsetCommittedByThisTopology return true if the topology just crashed. Shouldn't the new topology instance be treated separately from the old one?
FirstPollOffsetStrategy LATEST not working as expected
Mitchell Rathbun (BLOOMBERG/ 731 LEX) Tue, 05 Nov 2019 13:59:39 -0800
- FirstPollOffsetStrategy LATEST not w... Mitchell Rathbun (BLOOMBERG/ 731 LEX)
- Re: FirstPollOffsetStrategy LAT... Stig Rohde Døssing
