Ben Stopford created KAFKA-2909:
-----------------------------------

             Summary: Another Instance of Gap in Consumption after Restart
                 Key: KAFKA-2909
                 URL: https://issues.apache.org/jira/browse/KAFKA-2909
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Ben Stopford


This seems very similar to KAFKA-2891, reported by Rajini.

*Context*
The test is the security rolling upgrade with a 30s consumer timeout. There was a
2s sleep between restarts, and throughput was limited to 1000 messages per second.

*Failure*
At least one acked message did not appear in the consumed messages. 
acked_minus_consumed: set(36802, 36804, 36805, 36807, 36808, 36810, 36811, 
64403, 64406, 64409, 36799)
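The failure condition is a set difference: values the producer saw acknowledged, minus values the consumer actually read. Below is a minimal Python sketch of that check. It assumes the harness compares plain sets of integer values. The function name mirrors the output label but is hypothetical, not the test harness's actual code, and the sample data is made up.

```python
def acked_minus_consumed(acked, consumed):
    """Values the producer saw acknowledged but the consumer never delivered."""
    # Set difference: duplicates and ordering do not matter for this check.
    return set(acked) - set(consumed)


# Hypothetical toy data, not taken from the attached logs.
acked = [36798, 36799, 36800, 36801, 36802]
consumed = [36798, 36800, 36801]
print(sorted(acked_minus_consumed(acked, consumed)))  # [36799, 36802]
```

Because the result is a set, the order in the printed `set(...)` above carries no meaning. 36799 appearing last does not indicate it was lost last.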


The missing data was correctly written to the Kafka data files:
{quote}
value 36802 -> partition 1, offset: 12216

kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files worker7/kafka-data-logs/test_topic-1/00000000000000000000.log \
  | grep 'offset: 12216'

-> offset: 12216 position: 374994 isvalid: true payloadsize: 5 magic: 0 compresscodec: NoCompressionCodec crc: 3001177408

The same entry appears in all three data files, so the data is there.
{quote}
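The check above uses grep, which matches substrings. As a result, 'offset: 12216' would also match an entry for offset 122160. Below is a Python sketch that parses DumpLogSegments entry lines and compares offsets as integers. It assumes the `field: value` layout of the single sample line shown above, and other Kafka versions may print different fields. The helper names are hypothetical.

```python
import re

# Field/value pairs such as "offset: 12216" or "compresscodec: NoCompressionCodec".
_FIELD = re.compile(r"(\w+): (\S+)")


def parse_dump_line(line):
    """Parse one DumpLogSegments entry line into a dict of string fields."""
    return dict(_FIELD.findall(line))


def has_offset(dump_output, offset):
    """True if any message entry in the dump has exactly this offset.

    Comparing integers avoids grep's substring matches
    (e.g. 'offset: 12216' also matching 'offset: 122160').
    """
    for line in dump_output.splitlines():
        fields = parse_dump_line(line)
        # Assumption: only lines carrying both fields are message entries.
        if "offset" in fields and "position" in fields:
            if int(fields["offset"]) == offset:
                return True
    return False
```

To confirm the write reached every replica, run `has_offset` on the dump of each broker's segment file. The offset should be present in all three.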

The first missing value was written at 20:42:30,185, about one second before the
third node's shutdown completed (20:42:31,164).

The missing writes coincide with the consumer logging NOT_COORDINATOR_FOR_GROUP
and "Marking the coordinator" messages. However, these messages appear many times
over a long period, so it is hard to conclude that they cause the error or even
correlate specifically with it.

*Timeline*
{quote}
grep -r 'shutdown complete' *
20:42:06,132 - Node 1 shutdown completed 
20:42:18,560 - Node 2 shutdown completed 
20:42:30,185 - *Producer writes the messages that are never consumed*
20:42:31,164 - Node 3 shutdown completed 
20:42:57,872 - Node 1 shutdown completed 
…
{quote}
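The timeline timestamps can be differenced to quantify how close the first unconsumed write is to the Node 3 shutdown. Below is a minimal Python sketch. It assumes every timestamp is from the same day and the same clock. The entries come from different machines' logs, so any clock skew between hosts directly distorts the result.

```python
from datetime import datetime

# Timestamp layout used in the timeline: HH:MM:SS,mmm.
_FMT = "%H:%M:%S,%f"


def seconds_between(start, end):
    """Seconds elapsed from start to end, both in HH:MM:SS,mmm form."""
    delta = datetime.strptime(end, _FMT) - datetime.strptime(start, _FMT)
    return delta.total_seconds()


# First unconsumed write vs. Node 3 shutdown completing.
print(seconds_between("20:42:30,185", "20:42:31,164"))  # 0.979
```

In `strptime`, `%f` right-pads the three millisecond digits, so "185" parses as 185000 microseconds.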

All logs for this incident are attached.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
