The producer should do an updateMetadataRequest when it gets NOT_LEADER_FOR_PARTITION. This looks like a bug.

Thanks,
Mayuresh

On Fri, May 8, 2015 at 8:53 AM, Dan <[email protected]> wrote:

> Hi,
>
> We've noticed an issue on our staging environment where all 3 of our Kafka
> hosts shut down and came back with a different IP -> broker id mapping. I
> know this is not good and we're fixing that separately. But what we noticed
> is all the consumers recovered but the producers got stuck with the
> following exceptions:
>
> WARN 2015-05-08 09:19:56,347 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544968 on topic-partition samza-metrics-0, retrying (2145750068 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,448 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544970 on topic-partition samza-metrics-0, retrying (2145750067 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,549 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544972 on topic-partition samza-metrics-0, retrying (2145750066 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,649 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544974 on topic-partition samza-metrics-0, retrying (2145750065 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,749 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544976 on topic-partition samza-metrics-0, retrying (2145750064 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,850 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544978 on topic-partition samza-metrics-0, retrying (2145750063 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:56,949 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544980 on topic-partition samza-metrics-0, retrying (2145750062 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:57,049 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544982 on topic-partition samza-metrics-0, retrying (2145750061 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:57,150 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544984 on topic-partition samza-metrics-0, retrying (2145750060 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:57,254 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544986 on topic-partition samza-metrics-0, retrying (2145750059 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:57,351 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544988 on topic-partition samza-metrics-0, retrying (2145750058 attempts left). Error: NOT_LEADER_FOR_PARTITION
> WARN 2015-05-08 09:19:57,454 org.apache.kafka.clients.producer.internals.Sender: Got error produce response with correlation id 3544990 on topic-partition samza-metrics-0, retrying (2145750057 attempts left). Error: NOT_LEADER_FOR_PARTITION
>
> So it appears as if the producer did not refresh the metadata once the
> brokers had come back up. The exceptions carried on for a few hours until
> we restarted them.
>
> We noticed this in both 0.8.2.1 Java clients and via Kafka-rest
> https://github.com/confluentinc/kafka-rest which is using 0.8.2.0-cp.
>
> Is this a known issue when all brokers go away, or is it a subtle bug we've
> hit?
>
> Thanks,
> Dan

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
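
To make the expected behavior in this thread concrete, here is a minimal toy model in Python. It is not Kafka client code and does not reflect how the Java producer is actually implemented: `ToyCluster`, `ToyProducer`, and the `refresh_on_error` flag are hypothetical names invented for illustration. The sketch only contrasts a producer that keeps retrying against its cached leader with one that refreshes its metadata after a NOT_LEADER_FOR_PARTITION error.

```python
from dataclasses import dataclass, field

NONE = "NONE"
NOT_LEADER_FOR_PARTITION = "NOT_LEADER_FOR_PARTITION"


@dataclass
class ToyCluster:
    """Hypothetical stand-in for the brokers: partition -> current leader id."""
    leaders: dict

    def produce(self, broker_id, partition):
        # Only the current leader accepts a produce request for the partition.
        if self.leaders[partition] == broker_id:
            return NONE
        return NOT_LEADER_FOR_PARTITION


@dataclass
class ToyProducer:
    """Hypothetical producer that caches leader metadata between sends."""
    cluster: ToyCluster
    refresh_on_error: bool
    cached_leaders: dict = field(default_factory=dict)
    metadata_refreshes: int = 0

    def refresh_metadata(self):
        # Copy the cluster's current view so later changes are not seen
        # until the next explicit refresh.
        self.cached_leaders = dict(self.cluster.leaders)
        self.metadata_refreshes += 1

    def send(self, partition, retries):
        """Return the number of attempts used, or None if retries ran out."""
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            broker_id = self.cached_leaders[partition]
            error = self.cluster.produce(broker_id, partition)
            if error == NONE:
                return attempt
            if error == NOT_LEADER_FOR_PARTITION and self.refresh_on_error:
                self.refresh_metadata()
        return None


if __name__ == "__main__":
    cluster = ToyCluster(leaders={"samza-metrics-0": 1})
    stuck = ToyProducer(cluster, refresh_on_error=False)
    fixed = ToyProducer(cluster, refresh_on_error=True)
    stuck.refresh_metadata()
    fixed.refresh_metadata()

    # Leadership moves after both producers have cached the old leader.
    cluster.leaders["samza-metrics-0"] = 2

    print(stuck.send("samza-metrics-0", retries=5))
    print(fixed.send("samza-metrics-0", retries=5))
```

In this model the producer without the refresh exhausts its retries against the stale leader, which matches the shape of the log above, while the one that refreshes recovers on the next attempt.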
