Thanks for your response, Michael. In step 3 I am actually stopping the entire cluster and restarting it without the 2nd broker, but I see your point. When I look in /tmp/kafka-logs-2 (the log dir for the 2nd broker), I see it holds test2-1 (i.e. partition 1 of the test2 topic). /tmp/kafka-logs (the log dir for the first broker) holds test2-0 and test2-2 (partitions 0 and 2 of the test2 topic). So it would seem that Kafka is missing the leader for partition 1 and hence throws the exception on the producer side. Let me try your replication suggestion.
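I'd expect --describe to show exactly that, something like the output below (broker ids 0 and 1 are assumed here, and the exact formatting varies by Kafka version; the missing leader shows up as -1 or "none"):

    ./kafka-topics.sh --describe --topic test2 --zookeeper localhost:2181

    Topic:test2    PartitionCount:3    ReplicationFactor:1    Configs:
        Topic: test2    Partition: 0    Leader: 0     Replicas: 0    Isr: 0
        Topic: test2    Partition: 1    Leader: -1    Replicas: 1    Isr:
        Topic: test2    Partition: 2    Leader: 0     Replicas: 0    Isr: 0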
While all of the above might explain the exception in the case of 2 brokers, there are still times when I see it with just a single broker. In this case, I start from a normal working cluster with 1 broker only. Then I put my machine into sleep or hibernation. On wake, I shut down the cluster (for sanity) and restart it. On restart, I start seeing this exception. In this case I only have one broker, and I still create the topic the way I described earlier. I understand this is not the ideal production topology, but it's annoying to see it during development.

Thanks

On Wed, Jun 11, 2014 at 1:40 PM, Michael G. Noll <mich...@michael-noll.com> wrote:
> Prakash,
>
> you are configuring the topic with a replication factor of only 1, i.e. no
> additional replica beyond "the original one". This replication setting
> of 1 means that only one of the two brokers will ever host the (single)
> replica -- which is implied to also be the leader in-sync replica -- of
> a given partition.
>
> In step 3 you are disabling one of the two brokers. Because this
> stopped broker is the only broker that hosts one or more of the 3
> partitions you configured (I can't tell which partition(s) it is, but
> you can find out by --describe'ing the topic), your Kafka cluster --
> which is now running in a degraded state -- will miss the leader of those
> affected partitions. And because you set the replication factor to 1,
> the remaining, second broker will never take over the
> leadership of those partitions from the stopped broker. Hence you will
> keep getting LeaderNotAvailableExceptions until you restart the
> stopped broker in step 7.
>
> So to me it looks as if the behavior of Kafka is actually correct and
> expected.
>
> If you want to "rectify" your test setup, try increasing the replication
> factor from 1 to 2. If you do, you should be able to go through steps
> 1-8 without seeing LeaderNotAvailableExceptions (you may need to give
> Kafka some time to re-elect the remaining, second broker as the new
> leader for the first broker's partitions, though).
>
> Hope this helps,
> Michael
>
>
>
> On 06/11/2014 07:49 PM, Prakash Gowri Shankor wrote:
> > Yes, here are the steps:
> >
> > Create the topic as: ./kafka-topics.sh --topic test2 --create --partitions 3
> > --zookeeper localhost:2181 --replication-factor 1
> >
> > 1) Start the cluster with 2 brokers, 3 consumers.
> > 2) Don't start any producer.
> > 3) Shut down the cluster and disable one broker from starting.
> > 4) Restart the cluster with 1 broker, 3 consumers.
> > 5) Start the producer and send messages. I see this exception.
> > 6) Shut down the cluster.
> > 7) Enable the 2nd broker.
> > 8) Restart the cluster with 2 brokers, 3 consumers, and the one producer,
> > and send messages. Now I don't see the exception.
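P.S. For reference, this is the create command I plan to try per your suggestion, identical to before except for --replication-factor 2 (I'll use a fresh topic name, test2r here is just an illustrative name, to sidestep deleting the old topic):

    ./kafka-topics.sh --topic test2r --create --partitions 3 --zookeeper localhost:2181 --replication-factor 2

With two replicas, every partition should have a copy on both brokers, so the remaining broker can take over leadership when the other one is down.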