Re: LeaderNotAvailableException in 0.8.1.1

Michael G. Noll Wed, 11 Jun 2014 21:28:19 -0700

In your second case (1-broker cluster and putting your laptop to sleep), these
exceptions should be transient and disappear after a while.


In the logs you should see ZK session expirations (hence the initial/transient 
exceptions, which in this case are expected and ok), followed by new ZK 
sessions being established.

So this case is (or should be) very different from your case number 1.

--Michael


> On 11.06.2014, at 23:13, Prakash Gowri Shankor <prakash.shan...@gmail.com> wrote:
> 
> Thanks for your response, Michael.
> 
> In step 3, I am actually stopping the entire cluster and restarting it
> without the 2nd broker. But I see your point. When I look in
> /tmp/kafka-logs-2 (which is the log dir for the 2nd broker), I see it
> holds test2-1 (i.e. the 1st partition of the test2 topic).
> In /tmp/kafka-logs (which is the log dir for the first broker), I see it
> holds test2-0 and test2-2 (the 0th and 2nd partitions of the test2 topic).
> So it would seem that Kafka is missing the leader for partition 1 and hence
> throws the exception on the producer side.
> Let me try your replication suggestion.
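The per-broker check described above (which partition directories sit under which log dir) can be scripted. A minimal POSIX sh sketch, assuming only Kafka's on-disk layout of one directory per hosted replica, named <topic>-<partition>, under the broker's log dir, as in the test2-0/test2-1/test2-2 listing above. partitions_in_logdir is a hypothetical helper, not a Kafka tool. It only shows where replicas are stored; kafka-topics.sh --describe --zookeeper localhost:2181 --topic test2 remains the authoritative view and also reports the current leader of each partition.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): list the partition directories of
# TOPIC that a broker keeps under its log dir, sorted by partition number.
# Assumes one directory per hosted replica, named <topic>-<partition>,
# e.g. /tmp/kafka-logs-2/test2-1.
# Usage: partitions_in_logdir LOG_DIR TOPIC
partitions_in_logdir() (
  log_dir=$1
  topic=$2
  for path in "$log_dir/$topic"-*; do
    # An unmatched glob is left as a literal pattern; skip it (and plain files).
    [ -d "$path" ] || continue
    name=${path##*/}
    suffix=${name#"$topic"-}
    # Keep only purely numeric suffixes, i.e. real partition numbers.
    case $suffix in
      '' | *[!0-9]*) continue ;;
    esac
    # Emit "<number> <dirname>" so partitions sort numerically (2 before 10).
    printf '%s %s\n' "$suffix" "$name"
  done | sort -n | cut -d ' ' -f 2
)
```

With the layout reported above, partitions_in_logdir /tmp/kafka-logs-2 test2 would list only test2-1, i.e. the one partition that goes away together with the 2nd broker.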
> 
> While all of the above might explain the exception in the case of 2
> brokers, there are still times when I see it with just a single broker.
> In this case, I start from a normal working cluster with 1 broker only.
> Then I put my machine into sleep/hibernation. On wake, I shut down
> the cluster (for sanity) and restart it.
> On restart, I start seeing this exception. In this case I only have one
> broker. I still create the topic the way I described earlier.
> I understand this is not the ideal production topology, but it's annoying to
> see it during development.
> 
> Thanks
> 
> 
> On Wed, Jun 11, 2014 at 1:40 PM, Michael G. Noll <mich...@michael-noll.com> wrote:
> 
>> Prakash,
>> 
>> you configured the topic with a replication factor of only 1, i.e. no
>> additional replica beyond "the original one".  This replication setting
>> of 1 means that only one of the two brokers will ever host the (single)
>> replica -- which is implied to also be the leader in-sync replica -- of
>> a given partition.
>> 
>> In step 3 you are disabling one of the two brokers.  Because this
>> stopped broker is the only broker that hosts one or more of the 3
>> partitions you configured (I can't tell which partition(s) it is, but
>> you can find out by --describe'ing the topic), your Kafka cluster --
>> which is now running in degraded state -- will miss the leader of those
>> affected partitions.  And because you set the replication factor to 1,
>> the remaining broker will not, and never will, take over the
>> leadership of those partitions from the stopped broker.  Hence you will
>> keep getting LeaderNotAvailableExceptions until you re-enable the
>> stopped broker in step 7.
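The argument above boils down to a simple rule: a partition can only have a leader while at least one of its replicas sits on a running broker. That is a necessary condition only; whether and how quickly a live replica actually becomes leader is up to Kafka's controller. A minimal POSIX sh model of the rule follows. leaderless_partitions is a hypothetical helper, and the broker ids 0 and 1 in the example are assumptions, since the thread never shows the actual broker.id values.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): a simplified model of why
# LeaderNotAvailableException shows up. A partition can only have a leader
# if at least one of its replicas is on a live broker.
# Usage: leaderless_partitions "LIVE_BROKER_IDS" < ASSIGNMENT
#   LIVE_BROKER_IDS: space-separated ids of the brokers that are running
#   ASSIGNMENT: lines of "<partition> <replica-id>,<replica-id>,..."
# Prints every partition whose replicas are all on stopped brokers.
leaderless_partitions() (
  live=" $1 "   # pad with spaces so id 1 does not match inside 10
  while read -r partition replicas; do
    [ -n "$partition" ] || continue
    has_live_replica=no
    IFS=,
    for id in $replicas; do
      case $live in
        *" $id "*) has_live_replica=yes ;;
      esac
    done
    unset IFS   # back to default field splitting for the next read
    [ "$has_live_replica" = yes ] || printf '%s\n' "$partition"
  done
)

# Example: the layout described in the thread, replication factor 1,
# assumed broker ids 0 and 1, and only broker 0 running again:
#   leaderless_partitions "0" <<'EOF'
#   test2-0 0
#   test2-1 1
#   test2-2 0
#   EOF
# prints: test2-1
```

With replication factor 2 on two brokers, every line lists both ids (for example test2-1 1,0), and the same call prints nothing: each partition still has a replica on the surviving broker.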
>> 
>> So to me it looks as if the behavior of Kafka is actually correct and as
>> expected.
>> 
>> If you want to "rectify" your test setup, try increasing the replication
>> factor from 1 to 2.  If you do, you should be able to go through steps
>> 1-8 without seeing LeaderNotAvailableExceptions (you may need to give
>> Kafka some time to re-elect the remaining broker as the new
>> leader for the stopped broker's partitions, though).
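Two ways to follow the replication-factor-2 suggestion, sketched under assumptions. For a new topic, pass --replication-factor 2 instead of 1 to kafka-topics.sh --create while both brokers are running (Kafka rejects a replication factor larger than the number of available brokers). For the existing test2 topic, kafka-topics.sh --alter cannot raise the replication factor as far as I know; the usual route is kafka-reassign-partitions.sh with a hand-written JSON file listing the new replica set of every partition. The sketch below only generates such a file. rf_json is a hypothetical helper, the broker ids 0 and 1 are assumed (the thread never shows the broker.id settings), and the JSON shape and tool flags should be double-checked against the documentation for your Kafka version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): print a reassignment JSON that puts
# every partition of TOPIC on all of the given brokers, i.e. replication
# factor = number of brokers. The broker list is rotated per partition so the
# first (preferred) replica differs from one partition to the next.
# Usage: rf_json TOPIC PARTITION_COUNT "BROKER_IDS"
rf_json() (
  topic=$1
  count=$2
  set -- $3
  if [ "$#" -eq 0 ]; then
    echo "rf_json: need at least one broker id" >&2
    exit 1
  fi
  rot=$*
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$count" ]; do
    [ "$p" -eq 0 ] || printf ','
    # Join the current broker order with commas, e.g. "1 0" -> "1,0".
    replicas=$(set -- $rot; IFS=,; echo "$*")
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}' "$topic" "$p" "$replicas"
    # Rotate: move the first broker id to the end for the next partition.
    set -- $rot
    first=$1
    shift
    rot="$* $first"
    p=$((p + 1))
  done
  printf ']}\n'
)

# rf_json test2 3 "0 1" prints (on one line):
# {"version":1,"partitions":[{"topic":"test2","partition":0,"replicas":[0,1]},
#  {"topic":"test2","partition":1,"replicas":[1,0]},
#  {"topic":"test2","partition":2,"replicas":[0,1]}]}
#
# Possible use (file name arbitrary):
#   rf_json test2 3 "0 1" > increase-rf.json
#   kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#     --reassignment-json-file increase-rf.json --execute
#   kafka-reassign-partitions.sh --zookeeper localhost:2181 \
#     --reassignment-json-file increase-rf.json --verify
```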
>> 
>> Hope this helps,
>> Michael
>> 
>> 
>> 
>>> On 06/11/2014 07:49 PM, Prakash Gowri Shankor wrote:
>>> yes,
>>> here are the steps:
>>> 
>>> Create the topic as:
>>> ./kafka-topics.sh --topic test2 --create --partitions 3 --zookeeper localhost:2181 --replication-factor 1
>>> 
>>> 1) Start cluster with 2 brokers, 3 consumers.
>>> 2) Don't start any producer.
>>> 3) Shut down cluster and disable one broker from starting.
>>> 4) Restart cluster with 1 broker, 3 consumers.
>>> 5) Start producer and send messages. I see this exception.
>>> 6) Shut down cluster.
>>> 7) Enable 2nd broker.
>>> 8) Restart cluster with 2 brokers, 3 consumers and the one producer and
>>> send messages. Now I don't see the exception.
>> 
>>
