No broker restarts. Created a Kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
>> Logs for rebalance:
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>> ...
>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
>> ...
>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
>>
>> Also, I still see lots of the errors below (~69k) in the logs since the restart. Is there any reason other than the rebalance for these errors?
>>
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

> Could you paste the related logs in controller.log?

What specifically should I search for in the logs?

Thanks,
Zakee
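For reference, one way to pull the relevant entries out of controller.log is a plain grep; a minimal sketch, where the log path is an assumption and not taken from this thread:

    # look for controller-driven preferred replica (leader) elections
    grep -i "preferred replica" /var/log/kafka/controller.log

    # narrow to the election start/resume events of the kind quoted above
    grep -iE "Starting preferred replica leader election|Resuming preferred replica election" /var/log/kafka/controller.log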
> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>
> Is there anything wrong with the brokers around that time? E.g. a broker restart?
> The logs you pasted are actually from the replica fetchers. Could you paste the
> related logs in controller.log?
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net> wrote:
>
>> Correction: actually the rebalance continued until about 24 hours after
>> the start, and that is where the errors below were found. Ideally the rebalance
>> should not have happened at all.
>>
>> Thanks
>> Zakee
>>
>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
>>>
>>>> Hmm, that sounds like a bug. Can you paste the log of the leader rebalance
>>>> here?
>>>
>>> Thanks for your suggestions.
>>> It looks like the rebalance actually happened only once, soon after I
>>> started with a clean cluster and data was pushed; it has not happened again
>>> so far, and the partition leader counts on the brokers have not changed
>>> since then. One of the brokers constantly shows 0 for its partition
>>> leader count. Is that normal?
>>>
>>> Also, I still see lots of the errors below (~69k) in the logs
>>> since the restart. Is there any reason other than the rebalance for these
>>> errors?
>>>
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You've probably known this, just to double
>>>> confirm.
>>>
>>> Yes.
>>>
>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>>
>>> ls /admin
>>> [delete_topics]
>>> ls /admin/preferred_replica_election
>>> Node does not exist: /admin/preferred_replica_election
>>>
>>> Thanks
>>> Zakee
>>>
>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>
>>>> Hmm, that sounds like a bug. Can you paste the log of the leader rebalance
>>>> here?
>>>> Some other things to check are:
>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>> auto.leader.rebalance. You've probably known this, just to double
>>>> confirm.
>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election
>>>> does not exist?
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>
>>>>> I started with a clean cluster and started to push data. It still does
>>>>> the rebalance at random intervals even though auto.leader.rebalance
>>>>> is set to false.
>>>>>
>>>>> Thanks
>>>>> Zakee
>>>>>
>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>
>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>> bit strange. Could you try to launch a clean Kafka cluster with
>>>>>> auto leader election disabled and try pushing data?
>>>>>> When leader migration occurs, a NotLeaderForPartition exception is
>>>>>> expected.
>>>>>>
>>>>>> Jiangjie (Becket) Qin
>>>>>>
>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>
>>>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica
>>>>>>> leader election for partitions" entries in the logs. I also see a lot of
>>>>>>> Produce request failure warnings with the NotLeader exception.
>>>>>>>
>>>>>>> I tried switching auto.leader.rebalance off (set it to false). I am still
>>>>>>> noticing the rebalance happening. My understanding was that the rebalance
>>>>>>> would not happen when this is set to false.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Zakee
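The setting being toggled above lives in each broker's server.properties; a minimal sketch of the relevant lines (values shown are illustrative defaults, not taken from this thread):

    # disable automatic preferred-replica (leader) rebalancing by the controller
    auto.leader.rebalance.enable=false

    # only consulted when auto rebalancing is enabled: how often the controller checks
    # for leader imbalance, and the per-broker imbalance ratio that triggers an election
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10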
>>>>>>>
>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>>
>>>>>>>> I don't think num.replica.fetchers will help in this case. Increasing the
>>>>>>>> number of fetcher threads only helps in cases where you have a
>>>>>>>> large amount of data coming into a broker and more replica fetcher
>>>>>>>> threads help it keep up. We usually use only 1-2 for each broker. But in your
>>>>>>>> case, it looks like leader migration is causing the issue.
>>>>>>>> Do you see anything else in the log? Like a preferred leader
>>>>>>>> election?
>>>>>>>>
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>
>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>
>>>>>>>>> Yes, I do see under-replicated partitions, usually shooting up every hour.
>>>>>>>>> Anything I could try to reduce that?
>>>>>>>>>
>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently I have
>>>>>>>>> configured 7 on each of the 5 brokers.
>>>>>>>>>
>>>>>>>>> -Zakee
>>>>>>>>>
>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>> long as you don't see this lasting forever and don't end up with a bunch of
>>>>>>>>>> under-replicated partitions, it should be fine.
>>>>>>>>>>
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>
>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> Need to know if I should be worried about this or ignore them.
>>>>>>>>>>>
>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not sure what
>>>>>>>>>>> causes them and what could be done to fix them.
>>>>>>>>>>>
>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> -Zakee
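On the under-replicated partitions discussed above: they can be listed with the topic tool that ships with Kafka; a minimal sketch, assuming ZooKeeper is reachable at localhost:2181 (host and port are assumptions):

    # list partitions whose ISR is currently smaller than the full replica set
    bin/kafka-topics.sh --describe --under-replicated-partitions --zookeeper localhost:2181

The num.replica.fetchers value mentioned above is a per-broker setting in server.properties (for example num.replica.fetchers=2); it only controls how many threads a broker uses to fetch from leaders, it does not affect whether leaders move.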
Thanks
Zakee
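The ZooKeeper check quoted earlier (ls /admin and ls /admin/preferred_replica_election) can be reproduced with the zookeeper-shell tool bundled with Kafka; a minimal sketch, with the host and port assumed:

    # open an interactive shell against the cluster's ZooKeeper ensemble
    bin/zookeeper-shell.sh localhost:2181
    # then, at the prompt:
    #   ls /admin
    #   ls /admin/preferred_replica_election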