Re: Painfully slow kafka recovery / cluster breaking

Rajasekar Elango Wed, 26 Aug 2015 07:15:03 -0700

Thanks for the updates, Jörg. They're very useful.

Thanks,
Raja.


On Wed, Aug 26, 2015 at 8:58 AM, Jörg Wagner <joerg.wagn...@1und1.de> wrote:

> Just a little feedback on our issue(s) as FYI to whoever is interested.
>
> It basically all boiled down to the configuration of topics. We noticed
> while performance testing (or trying to ;) ) that the partitioning was most
> critical to us.
>
> We originally followed the LinkedIn recommendation and used 600 partitions
> for our main topic. Testing that, the replicas always went out of sync
> within a short timeframe, leaders could not be determined and the cluster
> failed horribly (even writing several hundred lines of logs within
> 1/100th of a second).
>
> So for our 27 log.dirs (= disks) we went with 27 partitions. And voilà: we
> could use kafka with around 35k requests per second (via an application
> accessing it). Kafka stayed stable.
>
> Currently we are testing with 81 partitions (27*3) and it's running well.
> No issues so far, replicas in sync and up to 50k requests per second.
>
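As a rough illustration of the partition counts discussed above, the sketch below computes how many partition replicas land on each log dir if they are spread evenly. The 3 brokers, 27 log.dirs and replication factor of 2 come from the original post further down the thread; that the same replication factor applies to the 27- and 81-partition tests is an assumption, and `replicas_per_disk` is a hypothetical helper name.

```python
def replicas_per_disk(partitions, replication_factor, brokers, disks_per_broker):
    """Average number of partition replicas per log dir, assuming an even spread."""
    total_replicas = partitions * replication_factor
    per_broker = total_replicas / brokers
    return per_broker / disks_per_broker


# Figures from the thread: 3 brokers, 27 log.dirs each.
# Replication factor 2 is assumed to apply to every test.
for partitions in (600, 27, 81):
    avg = replicas_per_disk(partitions, replication_factor=2, brokers=3, disks_per_broker=27)
    print(f"{partitions} partitions: about {avg:.2f} replicas per log dir")
```

This only shows the average load per disk; it says nothing about why the 600-partition layout went out of sync.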
> Cheers
>
> On 25.08.2015 15:18, Jörg Wagner wrote:
>
>> So okay, this is a little embarrassing, but the core of the issue was that
>> max open files was not set correctly for kafka. It was not an oversight;
>> a few things together meant that the system configuration was not
>> changed correctly, leaving the default value in place.
>>
>> No wonder that kafka behaved strangely every time we had enough data in
>> log.dirs and connections.
>>
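A minimal sketch of how one might sanity-check an open-files limit against an estimated need. It assumes the broker holds a handle on one .log and one .index file per log segment plus one socket per connection; the workload numbers and the helper `estimated_open_files` are hypothetical. Note that `resource.getrlimit` reports the limit of the Python process running the script (Unix only), not of the broker process itself.

```python
import resource


def estimated_open_files(partition_replicas, segments_per_partition, connections):
    """Rough file-descriptor estimate: one .log and one .index file per segment,
    plus one socket per client connection."""
    files_per_segment = 2
    return partition_replicas * segments_per_partition * files_per_segment + connections


# Soft and hard limits for the current process (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Hypothetical workload numbers, for illustration only.
needed = estimated_open_files(partition_replicas=400, segments_per_partition=20, connections=1000)

print(f"soft limit: {soft}, hard limit: {hard}, estimated need: {needed}")
if soft != resource.RLIM_INFINITY and needed > soft:
    print("estimated need exceeds the soft limit")
```

To check the broker itself, the same comparison would have to be made against the limit that applies to the user and service that start kafka.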
>> Anyhow, that doesn't seem to be the last problem. The brokers get in sync
>> with each other (within an expected time frame), everything seems fine.
>>
>> After a little stress testing, the kafka cluster falls apart (around 40k
>> requests/s). Using topics describe we can see leaders missing (e.g. from
>> 1,2,3 only 1 and 3 are leading partitions, although zookeeper lists all
>> under /brokers/ids). This ultimately results in partitions being
>> unavailable and massive "leader not local" spam in the logs.
>>
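The check described above (leaders missing, replicas out of sync) can be sketched as a small parser over the text printed by the topics describe command. The sample below is made up for illustration and is not output from the cluster in this thread; the tab-separated `Topic: / Partition: / Leader: / Replicas: / Isr:` layout is assumed, and `problem_partitions` is a hypothetical helper.

```python
import re

# Illustrative sample only; not output captured from the cluster in this thread.
SAMPLE = """\
Topic:main\tPartitionCount:3\tReplicationFactor:2\tConfigs:
\tTopic: main\tPartition: 0\tLeader: 1\tReplicas: 1,2\tIsr: 1,2
\tTopic: main\tPartition: 1\tLeader: -1\tReplicas: 2,3\tIsr:
\tTopic: main\tPartition: 2\tLeader: 3\tReplicas: 3,1\tIsr: 3
"""

# Matches per-partition lines; the topic summary line does not match.
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def problem_partitions(describe_output):
    """Return (leaderless, under_replicated) lists of partition numbers."""
    leaderless, under_replicated = [], []
    for line in describe_output.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        partition = int(m.group("partition"))
        replicas = [b for b in m.group("replicas").split(",") if b]
        isr = [b for b in m.group("isr").split(",") if b]
        if int(m.group("leader")) == -1:
            leaderless.append(partition)
        if len(isr) < len(replicas):
            under_replicated.append(partition)
    return leaderless, under_replicated


leaderless, under_replicated = problem_partitions(SAMPLE)
print("no leader:", leaderless)
print("ISR smaller than replica set:", under_replicated)
```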
>> What are we missing?
>>
>> Cheers
>> Jörg
>>
>> On 24.08.2015 10:31, Jörg Wagner wrote:
>>
>>> Thank you for your answers.
>>>
>>> @Raja
>>> No, it also seems to happen if we stop kafka completely clean.
>>>
>>> @Gwen
>>> I was testing the situation with num.replica.fetchers set higher. If you
>>> say that was the right direction, I will try it again. What would be a good
>>> setting? I went with 50 which seemed reasonable (having 27 single disks).
>>> How long should it take to get complete ISR?
>>>
>>> Regarding no data flowing into kafka: I just wanted to point out that
>>> the setup is not yet live. So we can completely stop the usage of kafka,
>>> and it should presumably get into sync faster without a steady stream of
>>> new messages.
>>> On the other hand, kafka itself is working fine during this, "just"
>>> missing ISR, hence redundancy. If I stop another broker (clean!) though, it
>>> tends to happen that the expected number of partitions have Leader -1,
>>> which I assume should not happen.
>>>
>>> Cheers
>>> Jörg
>>>
>>> On 21.08.2015 19:18, Rajasekar Elango wrote:
>>>
>>>> We are seeing the same behavior in a 5-broker cluster when losing one
>>>> broker.
>>>>
>>>> In our case, we are losing the broker as well as the kafka data dir.
>>>>
>>>> Jörg Wagner,
>>>>
>>>> Are you losing just the broker, or the kafka data dir as well?
>>>>
>>>> Gwen,
>>>>
>>>> We have also observed that the latency of messages arriving at consumers
>>>> goes up by 10x when we lose a broker. Is it because the broker is busy
>>>> handling failed fetch requests and loaded with more data, which slows
>>>> down the writes? Also, if we had simply lost the broker and not the data
>>>> dir, would the impact have been minimal?
>>>>
>>>> Thanks,
>>>> Raja.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 12:31 PM, Gwen Shapira <g...@confluent.io>
>>>> wrote:
>>>>
>>>>> By default, num.replica.fetchers = 1. This means only one thread per
>>>>> broker is fetching data from leaders, so it may take a while for the
>>>>> recovering machine to catch up and rejoin the ISR.
>>>>>
>>>>> If you have bandwidth to spare, try increasing this value.
>>>>>
>>>>> Regarding "no data flowing into kafka" - If you have 3 replicas and
>>>>> only one is down, I'd expect writes to continue to the new leader even
>>>>> if one replica is not in the ISR yet. Can you see that a new leader is
>>>>> elected?
>>>>>
>>>>> Gwen
>>>>>
>>>>> On Fri, Aug 21, 2015 at 6:50 AM, Jörg Wagner <joerg.wagn...@1und1.de>
>>>>> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> here's my crosspost from irc.
>>>>>>
>>>>>> Our setup:
>>>>>> 3 kafka 0.8.2 brokers with zookeeper, powerful hardware (20 cores, 27
>>>>>> logdisks each). We use a handful of topics, but only one topic is
>>>>>> utilized heavily. It features a replication factor of 2 and 600
>>>>>> partitions.
>>>>>>
>>>>>> Our issue:
>>>>>> If one kafka broker was down, it takes very long (from 1 to >10 hours)
>>>>>> to show that all partitions have all isr again. This seems to heavily
>>>>>> depend on the amount of data which is in the log.dirs (I have
>>>>>> configured 27 threads - one for each dir, each on its own drive).
>>>>>> This all takes this long while there is NO data flowing into kafka.
>>>>>>
>>>>>> We seem to be missing something critical here. It might be some option
>>>>>> set wrong, or are we thinking wrong and it's not critical to have the
>>>>>> replicas in sync?
>>>>>>
>>>>>> Any pointers would be great.
>>>>>>
>>>>>> Cheers
>>>>>> Jörg
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
> --
> Kind regards
>
> Jörg Wagner
>
>  Mobile & Services
>
> 1&1 Internet AG | Sapporobogen 6-8 | 80637 München | Germany
> Phone: +49 89 14339 324
> E-Mail: joerg.wagn...@1und1.de | Web: www.1und1.de
>


-- 
Thanks,
Raja.
