Thanks for the updates, Jörg. They're very useful.

Thanks,
Raja.

On Wed, Aug 26, 2015 at 8:58 AM, Jörg Wagner <joerg.wagn...@1und1.de> wrote:

> Just a little feedback on our issue(s) as FYI to whoever is interested.
>
> It basically all boiled down to the configuration of topics. We noticed
> while performance testing (or trying to ;) ) that the partitioning was most
> critical to us.
>
> We originally followed the LinkedIn recommendation and used 600 partitions
> for our main topic. Testing that, the replicas always went out of sync
> within a short timeframe, leaders could not be determined and the cluster
> failed horribly (even writing several hundred lines of logs within
> 1/100th of a second).
>
> So for our 27 log.dirs (= disks) we went with 27 partitions. And voilà: we
> could use kafka with around 35k requests per second (via an application
> accessing it). Kafka stayed stable.
>
> Currently we are testing with 81 partitions (27*3) and it's running well.
> No issues so far, replicas in sync and up to 50k requests per second.
>
> Cheers
>
> On 25.08.2015 15:18, Jörg Wagner wrote:
>
>> So okay, this is a little embarrassing but the core of the issue was that
>> max open files was not set correctly for kafka. It was not an oversight,
>> but a few things together meant that the system configuration was not
>> changed correctly, resulting in the default value.
>>
>> No wonder that kafka behaved strangely every time we had enough data in
>> log.dirs and connections.
>>
>> Anyhow, that doesn't seem to be the last problem. The brokers get in sync
>> with each other (within an expected time frame), everything seems fine.
>>
>> After a little stress testing, the kafka cluster falls apart (around 40k
>> requests/s). Using topics describe we can see leaders missing (e.g. from
>> 1,2,3 only 1 and 3 are leading partitions, although zookeeper lists all
>> under /brokers/ids). This ultimately results in partitions being
>> unavailable and massive "leader not local" spam in the logs.
>>
>> What are we missing?
>>
>> Cheers
>> Jörg
>>
>> On 24.08.2015 10:31, Jörg Wagner wrote:
>>
>>> Thank you for your answers.
>>>
>>> @Raja
>>> No, it also seems to happen if we stop kafka completely cleanly.
>>>
>>> @Gwen
>>> I was testing the situation with num.replica.fetchers set higher. If you
>>> say that was the right direction, I will try it again. What would be a good
>>> setting? I went with 50 which seemed reasonable (having 27 single disks).
>>> How long should it take to get complete ISR?
>>>
>>> Regarding no data flowing into kafka: I just wanted to point out that
>>> the setup is not yet live. So we can completely stop the usage of kafka,
>>> and it should possibly get into sync faster without a steady stream of new
>>> messages.
>>> Kafka itself, on the other hand, is working fine during this, "just"
>>> missing ISR, hence redundancy. If I stop another broker (clean!) though, it
>>> tends to happen that the expected number of partitions have Leader -1,
>>> which I assume should not happen.
>>>
>>> Cheers
>>> Jörg
>>>
>>> On 21.08.2015 19:18, Rajasekar Elango wrote:
>>>
>>>> We are seeing the same behavior in a 5-broker cluster when losing one
>>>> broker.
>>>>
>>>> In our case, we are losing the broker as well as the kafka data dir.
>>>>
>>>> Jörg Wagner,
>>>>
>>>> Are you losing just the broker, or the kafka data dir as well?
>>>>
>>>> Gwen,
>>>>
>>>> We have also observed that latency of messages arriving at consumers goes
>>>> up by 10x when we lose a broker. Is it because the broker is busy with
>>>> handling failed fetch requests and loaded with more data, and that's
>>>> slowing down the writes? Also, if we had simply lost the broker and not
>>>> the data dir, would the impact have been minimal?
>>>>
>>>> Thanks,
>>>> Raja.
>>>>
>>>> On Fri, Aug 21, 2015 at 12:31 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>
>>>>> By default, num.replica.fetchers = 1. This means only one thread per
>>>>> broker is fetching data from leaders.
>>>>> This means it may take a while for the
>>>>> recovering machine to catch up and rejoin the ISR.
>>>>>
>>>>> If you have bandwidth to spare, try increasing this value.
>>>>>
>>>>> Regarding "no data flowing into kafka" - If you have 3 replicas and only
>>>>> one is down, I'd expect writes to continue to the new leader even if one
>>>>> replica is not in the ISR yet. Can you see that a new leader is elected?
>>>>>
>>>>> Gwen
>>>>>
>>>>> On Fri, Aug 21, 2015 at 6:50 AM, Jörg Wagner <joerg.wagn...@1und1.de> wrote:
>>>>>
>>>>>> Hey everyone,
>>>>>>
>>>>>> here's my crosspost from irc.
>>>>>>
>>>>>> Our setup:
>>>>>> 3 kafka 0.8.2 brokers with zookeeper, powerful hardware (20 cores, 27
>>>>>> logdisks each). We use a handful of topics, but only one topic is
>>>>>> utilized heavily. It features a replication of 2 and 600 partitions.
>>>>>>
>>>>>> Our issue:
>>>>>> If one kafka was down, it takes very long (from 1 to >10 hours) to show
>>>>>> that all partitions have all isr again. This seems to heavily depend on
>>>>>> the amount of data which is in the log.dirs (I have configured 27
>>>>>> threads - one for each dir, each on its own drive).
>>>>>> This all takes this long while there is NO data flowing into kafka.
>>>>>>
>>>>>> We seem to be missing something critical here. It might be some option
>>>>>> set wrong, or are we thinking wrong and it's not critical to have the
>>>>>> replicas in sync?
>>>>>>
>>>>>> Any pointers would be great.
>>>>>>
>>>>>> Cheers
>>>>>> Jörg
>
> --
> Kind regards
>
> Jörg Wagner
>
> Mobile & Services
>
> 1&1 Internet AG | Sapporobogen 6-8 | 80637 München | Germany
> Phone: +49 89 14339 324
> E-Mail: joerg.wagn...@1und1.de | Web: www.1und1.de
>
> Registered office: Montabaur; District Court of Montabaur, HRB 6484
>
> Management Board: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann,
> Andreas Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen,
> Christian Würst
> Chairman of the Supervisory Board: Michael Scheeren
>
> Member of United Internet
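
The two findings in this thread (the max open files limit and the partition count) can be put into rough numbers. The following is only a minimal Python sketch, not anything the posters ran. The function names are made up for illustration, and the values `segments_per_replica=50` and `connections=2000` are placeholder assumptions; only the 3 brokers, the replication factor of 2 and the partition counts 600, 27 and 81 come from the messages above. It assumes replicas are spread evenly across brokers and that each log segment keeps one .log and one .index file open, as in Kafka 0.8.2.

```python
import math
import resource  # Unix-only; reports the limits of the *current* process


def replicas_per_broker(partitions: int, replication_factor: int, brokers: int) -> int:
    """Partition replicas hosted by one broker, assuming an even spread."""
    return math.ceil(partitions * replication_factor / brokers)


def estimated_open_files(replicas: int, segments_per_replica: int, connections: int) -> int:
    """Rough file-descriptor estimate for one broker.

    Each segment is assumed to hold two open files (.log and .index),
    and each client or broker connection one socket descriptor.
    """
    return replicas * segments_per_replica * 2 + connections


if __name__ == "__main__":
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = soft_limit == resource.RLIM_INFINITY

    # Partition counts mentioned in the thread; the other inputs are assumptions.
    for partitions in (600, 27, 81):
        replicas = replicas_per_broker(partitions, replication_factor=2, brokers=3)
        needed = estimated_open_files(replicas, segments_per_replica=50, connections=2000)
        verdict = "fits" if unlimited or needed < soft_limit else "exceeds"
        print(f"{partitions} partitions: {replicas} replicas/broker, "
              f"~{needed} descriptors, {verdict} this process's soft limit")
```

Note that the script compares against the limit of the Python process that runs it, which is not necessarily the limit the broker runs under. On Linux, the limit that applies to a running broker can be read from /proc/<pid>/limits.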