Re: Disk-size aware partitioning

2018-10-09 Thread Brett Rann
LinkedIn's cruise-control https://github.com/linkedin/cruise-control has numerous goals, including disk, network, CPU, rack awareness, leadership distribution etc. You can have separate disk/network limits per broker (ours are all the same, fwiw). We use it and it does a stellar job of keeping a
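The per-broker disk/network limits mentioned above go into cruise-control's broker capacity file. A minimal sketch (broker ids and capacity numbers are made up; "-1" is cruise-control's catch-all default entry, and the units -- MB for DISK, KB/s for network -- should be verified against the cruise-control wiki for your version):

```json
{
  "brokerCapacities": [
    {"brokerId": "-1", "capacity": {"DISK": "500000", "CPU": "100", "NW_IN": "50000", "NW_OUT": "50000"}},
    {"brokerId": "5",  "capacity": {"DISK": "250000", "CPU": "100", "NW_IN": "25000", "NW_OUT": "25000"}}
  ]
}
```

Brokers without an explicit entry fall back to the "-1" defaults, which is how a mostly-homogeneous cluster like ours stays simple to describe.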

Re: Problems with broker upgrade from 1.1.0 to 2.0.0

2018-10-08 Thread Brett Rann
I'd * verify you shut down cleanly when restarting the broker * test restarting the broker first before upgrading it (it may have been sitting on a corrupt file for a long time and the issue is unrelated to the upgrade). Re your questions: 1) probably not, unless it was caused by an unclean

Re: kafka-2.0.0-src/bin/kafka-topics.sh The shell file is empty;

2018-10-03 Thread Brett Rann
Looks fine to me: from: https://www.apache.org/dyn/closer.cgi?path=/kafka/2.0.0/kafka_2.11-2.0.0.tgz tmp/kafka_2.11-2.0.0/bin% head -2 kafka-topics.sh #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more Also from this src it's fine in there too:

Re: manually trigger log compaction

2018-10-03 Thread Brett Rann

Re: Kafka consumer offset topic deletion

2018-09-18 Thread Brett Rann
That's unusually large. Ours are around 32KB to 90MB each. Initially curious if you have log.cleaner.enable=true and what offsets.retention.minutes is set to. And yes it can affect cluster performance. We had instances of consumer outages that were caused by bugged large consumer offset files,
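A quick way to gauge whether your __consumer_offsets partitions are in that unusually-large range is to size their directories on a broker. A sketch; the log dir path is an assumption, so point it at whatever your broker's log.dirs is set to:

```shell
# Size each __consumer_offsets partition directory, largest first.
# LOG_DIR is an assumption; adjust to your broker's log.dirs path.
# (sort -h needs GNU sort.)
LOG_DIR="${LOG_DIR:-/var/lib/kafka/logs}"
du -sh "$LOG_DIR"/__consumer_offsets-* 2>/dev/null | sort -rh | head -20
```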

Re: Big Log Retention

2018-09-18 Thread Brett Rann
join. > > Maybe this was a case in which the partition reassignment CLI tool would > have been useful? > > On Thu, Sep 6, 2018 at 11:09 PM Brett Rann > wrote: > > > We have partitions that are in the 100s of GBs. > > > > It shouldn't have to shuffle around GB chunks of d

Re: Reduce number of brokers?

2018-09-17 Thread Brett Rann
You need to do a partition reassignment to increase or decrease the replication factor. It's tediously manual, but it's just JSON so it's trivial to manipulate, which is probably why it's still tediously manual. There's a guide here, although it's ageing a little:
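Since the reassignment file is just JSON, generating one programmatically is straightforward. A minimal sketch in Python of raising a topic's replication factor by appending one broker to each partition's replica list; the topic name, partition layout, and broker ids are all hypothetical:

```python
import json

def bump_rf(topic, current, extra_broker):
    """current: {partition: [replica broker ids]}; append extra_broker where missing."""
    partitions = []
    for p, replicas in sorted(current.items()):
        new_replicas = replicas + [extra_broker] if extra_broker not in replicas else replicas
        partitions.append({"topic": topic, "partition": p, "replicas": new_replicas})
    return {"version": 1, "partitions": partitions}

# Hypothetical topic with 3 partitions at RF=2; add broker 4 everywhere -> RF=3.
plan = bump_rf("my-topic", {0: [1, 2], 1: [2, 3], 2: [3, 1]}, 4)
print(json.dumps(plan, indent=2))
```

The resulting file is what kafka-reassign-partitions.sh consumes via --reassignment-json-file with --execute.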

Re: Big Log Retention

2018-09-07 Thread Brett Rann
We have partitions that are in the 100s of GBs. It shouldn't have to shuffle around GB chunks of data unless you have done partition moves, or had a broker offline for a while. Is that the case? If not your ISR problem is probably related to something else other than retention size. On Fri, Sep

Re: Cleanup unused topics

2018-08-21 Thread Brett Rann
These are in regex form for DataDog's JMX collector, but it should get you started: bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=.*' bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=.*'
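For context, those regexes sit inside a DataDog JMX integration config roughly like the following; the file path, port, and metric aliases are assumptions, so check them against the DataDog agent's jmx integration docs:

```yaml
# conf.d/jmx.d/conf.yaml (path and aliases assumed; point host/port at the broker's JMX endpoint)
instances:
  - host: localhost
    port: 9999
    conf:
      - include:
          bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=.*'
          attribute:
            Count:
              alias: kafka.topic.produce_requests
      - include:
          bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=.*'
          attribute:
            Count:
              alias: kafka.topic.fetch_requests
```

Topics with neither produce nor fetch activity on those per-topic metrics over a long window are the cleanup candidates.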

Re: [VOTE] 1.1.1 RC3

2018-07-11 Thread Brett Rann
e.org/job/kafka-1.1-jdk7/162 > <https://builds.apache.org/job/kafka-1.1-jdk7/162> > System tests: > https://jenkins.confluent.io/job/system-test-kafka/job/1.

Re: [VOTE] 2.0.0 RC1

2018-07-03 Thread Brett Rann
> http://kafka.apache.org/20/documentation.html > > * Protocol: > http://kafka.apache.org/20/protocol.htm

Re: what need to be done if we increase the Kafka nodes from 3 to 5?

2018-07-03 Thread Brett Rann
They will be members of the cluster, yes, but they won't be serving any partitions unless you create new topics where partitions might be assigned to the new brokers, or run things which automatically balance partitions (kafka-monitor does that, and cruise-control can). You'll need to run

Re: How much time does it usually take for a rolling upgrade to complete?

2018-07-02 Thread Brett Rann
tween them > 2. Each broker with 4GB of RAM and a disk that writes at 35MB/s > 3. 3 partitions, each with a replication factor of 3 > 4. 20M messages (80GB) per partition > > A rolling upgrade takes about 8 hours. I suspect this is not the norm. > > Kostas > > -- Brett Rann Senior DevOps Engineer Zendesk International Ltd

Re: Fail fast a broker

2018-06-08 Thread Brett Rann
e only choice is to >>> restart or even stop the full server, but due to operational procedures , >>> that may take some time. >>> >>> >>> Therefore, is there any configuration that could be applied for such >>> broker >>> to be "

Re: Kafka behaviour on contolled shutdown of brokers

2018-04-23 Thread Brett Rann
i) no. ii) yes you do, no it won't. :) You used the word replace, not add. Is the final state of your cluster 6 nodes, or 12? If you're replacing, you might want to consider just replacing one at a time to avoid having to do reassignments: a) stop 1 broker, let's say broker "1". b) start up the

Re: Unavailable partitions after upgrade to kafka 1.0.0

2018-04-23 Thread Brett Rann
its logs. And if you were following the rolling upgrade method correctly it was very likely part way through it? On Mon, Apr 23, 2018 at 5:42 PM, Mika Linnanoja <mika.linnan...@rovio.com> wrote: > Hi, > > On Mon, Apr 23, 2018 at 10:25 AM, Brett Rann <br...@zendesk.com.inva

Re: Kafka rebalancing behavior on broker failures

2018-04-23 Thread Brett Rann
partitions are never automatically moved. They are assigned to broker(s) and stay that way unless reassignments are triggered by external tools. (leadership can move automatically though, if RF>1). There's more info on partitions and moving partitions at these two links:

Re: Unavailable partitions after upgrade to kafka 1.0.0

2018-04-23 Thread Brett Rann
s, mostly by secor. > > BR, > Mika

Re: [VOTE] 1.1.0 RC4

2018-03-27 Thread Brett Rann
Release artifacts to be voted upon (source and binary): > http://home.apache.org/~rsivaram/kafka-1.1.0-rc4/ >

Re: Question about KIP-107

2018-03-19 Thread Brett Rann
gt; <https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html>. > > is this feature available? > > thanks, > Jason

Re: Kafka/zookeeper logs in every command

2018-01-28 Thread Brett Rann
the command line tools should be using config/tools-log4j.properties, which should have the log level set to WARN, not INFO. config/log4j.properties is used by the broker and by default is set to INFO. bin/kafka-run-class.sh will default to the first one if KAFKA_LOG4J_OPTS isn't set. # Log4j settings
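The fallback described above can be sketched as shell logic of the kind bin/kafka-run-class.sh uses; the paths here are simplified, since the real script computes the config path relative to its own location:

```shell
# If the caller didn't set KAFKA_LOG4J_OPTS, fall back to the quieter
# tools config (WARN level) instead of the broker's config/log4j.properties (INFO).
if [ -z "$KAFKA_LOG4J_OPTS" ]; then
  KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:config/tools-log4j.properties"
fi
echo "$KAFKA_LOG4J_OPTS"
```

So to silence INFO noise from a CLI tool, either leave KAFKA_LOG4J_OPTS unset, or point it explicitly at tools-log4j.properties.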

Re: [ANNOUNCE] New Kafka PMC Member: Rajini Sivaram

2018-01-17 Thread Brett Rann
Congratulations Rajini! On Thu, Jan 18, 2018 at 9:23 AM, Konstantine Karantasis < konstant...@confluent.io> wrote: > Congrats Rajini! > > -Konstantine > > On Wed, Jan 17, 2018 at 2:18 PM, Becket Qin wrote: > > > Congratulations, Rajini! > > > > On Wed, Jan 17, 2018 at 1:52

Re: Kafka Replication Factor

2018-01-17 Thread Brett Rann
if RF=2 and min.insync.replicas=1 (the default) then you shouldn't have offline partitions if 1 of 3 brokers is down. I'd first double check your topic config (and broker defaults) for the one that went offline to verify RF/Min. Be sure to check each partition, they can be different! (
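Checking RF and in-sync replicas per partition can be done with the topics CLI. A sketch of the invocation; the topic name and ZooKeeper address are assumptions, and newer broker versions take --bootstrap-server instead:

```shell
# Describe every partition of the topic: check ReplicationFactor, and compare
# Replicas vs Isr on each partition line -- they can differ per partition.
kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic
```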

Re: __consumer_offsets too big

2018-01-17 Thread Brett Rann
There are several bugs in 0.9 around consumer offsets and compaction and log cleaning. The easiest path forward is to upgrade to the latest 0.11.x. We ended up going to somewhat extreme lengths to deal with 100GB+ consumer offsets. When we tested an upgrade we noticed that when it started

Re: Kafka 1.0 upgrade

2018-01-11 Thread Brett Rann
we run a 1.0.1 prerelease in production just fine, but the scale is smaller: 20+ clusters with 3-10 brokers each, each cluster with about 120 topics and about 15k partitions. We have unusual message sizes, so peaks of around 40k messages, 60MB in, 400MB out, per sec in the largest one. we run a

Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-05 Thread Brett Rann
What do the broker logs say it's doing during all that time? There are some consumer offset / log cleaner bugs which caused us similarly long delays. That was easily visible by watching the log cleaner activity in the logs, and in our monitoring of partition sizes watching them go down, along with

Re: kafka 1.0.1?

2018-01-05 Thread Brett Rann
Is there a plan/eta for this? > On 19 Dec 2017, at 08:49, Ismael Juma wrote: > > Hi Maciek, > > I expect that 1.0.1 will be released some time in January. > > Ismael > > On Mon, Dec 18, 2017 at 10:42 AM, Maciek Próchniak wrote: > > > Hello, > > > > are

Re: Consumer behavior when Kafka rolls the old log file

2018-01-02 Thread Brett Rann
The old segment isn't rolled; a new one is. E.g., take this partition directory: drwxr-xr-x 2 kafka kafka 4096 Jan 1 10:31 . drwxr-xr-x 1483 kafka kafka 94208 Jan 3 02:11 .. -rw-r--r-- 1 kafka kafka 15504 Jan 1 10:31 04050355.index -rw-r--r-- 1 kafka kafka 8210576

Re: Partition reassignment data file is empty

2017-12-31 Thread Brett Rann
nd.scala:188) > at > kafka.admin.ReassignPartitionsCommand$.executeAssignment( > ReassignPartitionsCommand.scala:158) > at > kafka.admin.ReassignPartitionsCommand$.executeAssignment( > ReassignPartitionsCommand.scala:154) > at > kafka.admin.ReassignPartitionsCommand$.main(ReassignPar

(solved) Re: kafka 1.0/0.11.0.1 log message upgrade: Error processing fetch operation on partition __consumer_offsets-21 offset 200349244

2017-12-16 Thread Brett Rann
} ] } ] } Anyway, easily fixed with restarts of Burrow, once it was clear which consumer was having the issue. On Fri, Dec 15, 2017 at 5:04 PM, Brett Rann <br...@zendesk.com> wrote: > Another interesting datapoint: > > Taking a deeper look at partition 21: > &g

Re: Queries on Kafka Capacity

2017-12-15 Thread Brett Rann
You would add new brokers to the cluster, and then do a partition reassignment to move some partitions to the new broker. In the simplest example: Say you have 1 topic with 3 partitions. partition 0: brokers: 1,2 partition 1: brokers: 2,3 partition 2: brokers: 3,1 If you added 3 more brokers,
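The simplest example above amounts to recomputing the assignment over the larger broker list. A toy round-robin sketch in Python (broker ids and counts are hypothetical; in practice kafka-reassign-partitions.sh --generate does this for you, with rack awareness and other constraints):

```python
def spread(partitions, brokers, rf):
    """Assign each partition rf replicas, round-robin across the broker list."""
    plan = {}
    for p in range(partitions):
        plan[p] = [brokers[(p + i) % len(brokers)] for i in range(rf)]
    return plan

# 3 partitions at RF=2, rebalanced over a cluster grown from 3 to 6 brokers.
assignment = spread(3, [1, 2, 3, 4, 5, 6], 2)
print(assignment)  # {0: [1, 2], 1: [2, 3], 2: [3, 4]}
```

The output maps directly onto the "partition N: brokers: x,y" layout described above, now touching the new brokers.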

Re: kafka 1.0/0.11.0.1 log message upgrade: Error processing fetch operation on partition __consumer_offsets-21 offset 200349244

2017-12-15 Thread Brett Rann
3 log files. it's not there.) Why is it trying 201167266? Is it from the snapshot files? Is there some surgery we can do to make it stop. Safely? :) On Fri, Dec 15, 2017 at 4:33 PM, Brett Rann <br...@zendesk.com> wrote: > on `kafka_2.11-1.0.1-d04daf570` we are upgrading the log fo

kafka 1.0/0.11.0.1 log message upgrade: Error processing fetch operation on partition __consumer_offsets-21 offset 200349244

2017-12-15 Thread Brett Rann
on `kafka_2.11-1.0.1-d04daf570` we are upgrading the log format from 0.9.0.1 to 0.11.0.1 and after the upgrade have set inter.broker.protocol.version=1.0 log.message.format.version=0.11.0.1 We have applied this upgrade to 5 clusters by upgrading broker 1, leaving it for a day, then coming back

Re: Pulsar and Kafka - Segment Centric vs Partition Centric

2017-12-07 Thread Brett Rann
You already have a Pulsar thread going to discuss how it compares with Kafka. Maybe you could keep these in the same thread? You seem very interested in it which is fantastic. If you do some reproducible testing comparisons I'd be interested in seeing your personal testing methodology and

Re: Kafka 0.10.0.2 reset offset in the new-consumer mode

2017-11-23 Thread Brett Rann
are live with method 2 by using group information from DescribeGroupsRequest. On Thu, Nov 23, 2017 at 9:20 PM, Ali Nazemian <alinazem...@gmail.com> wrote: > Unfortunately, it doesn't have that option in this version of Kafka! > > On Thu, Nov 23, 2017 at 9:02 PM,

Re: Kafka 0.10.0.2 reset offset in the new-consumer mode

2017-11-23 Thread Brett Rann
I don't know about kafka-storm spout, but you could try using the kafka-consumer-groups.sh cli to reset the offset. It has a --reset-offsets option. On Thu, Nov 23, 2017 at 7:02 PM, Ali Nazemian wrote: > Hi All, > > I am using Kafka 0.10.0.2 and I am not able to upgrade
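A sketch of that invocation; the group/topic names and bootstrap address are assumptions, --dry-run previews the new offsets, and swapping it for --execute commits them:

```shell
# Preview resetting my-group's offsets on my-topic to the earliest available.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic my-topic \
  --reset-offsets --to-earliest --dry-run
```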

java.lang.OutOfMemoryError memory leak on 1.0.0 with 0.11.0.1 on disk and converting to 0.9 clients

2017-11-07 Thread Brett Rann
et.receive.buffer.bytes=102400 socket.request.max.bytes=104857600 replica.fetch.max.bytes=10485760 log.dirs=/data/kafka/logs num.partitions=1 num.recovery.threads.per.data.dir=1 log.retention.hours=168 offsets.retention.minutes=10080 log.segment.bytes=1073741824 log.retention.check.interval.ms=30 log.cleaner.enable=true zookeeper.connect=zoo1:2181,zoo2:2181,zoo3:2181/kafka zookeeper.connection.timeout.ms=6000

Re: Kafka 0.11 broker running out of file descriptors

2017-11-03 Thread Brett Rann
s set to 128k, the number of open file > > descriptors during normal operation is about 8k, so there is a lot of > > headroom. > > > > I'm not sure if it's the other brokers trying to replicate that kills > > it, or whether it's clients trying to publish messages. > > > > Has anyone seen a behavior like this? I'd appreciate any pointers. > > > > Thanks, > > > > Lukas