LinkedIn's cruise-control https://github.com/linkedin/cruise-control
has numerous goals, including disk, network, CPU, rack awareness,
leadership distribution, etc.
You can have separate disk/network limits per broker (ours are all the
same, fwiw).
We use it and it does a stellar job of keeping a
I'd
* verify you shut down cleanly when restarting the broker
* test restarting the broker first before upgrading it (it may have been
sitting on a corrupt file for a long time and the issue is unrelated to the
upgrade)
Re your questions:
1) probably not, unless it was caused by an unclean
Looks fine to me:
from:
https://www.apache.org/dyn/closer.cgi?path=/kafka/2.0.0/kafka_2.11-2.0.0.tgz
tmp/kafka_2.11-2.0.0/bin% head -2 kafka-topics.sh
#!/bin/bash
# Licensed to the Apache Software Foundation (ASF) under one or more
Also, from this source it's fine in there too:
That's unusually large. Ours are around 32 KB to 90 MB each. Initially
curious whether you have log.cleaner.enable=true and what
offsets.retention.minutes is set to.
And yes, it can affect cluster performance. We had instances of consumer
outages that were caused by bugged large consumer offset files,
join.
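The two settings mentioned above can be checked straight from the broker's properties file. A quick sketch (the sample file and values below are purely illustrative; on a real broker you'd grep your actual config/server.properties):

```shell
# Create an illustrative properties fragment (values are examples only)
cat > /tmp/server.properties <<'EOF'
log.cleaner.enable=true
offsets.retention.minutes=10080
EOF
# Pull out just the two settings under discussion
grep -E '^(log\.cleaner\.enable|offsets\.retention\.minutes)=' /tmp/server.properties
```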
>
> Maybe this was a case in which the partition reassignment CLI tool would
> have been useful?
>
> On Thu, Sep 6, 2018 at 11:09 PM Brett Rann
> wrote:
>
> > We have partitions that are in the 100s of GBs.
> >
> > It shouldn't have to shuffle around GB chunks of d
You need to do a partition reassignment to increase or decrease the
replication factor. It's tediously manual, but it's just JSON, so it's
trivial to manipulate, which is probably why it's still tediously manual.
There's a guide here, although it's ageing a little:
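A minimal sketch of what that JSON looks like (topic name, broker ids, and paths are assumptions, not taken from this thread): you list the desired replica set per partition, adding a broker id to grow RF, then hand the file to kafka-reassign-partitions.sh.

```shell
# Hypothetical layout: grow RF from 2 to 3 by adding a third replica per partition
cat > /tmp/increase-rf.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "my-topic", "partition": 1, "replicas": [2, 3, 1]}
  ]
}
EOF
# Then, against a live cluster (shown commented out here):
# bin/kafka-reassign-partitions.sh --zookeeper zoo1:2181 \
#   --reassignment-json-file /tmp/increase-rf.json --execute
```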
We have partitions that are in the 100s of GBs.
It shouldn't have to shuffle around GB chunks of data unless you have done
partition moves, or had a broker offline for a while. Is that the case?
If not, your ISR problem is probably related to something other than
retention size.
On Fri, Sep
These are in regex form for DataDog's JMX collector, but it should get you
started:
bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=.*'
bean_regex: 'kafka\.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=.*'
e.org/job/kafka-1.1-jdk7/162
> <https://builds.apache.org/job/kafka-1.1-jdk7/162>
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/1.
> >
> > http://kafka.apache.org/20/documentation.html
> >
> >
> >
> > * Protocol:
> >
> > http://kafka.apache.org/20/protocol.html
They will be members of the cluster, yes, but they won't be serving any
partitions unless you create new topics where partitions might be assigned
to the new brokers, or run things which automatically balance partitions
(kafka-monitor does that, and cruise-control can).
You'll need to run
tween them
> 2. Each broker with 4GB of RAM and a disk that writes at 35MB/s
> 3. 3 partitions, each with a replication factor of 3
> 4. 20M messages (80GB) per partition
>
> A rolling upgrade takes about 8 hours. I suspect this is not the norm.
>
> Kostas
>
>
--
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
e only choice is to
>>> restart or even stop the full server, but due to operational procedures ,
>>> that may take some time.
>>>
>>>
>>> Therefore, is there any configuration that could be applied for such
>>> broker
>>> to be "
i) no.
ii) yes you do, no it won't.
:)
You used the word replace, not add. Is the final state of your cluster 6
nodes, or 12?
If you're replacing, you might want to consider just replacing one at a
time to avoid having to do reassignments:
a) stop 1 broker, let's say broker "1".
b) start up the
its logs. And if you were following the rolling upgrade
method correctly, it was very likely partway through it?
On Mon, Apr 23, 2018 at 5:42 PM, Mika Linnanoja <mika.linnan...@rovio.com>
wrote:
> Hi,
>
> On Mon, Apr 23, 2018 at 10:25 AM, Brett Rann <br...@zendesk.com.inva
partitions are never automatically moved. They are assigned to broker(s)
and stay that way unless reassignments are triggered by external tools.
(leadership can move automatically though, if RF>1).
There's more info on partitions and moving partitions at these two links:
s, mostly by secor.
>
> BR,
> Mika
>
> --
> *Mika Linnanoja*
> Senior Cloud Engineer
> Games Technology
> Rovio Entertainment Corp
> Keilaranta 7, FIN-02150 Espoo, Finland
> mika.linnan...@rovio.com
> www.ro
Release artifacts to be voted upon (source and binary):
> > > > >
> > > > > http://home.apache.org/~rsivaram/kafka-1.1.0-rc4/
> > > > >
> > > > >
> > > > > *
> <https://kafka.apache.org/10/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html>.
>
>
> is this feature available?
>
> thanks,
> Jason
>
--
Brett Rann
Senior DevOps Engineer
Zendesk International Ltd
395 Collins Street, Melbourne VIC 3000 Australia
Mobile: +61 (0) 418 826 017
the command line tools should be using config/tools-log4j.properties, which
should have the log level set to WARN, not INFO.
config/log4j.properties is used by the broker and by default is set to INFO.
bin/kafka-run-class.sh will default to the first one (tools-log4j.properties)
if KAFKA_LOG4J_OPTS isn't set.
# Log4j settings
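If a tool is picking up the wrong config, you can also force it explicitly by setting KAFKA_LOG4J_OPTS yourself before running it (the relative path assumes your working directory is the Kafka install directory):

```shell
# Point the CLI tools at the quieter WARN-level log4j config
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:config/tools-log4j.properties"
echo "$KAFKA_LOG4J_OPTS"
# e.g. bin/kafka-topics.sh would now log at WARN rather than INFO
```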
Congratulations Rajini!
On Thu, Jan 18, 2018 at 9:23 AM, Konstantine Karantasis <
konstant...@confluent.io> wrote:
> Congrats Rajini!
>
> -Konstantine
>
> On Wed, Jan 17, 2018 at 2:18 PM, Becket Qin wrote:
>
> > Congratulations, Rajini!
> >
> > On Wed, Jan 17, 2018 at 1:52
if RF=2 and min.insync.replicas=1 (the default), then you shouldn't have
offline partitions if 1 of 3 brokers is down.
I'd first double-check your topic config (and broker defaults) for the one
that went offline to verify RF/min. Be sure to check each partition; they
can be different! (
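One way to check RF and the replica list per partition is kafka-topics.sh --describe (topic name and ZooKeeper address below are made up; it needs a live cluster, so the command is only echoed here):

```shell
# --describe prints ReplicationFactor plus the Replicas/Isr lists per partition
CMD='bin/kafka-topics.sh --describe --topic my-topic --zookeeper zoo1:2181'
echo "$CMD"
```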
There are several bugs in 0.9 around consumer offsets and compaction and
log cleaning.
The easiest path forward is to upgrade to the latest 0.11.x. We ended up
going to somewhat extreme lengths to deal with 100GB+ consumer offsets.
When we tested an upgrade we noticed that when it started
we run a 1.0.1 prerelease in production just fine, but the scale is
smaller. 20+ clusters with 3-10 brokers each, each cluster with about 120
topics and about 15k partitions. We have unusual message sizes, so peaks
of around 40k messages, 60MB in, 400MB out, per sec in the largest one.
What do the broker logs say its doing during all that time?
There are some consumer offset / log cleaner bugs which caused us similar
log delays. That was easily visible by watching the log cleaner activity in
the logs, and in our monitoring of partition sizes watching them go down,
along with
Is there a plan/eta for this?
> On 19 Dec 2017, at 08:49, Ismael Juma wrote:
>
> Hi Maciek,
>
> I expect that 1.0.1 will be released some time in January.
>
> Ismael
>
> On Mon, Dec 18, 2017 at 10:42 AM, Maciek Próchniak wrote:
>
> > Hello,
> >
> > are
The old segment isn't rolled, a new one is. E.g., take this partition
directory:
drwxr-xr-x    2 kafka kafka     4096 Jan 1 10:31 .
drwxr-xr-x 1483 kafka kafka    94208 Jan 3 02:11 ..
-rw-r--r--    1 kafka kafka    15504 Jan 1 10:31 04050355.index
-rw-r--r--    1 kafka kafka  8210576
nd.scala:188)
> at
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(
> ReassignPartitionsCommand.scala:158)
> at
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(
> ReassignPartitionsCommand.scala:154)
> at
> kafka.admin.ReassignPartitionsCommand$.main(ReassignPar
}
]
}
]
}
Anyway, easily fixed with restarts of Burrow, once it was clear which
consumer was having the issue.
On Fri, Dec 15, 2017 at 5:04 PM, Brett Rann <br...@zendesk.com> wrote:
> Another interesting datapoint:
>
> Taking a deeper look at partition 21:
>
&g
You would add new brokers to the cluster, and then do a partition
reassignment to move some partitions to the new broker.
In the simplest example:
Say you have 1 topic with 3 partitions.
partition 0: brokers: 1,2
partition 1: brokers: 2,3
partition 2: brokers: 3,1
If you added 3 more brokers,
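Continuing that (truncated) example: a hypothetical reassignment spreading the three partitions across all six brokers might look like this (topic name and replica layout are illustrative, not from the thread):

```shell
cat > /tmp/reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "my-topic", "partition": 0, "replicas": [1, 4]},
    {"topic": "my-topic", "partition": 1, "replicas": [2, 5]},
    {"topic": "my-topic", "partition": 2, "replicas": [3, 6]}
  ]
}
EOF
# Applied against a live cluster with (commented out here):
# bin/kafka-reassign-partitions.sh --zookeeper zoo1:2181 \
#   --reassignment-json-file /tmp/reassign.json --execute
```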
3 log files. it's not there.)
Why is it trying 201167266? Is it from the snapshot files? Is there some
surgery we can do to make it stop, safely? :)
On Fri, Dec 15, 2017 at 4:33 PM, Brett Rann <br...@zendesk.com> wrote:
> on `kafka_2.11-1.0.1-d04daf570` we are upgrading the log fo
on `kafka_2.11-1.0.1-d04daf570` we are upgrading the log format from
0.9.0.1 to 0.11.0.1 and after the upgrade have set
inter.broker.protocol.version=1.0
log.message.format.version=0.11.0.1
We have applied this upgrade to 5 clusters by upgrading broker 1, leaving
it for a day, then coming back
You already have a Pulsar thread going to discuss how it compares with
Kafka. Maybe you could keep these in the same thread? You seem very
interested in it which is fantastic. If you do some reproducible testing
comparisons I'd be interested in seeing your personal testing methodology
and
are
live with method 2 by using group information from DescribeGroupsRequest.
On Thu, Nov 23, 2017 at 9:20 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
> Unfortunately, it doesn't have that option in this version of Kafka!
>
> On Thu, Nov 23, 2017 at 9:02 PM,
I don't know about kafka-storm spout, but you could try using
the kafka-consumer-groups.sh cli to reset the offset. It has
a --reset-offsets option.
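A sketch of that invocation (group and topic names are invented; --to-earliest is just one of the available reset strategies; it needs a live cluster and 0.11+ tooling, so the command is only echoed here):

```shell
# Rewind a consumer group's committed offsets to the start of the topic
CMD='bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic my-topic --reset-offsets --to-earliest --execute'
echo "$CMD"
```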
On Thu, Nov 23, 2017 at 7:02 PM, Ali Nazemian wrote:
> Hi All,
>
> I am using Kafka 0.10.0.2 and I am not able to upgrade
et.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
replica.fetch.max.bytes=10485760
log.dirs=/data/kafka/logs
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
offsets.retention.minutes=10080
log.segment.bytes=1073741824
log.retention.check.interval.ms=30
log.cleaner.enable=true
zookeeper.connect=zoo1:2181,zoo2:2181,zoo3:2181/kafka
zookeeper.connection.timeout.ms=6000
s set to 128k, the number of open file
> > descriptors during normal operation is about 8k, so there is a lot of
> > headroom.
> >
> > I'm not sure if it's the other brokers trying to replicate that kills
> > it, or whether it's clients trying to publish messages.
> >
> > Has anyone seen a behavior like this? I'd appreciate any pointers.
> >
> > Thanks,
> >
> > Lukas
> >
> --
> Thanks and Regards,
> Madhukar Bharti
> Mob: 7845755539
>