Hm, it's an optimization for the "first layer", so if the bottleneck is in
the "second layer" (i.e. the DB write) as you mentioned, it shouldn't make much
difference, I think.
On Tue, Dec 22, 2020 at 16:02 Yana K wrote:
I thought about it, but then we don't have much time - will it optimize
performance?
On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada wrote:
> About "first layer" right?
> Then it's better to make sure that not get() the result of Producer#send()
> for each message, because in that way, it spoils the
About "first layer" right?
Then it's better to make sure that not get() the result of Producer#send()
for each message, because in that way, it spoils the ability of
producer-batching.
Kafka producer batches messages by default and it's very efficient, so if
you produce in async way, it rarely beco
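For example, something like this (an untested sketch - broker address, topic
name and configs are placeholders):

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AsyncProduceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("linger.ms", "5");      // give the batcher room to work
        props.put("batch.size", "65536");

        List<String> messages = List.of("a", "b", "c"); // stand-in payloads

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String value : messages) {
                // Anti-pattern: producer.send(record).get() blocks per message
                // and defeats batching. Send async with a callback instead:
                producer.send(new ProducerRecord<>("second-topic", value),
                        (metadata, exception) -> {
                            if (exception != null) {
                                exception.printStackTrace(); // handle failures async
                            }
                        });
            }
            producer.flush(); // make sure everything is out before exiting
        }
    }
}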
Thanks!
Also are there any producer optimizations anyone can think of in this
scenario?
On Mon, Dec 21, 2020 at 8:58 AM Joris Peeters wrote:
I'd probably just do it by experiment for your concrete data.
Maybe generate a few million synthetic data rows, and for-each-batch insert
them into a dev DB, with an outer grid search over various candidate batch
sizes. You're looking to optimise for flat-out rows/s, so whichever batch
size wins (for your data, schema and DB) is the one to go with.
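A rough sketch of such a harness (untested; assumes plain JDBC and a
hypothetical messages(id, payload) table - connection details are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BatchSizeGridSearch {
    public static void main(String[] args) throws Exception {
        int totalRows = 1_000_000;
        int[] candidates = {1, 10, 100, 1_000, 10_000};

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://devdb:5432/test", "user", "pass")) {
            conn.setAutoCommit(false);
            for (int batchSize : candidates) {
                try (Statement st = conn.createStatement()) {
                    st.execute("TRUNCATE TABLE messages"); // start each run clean
                }
                conn.commit();
                long start = System.nanoTime();
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO messages (id, payload) VALUES (?, ?)")) {
                    for (int i = 0; i < totalRows; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "synthetic-payload-" + i);
                        ps.addBatch();
                        if ((i + 1) % batchSize == 0) {
                            ps.executeBatch();
                            conn.commit();
                        }
                    }
                    ps.executeBatch(); // flush the tail
                    conn.commit();
                }
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("batch=%d -> %.0f rows/s%n",
                        batchSize, totalRows / secs);
            }
        }
    }
}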
Thanks Haruki and Joris.
Haruki:
Thanks for the detailed calculations. Really appreciate it. What tool/lib
is used to load test Kafka?
So we have one consumer group and are running 7 instances of the application -
that should be good enough, correct?
Joris:
Great point.
DB insert is a bottleneck (and
Do you know why your consumers are so slow? 12E6 msg/hour is ~3,333 msg/s,
which is not very high from a Kafka point-of-view. As you're doing database
inserts, I suspect that is where the bottleneck lies.
If, for example, you're doing a single-row insert in a SQL DB for every
message, then this would immediately explain it: each message pays a full
round-trip to the database.
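If so, the usual fix is to turn each consumed batch into one multi-row
insert. A minimal sketch (untested; table name and connection details are
made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BatchingDbWriter {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "db-writer");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");
        props.put("max.poll.records", "1000"); // caps the DB batch size

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://db:5432/app", "user", "pass")) {
            consumer.subscribe(List.of("second-topic"));
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO messages (payload) VALUES (?)");
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                if (records.isEmpty()) continue;
                for (ConsumerRecord<String, String> r : records) {
                    ps.setString(1, r.value());
                    ps.addBatch();
                }
                ps.executeBatch();     // one round-trip per poll, not per message
                conn.commit();
                consumer.commitSync(); // commit offsets only after the DB commit
            }
        }
    }
}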
About load test:
I think it'd be better to monitor per-message process latency and estimate
the required partition count from that, because it determines the max
throughput per single partition.
- Say you have to process 12 million messages/hour = ~3,333 messages/sec.
- If you have 7 partitions (thus at most 7 consumers in the group), each
partition must sustain ~476 messages/sec, i.e. per-message processing has to
finish in roughly 2.1 ms.
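To make the arithmetic concrete (the 5 ms figure is just an example - plug
in your measured latency):

public class PartitionEstimate {
    public static void main(String[] args) {
        double targetMsgPerSec = 12_000_000.0 / 3600.0;       // ~3,333 msg/s
        double perMessageLatencySec = 0.005;                  // example: 5 ms
        double maxPerPartition = 1.0 / perMessageLatencySec;  // 200 msg/s each
        long needed = (long) Math.ceil(targetMsgPerSec / maxPerPartition);
        System.out.println(needed + " partitions needed");    // 17 in this example
    }
}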
So as the next step I see increasing the partitions of the 2nd topic - do I
also increase the number of consumer instances to match, or keep it at 7?
Anything else (besides researching those libs)?
Are there any good tools for load testing Kafka?
On Sun, Dec 20, 2020 at 7:23 PM Haruki Okada wrote:
It depends on how you manually commit offsets.
Auto-commit commits offsets in an async manner basically, so as long as you
do manual commits the same way, there should not be much difference.
And generally, the offset-commit mode doesn't make much difference in
performance, regardless of manual/auto.
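For example, a manual commit done the same async way (a fragment that would
sit inside a normal poll loop, after the batch has been processed):

consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // Usually safe to just log: a later commit supersedes this one.
        exception.printStackTrace();
    }
});
// On shutdown, finish with one synchronous commit:
// consumer.commitSync();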
Thank you so much Marina and Haruki.
Marina's response:
- When you say "if you are sure there is no room for perf optimization of
the processing itself" - do you mean code-level optimizations? Can you
please explain?
- On the second topic you say "I'd say at least 40" - is this based on 12
million messages/hour?
Hi.
Yeah, Spring-Kafka processes messages sequentially, so the consumer
throughput would be capped by the database latency of each single message.
One possible solution is creating an intermediate topic (or altering the
source topic) with many more partitions, as Marina suggested - e.g. a thin
first layer like the sketch below.
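A minimal sketch of that first layer (untested; topic names and broker
address are placeholders):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class FanOutLayer {
    public static void main(String[] args) {
        Properties cprops = new Properties();
        cprops.put("bootstrap.servers", "localhost:9092"); // placeholder
        cprops.put("group.id", "fan-out");
        cprops.put("key.deserializer", StringDeserializer.class.getName());
        cprops.put("value.deserializer", StringDeserializer.class.getName());

        Properties pprops = new Properties();
        pprops.put("bootstrap.servers", "localhost:9092");
        pprops.put("key.serializer", StringSerializer.class.getName());
        pprops.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cprops);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pprops)) {
            consumer.subscribe(List.of("source-topic")); // the 7-partition topic
            while (true) {
                for (ConsumerRecord<String, String> r :
                        consumer.poll(Duration.ofMillis(500))) {
                    // Async send (no get()), so producer batching stays effective.
                    // Keying preserves per-key ordering in the wide topic.
                    producer.send(new ProducerRecord<>("wide-topic", r.key(), r.value()));
                }
            }
        }
    }
}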
I'd like to suggest an
The way I see it, you can only do a few things - if you are sure there is no
room for perf optimization of the processing itself:
1. speed up your processing per consumer thread: which you already tried by
splitting your logic into a 2-step pipeline instead of 1-step, and delegating
the work o
Hi
I am new to the Kafka world and running into this scale problem. I thought
I'd reach out to the community to see if someone can help.
The problem is that I am trying to consume from a Kafka topic that can have a
peak of 12 million messages/hour. That topic is not under my control - it
has 7 partitions
Hi Ramz,
A good rule of thumb has been no more than 4,000 partitions per broker and no
more than 100,000 in a cluster.
This includes all replicas, and it's related more to Kafka internals than it
is to resource usage, so I strongly advise not pushing these limits.
Otherwise, the usual reasons for sc
Hi users,
On what basis should we scale a Kafka cluster, and what would be the
symptoms that tell us scaling is needed?
I have a 3-node Kafka cluster - up to how many partitions can a single
broker, or the cluster as a whole, support?
Any article or knowledge share on scaling Kafka would be helpful.
Thanks,
Ramz.
---
From: Hafsa Asif [mailto:hafsa.a...@matchinguu.com]
Sent: Wednesday, June 01, 2016 7:05 AM
To: users@kafka.apache.org
Cc: Spico Florin
Subject: Re: Rebalancing issue while Kafka scaling
Just for more info:
If I have 10 servers in a cluster, then for the most fault-tolerant setup, do
we need replication-
...replicas, including the server being removed and you intend to rebalance
after server removal).
However, "automating" the rebalancing of topic partitions is not trivial.
There is a KIP out there to help with the rebalancing, but it lacks details -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+rebalancing
My guess is due to its non-trivial nature AND the number of cases one needs
to take care of - e.g. scaling up by 5% v/s scaling up by 50% in, say, a 20
node cluster.
Furthermore, to be really effective, one needs to be cognizant of the
partition sizes, and with rack-awareness, the task becomes even more
involved.
Regards,
Jayesh
-Original Message-
From: Spico Florin [mailto:spicoflo...@gmail.com]
Sent: Tuesday, May 31, 2016 9:44 AM
To: users@kafka.apache.org
Subject: Re: Rebalancing issue while Kafka scaling
Hi!
What version of Kafka are you using? What do you mean by "Kafka needs
rebalancing"? Rebalancing of what? Can you please be more specific.
Regards,
Florin
On Tue, May 31, 2016 at 4:58 PM, Hafsa Asif wrote:
Hello Folks,
Today, my team members raised the concern that whenever we add a node to the
Kafka cluster, Kafka needs rebalancing. The rebalancing is a manual and
undesirable step whenever scaling happens. Second, if Kafka scales up, it
cannot easily be scaled down. Please provide us proper guidance over