Re: rebalancing latency spikes on high throughput kafka-streams services

Guozhang Wang Thu, 17 Jan 2019 10:23:36 -0800

Hello Javier,

I saw your SO thread before I noticed the question here, so I have already
answered it on SO. For the reference of other readers interested in this
thread:


https://stackoverflow.com/questions/54218822/kafka-streams-rebalancing-latency-spikes-on-high-throughput-kafka-streams-servic

Guozhang

On Wed, Jan 16, 2019 at 12:59 AM Javier Arias Losada <
javier.ari...@gmail.com> wrote:

> Dear all,
>
> We are starting to work with Kafka Streams; our service is a very simple
> stateless consumer.
>
> We have tight latency requirements, and we see excessive latency while
> the consumer group is rebalancing. In our scenario, rebalancing will
> happen relatively often: rolling updates of code, scaling the service
> up/down, containers being shuffled by the cluster scheduler, containers
> dying, hardware failing.
>
> One of the first tests we ran used a small consumer group with 4
> consumers handling a small volume of messages (1K/sec). We killed one of
> them, and the cluster manager (currently AWS ECS, probably soon moving to
> K8s) started a new one, so more than one rebalance took place.
>
> Our most critical metric is latency, which we measure as the milliseconds
> between message creation and message consumption. We saw the maximum
> latency spike from a few milliseconds to almost 15 seconds.
>
> We have also run tests with rolling updates of code, and the results are
> worse, since our deployment process is not prepared for Kafka services and
> we trigger a lot of rebalances. We will need to work on that, but we are
> wondering what strategies other people follow for code deployment and
> autoscaling with the minimum possible delay.
>
> Not sure whether it helps, but our message-processing requirements are
> fairly relaxed: we don't mind some messages being processed twice from
> time to time, nor are we very strict about the ordering of messages.
>
> We are using all default configurations, no tuning.
>
> We need to reduce these latency spikes during rebalancing.
> Can someone please give us some hints on how to approach this? Is changing
> configurations enough? Do we need to use a specific partition assignor?
> Implement our own?
>
> What is the recommended approach to code deployment / autoscaling with the
> minimum possible delays?
>
> Our Kafka version is 1.1.0, judging by the libraries we found (for example
> kafka/kafka_2.11-1.1.0-cp1.jar); we installed Confluent Platform 4.1.0.
> On the consumer side, we are using Kafka Streams 2.1.0.
>
> Thank you for reading my question and for your responses.
> Best,
> Javier Arias Losada
>
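
For readers who want to reproduce the measurement described in the quoted
message (milliseconds between message creation and message consumption), here
is a minimal sketch of tracking the maximum of that latency. The class name
LatencyTracker and its methods are hypothetical, not part of Kafka or of the
service discussed here, and the sketch assumes that both timestamps come from
reasonably synchronized clocks.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: tracks latency as consumption time minus creation time. */
final class LatencyTracker {
    private final AtomicLong maxMillis = new AtomicLong(0);
    private final AtomicLong count = new AtomicLong(0);

    /**
     * Records one message and returns its latency in milliseconds.
     * Negative values (possible with clock skew) are clamped to 0.
     */
    long record(long createdAtMillis, long consumedAtMillis) {
        long latency = Math.max(0L, consumedAtMillis - createdAtMillis);
        // Atomically keep the largest latency seen so far.
        maxMillis.accumulateAndGet(latency, Math::max);
        count.incrementAndGet();
        return latency;
    }

    /** Largest latency recorded so far, in milliseconds. */
    long maxMillis() {
        return maxMillis.get();
    }

    /** Number of messages recorded so far. */
    long count() {
        return count.get();
    }
}
```

In a consumer, record(...) would presumably be called once per message with
the message's creation timestamp and System.currentTimeMillis(); watching
maxMillis() across a rebalance is one way to observe spikes like the one
described above.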


-- 
-- Guozhang
