The first thing I'd look at is your `max.poll.records` setting. The
default for Streams is 1000 (see
https://docs.confluent.io/current/streams/developer-guide/config-streams.html#default-values).
Depending on your workload, this could definitely cause long rebalances,
since each instance has to finish processing the batch returned by its last
poll before it can rejoin the group. It did for me, but my workload requires
quite long processing times.
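
As a rough illustration of why the batch size matters: if each record takes a fixed time to process, an instance that has just received a full batch may need up to `max.poll.records` times the per-record processing time before it polls again. The stdlib-only Java sketch below computes that bound and builds a Streams `Properties` object that overrides the setting for the embedded consumers via the `consumer.` prefix (the value of `StreamsConfig.CONSUMER_PREFIX`). The class name, the 10 ms per-record cost, and the value 100 are hypothetical; a real batch size should come from your own measured processing times.

```java
import java.util.Properties;

public class StreamsPollTuning {

    // Upper bound on how long one instance may spend on a full batch before it
    // calls poll() again (and so before it can rejoin a rebalance), assuming a
    // fixed per-record processing cost.
    public static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    // Builds Streams properties that override max.poll.records for the embedded
    // consumers. "consumer." is the prefix Kafka Streams uses to route a setting
    // to its consumers; a string literal is used so this compiles without the Kafka jar.
    public static Properties streamsProps(String appId, String bootstrapServers, int maxPollRecords) {
        Properties props = new Properties();
        props.put("application.id", appId);
        props.put("bootstrap.servers", bootstrapServers);
        props.put("consumer.max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical 10 ms per record: the 1000-record default vs. a batch of 100.
        System.out.println(worstCaseBatchMillis(1000, 10)); // 10000
        System.out.println(worstCaseBatchMillis(100, 10));  // 1000

        Properties props = streamsProps("latency-sensitive-app", "localhost:9092", 100);
        System.out.println(props.getProperty("consumer.max.poll.records")); // 100
    }
}
```

The resulting `Properties` would then be passed to `new KafkaStreams(topology, props)` as usual.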

Regards,
Raman

On Wed, Jan 16, 2019 at 3:59 AM Javier Arias Losada <javier.ari...@gmail.com>
wrote:

> Dear all,
>
> We are starting to work with Kafka Streams; our service is a very simple
> stateless consumer.
>
> We have tight latency requirements, and we are seeing far too much latency
> while the consumer group is rebalancing. In our scenario, rebalancing will
> happen relatively often: rolling code updates, scaling the service up/down,
> containers being shuffled by the cluster scheduler, containers dying, and
> hardware failing.
>
> One of the first tests we ran used a small consumer group of 4 consumers
> handling a small volume of messages (1K/sec). We killed one of them, and the
> cluster manager (currently AWS-ECS, probably soon moving to K8S) started a
> new one, so more than one rebalance took place.
>
> Our most critical metric is latency, which we measure as the milliseconds
> between message creation and message consumption. We saw the maximum
> latency spike from a few milliseconds to almost 15 seconds.
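
To make that metric concrete, here is a minimal stdlib-only Java sketch that tracks the maximum creation-to-consumption latency. It assumes the creation time is available as an epoch-millisecond timestamp; in a Kafka Streams app that could be the record timestamp (the producer's `CreateTime` by default), read for example via `ProcessorContext.timestamp()` in the Processor API. `LatencyTracker` is a hypothetical helper, not part of Kafka, and producer and consumer clocks are assumed to be roughly in sync.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LatencyTracker {

    // Highest latency observed so far, in milliseconds; thread-safe.
    private final AtomicLong maxLatencyMs = new AtomicLong(0);

    // Records one message: latency = consumption time - creation time.
    // Negative values (clock skew between hosts) are clamped to 0.
    public long record(long createdAtMs, long consumedAtMs) {
        long latency = Math.max(0L, consumedAtMs - createdAtMs);
        maxLatencyMs.accumulateAndGet(latency, Math::max);
        return latency;
    }

    public long maxLatencyMs() {
        return maxLatencyMs.get();
    }

    public static void main(String[] args) {
        LatencyTracker tracker = new LatencyTracker();
        // Hypothetical timestamps (ms): a normal record, then one delayed by a rebalance.
        tracker.record(1_000L, 1_005L);
        tracker.record(2_000L, 16_900L);
        System.out.println(tracker.maxLatencyMs()); // 14900
    }
}
```
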
>
> [Three attached images (image.png), not preserved]
>
> We have also run tests with rolling code updates, and the results are
> worse: our deployment is not prepared for Kafka services, so we trigger a
> lot of rebalances. We'll need to work on that, but we are wondering what
> strategies other people follow for code deployment / autoscaling with the
> minimum possible delays.
>
> Not sure if it helps, but our message-processing requirements are pretty
> relaxed: we don't mind some messages being processed twice from time to
> time, and we are not very strict about message ordering.
>
> We are using the default configuration throughout, with no tuning.
>
> We need to reduce these latency spikes during rebalancing.
> Could someone please give us some hints on how to approach this? Is tuning
> the configuration enough? Do we need to use a specific partition assignor,
> or implement our own?
>
> What is the recommended approach to code deployment / autoscaling with the
> minimum possible delays?
>
> Our Kafka version is 1.1.0 (looking at the libs, we found, for example,
> kafka/kafka_2.11-1.1.0-cp1.jar); we installed Confluent Platform 4.1.0.
> On the consumer side, we are using Kafka Streams 2.1.0.
>
> Thank you for reading my question and for your responses.
> Best,
> Javier Arias Losada
>
