Re: Kafka long running job consumer config best practices and what to do to avoid stuck consumer

Ali Nazemian Thu, 07 May 2020 00:38:01 -0700

To help explain my case in more detail: the error I constantly see is the
consumer missing heartbeats, after which the group apparently gets
rebalanced, judging by this log from the Kafka side:


[GroupCoordinator 11]: Member consumer-3-f46e14b4-5998-4083-b7ec-bed4e3f374eb in group foo has failed, removing it from the group
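
For reference, the consumer settings that interact here could be sketched
as follows. This is only an illustration, not my actual configuration: the
class name, the maxJobMillis parameter, and every value are assumptions;
only the property keys are real Kafka consumer settings. Since Kafka 0.10.1
heartbeats are sent from a background thread, so "session.timeout.ms"
bounds heartbeat liveness while "max.poll.interval.ms" bounds the time
allowed between poll() calls.

```java
import java.util.Properties;

public class LongJobConsumerConfig {

    // Builds consumer properties for a long-running job.
    // All values below are illustrative assumptions, not recommendations.
    public static Properties build(long maxJobMillis) {
        Properties props = new Properties();
        props.setProperty("group.id", "foo"); // group name taken from the log line
        // Commit offsets manually, only after a job has finished.
        props.setProperty("enable.auto.commit", "false");
        // Fetch one record per poll() so a single job maps to a single poll cycle.
        props.setProperty("max.poll.records", "1");
        // Allow the longest expected job plus one minute of headroom between polls.
        props.setProperty("max.poll.interval.ms", Long.toString(maxJobMillis + 60_000L));
        // Heartbeat liveness: the broker removes the member if no heartbeat
        // arrives within the session timeout.
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("heartbeat.interval.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        // Example: jobs expected to take at most 15 minutes.
        build(15 * 60_000L).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The trade-off is the one described in my original message: the larger
"max.poll.interval.ms" is, the longer a hung consumer can hold its
partitions before the group notices.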

Thanks,
Ali

On Thu, May 7, 2020 at 2:38 PM Ali Nazemian <alinazem...@gmail.com> wrote:

> Hi,
>
> With the rise of Apache Kafka in event-driven architectures, one thing
> that has become important is how to tune the Apache Kafka consumer to
> manage long-running jobs. The main issue arises when we set a relatively
> large value for "max.poll.interval.ms". Setting this value will, of
> course, resolve the issue of repeated rebalances, but it creates another
> operational issue. I am looking for some sort of golden strategy for
> dealing with long-running jobs with Apache Kafka.
>
> If the consumer hangs for whatever reason, there is no easy way of getting
> past that stage. It can easily block the pipeline, and you cannot do much
> about it. Therefore, it occurred to me that I am probably missing
> something here. What are the expectations? Is it not valid to use Apache
> Kafka for long-lived jobs? Are there other parameters that need to be set,
> and is the issue of a stuck consumer caused by misconfiguration?
>
> I can see that many similar issues have been raised regarding "the
> consumer is stuck", and usually the answer has been "yeah, that's because
> you have a long-running job, etc.". I have seen different suggestions:
>
> - Avoid using long-running jobs. Read the message, submit it to another
> thread, and let the consumer move on. Obviously this can cause data loss,
> and it would be a difficult problem to handle. It might be better to avoid
> using Kafka in the first place for these types of requests.
>
> - Avoid using Apache Kafka for long-running requests.
>
> - Workaround-based approaches: for example, if the consumer is blocked,
> use another consumer group and set the offset to the current value for
> the new consumer group, etc.
>
> There might be other suggestions I have missed here, but that is not the
> point of this email. What I am looking for is the best practice for
> dealing with long-running jobs with Apache Kafka. I cannot easily avoid
> using Kafka because it plays a critical part in our application and data
> pipeline. On the other hand, we have had many challenges keeping the
> long-running jobs operationally stable. So I would appreciate it if someone
> could help me understand what approach can be taken to deal with these
> jobs with Apache Kafka as a message broker.
>
> Thanks,
> Ali
>
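
To make the "consumer hangs" scenario from the quoted message concrete,
here is a minimal sketch of running each job on a worker thread with a
deadline, so that a hung job is reported instead of blocking forever. It
uses only java.util.concurrent; the class, enum, and method names are
hypothetical, and the wiring to the Kafka consumer (offset commits, what to
do with a timed-out message) is deliberately left out. It is one possible
shape, not an established best practice.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedJobRunner {

    public enum Outcome { DONE, FAILED, TIMED_OUT }

    // Single worker thread: jobs run one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Runs the job and waits at most timeoutMillis for it to finish.
    public Outcome run(Runnable job, long timeoutMillis) throws InterruptedException {
        Future<?> future = worker.submit(job);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.DONE;
        } catch (TimeoutException e) {
            // Interrupts the worker thread. This only helps if the job
            // responds to interruption; a job that ignores it keeps the
            // single worker busy and delays every later job.
            future.cancel(true);
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The job threw an exception before the deadline.
            return Outcome.FAILED;
        }
    }

    public void shutdown() {
        worker.shutdownNow();
    }
}
```

The caller would still have to decide what a TIMED_OUT result means for the
message (retry, skip, or park it elsewhere), which is exactly the data-loss
concern raised in the first suggestion above.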


-- 
A.Nazemian
