Re: Joining the Kafka Users Mailing List

2024-03-15 Thread Bruno Cadonna

Hi Vansh,

Great that you want to join our community!

Subscription to the mailing list is self-serve. You can find details on how 
to subscribe at the following link: https://kafka.apache.org/contact


Thank you for your interest in Apache Kafka!

Best,
Bruno



On 3/15/24 1:59 PM, Vansh Kabra wrote:

Dear Kafka Users Community,


My name is Vansh Kabra, and I'm reaching out to express my interest in
joining the Kafka Users mailing list (users@kafka.apache.org).


I have been actively working with Kafka in my projects and have found it to
be an invaluable tool for building scalable and reliable real-time data
pipelines. I believe that being a part of the Kafka Users mailing list will
provide me with valuable insights, allow me to learn from the experiences
of other community members, and contribute to discussions on Kafka-related
topics.


I am eager to engage with the vibrant Kafka community, share my knowledge,
seek assistance when needed, and collaborate on solving challenges together.


Could you please add my email address (vanshkabr...@gmail.com) to the
users@kafka.apache.org mailing list?


Thank you for considering my request. I look forward to being an active
member of the Kafka Users community.


Best regards,

Vansh Kabra



Joining the Kafka Users Mailing List

2024-03-15 Thread Vansh Kabra
Dear Kafka Users Community,


My name is Vansh Kabra, and I'm reaching out to express my interest in
joining the Kafka Users mailing list (users@kafka.apache.org).


I have been actively working with Kafka in my projects and have found it to
be an invaluable tool for building scalable and reliable real-time data
pipelines. I believe that being a part of the Kafka Users mailing list will
provide me with valuable insights, allow me to learn from the experiences
of other community members, and contribute to discussions on Kafka-related
topics.


I am eager to engage with the vibrant Kafka community, share my knowledge,
seek assistance when needed, and collaborate on solving challenges together.


Could you please add my email address (vanshkabr...@gmail.com) to the
users@kafka.apache.org mailing list?


Thank you for considering my request. I look forward to being an active
member of the Kafka Users community.


Best regards,

Vansh Kabra


Re: [EXTERNAL] Re: Kafka Streams 3.5.1 based app seems to get stalled

2024-03-15 Thread Bruno Cadonna

Hi Venkatesh,

As you discovered, in Kafka Streams 3.5.1 there is no stop-the-world 
rebalancing.


Static group membership is helpful when Kafka Streams clients are 
restarted, as you pointed out.
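
For reference, a minimal sketch of how static group membership is typically enabled in the client configuration. The config keys are the standard ones (`application.id`, `group.instance.id`, `session.timeout.ms`); the class name, the instance id, and the timeout value are assumptions chosen for illustration, and `bootstrap.servers` and the serdes are left out for brevity.

```java
import java.util.Properties;

final class StaticMembershipConfig {
    // Builds Kafka Streams properties with static group membership enabled.
    // instanceId is a hypothetical stable identifier that must survive restarts,
    // e.g. one derived from the host name or a fixed task slot.
    static Properties build(String applicationId, String instanceId) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        // A stable id lets the group coordinator recognize a restarted member.
        props.put("group.instance.id", instanceId);
        // Assumed value: a restart that completes within this window
        // should not trigger a rebalance.
        props.put("session.timeout.ms", "60000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("my-streams-app", "instance-1"));
    }
}
```

The id has to be unique per client and stay the same across restarts, otherwise the restarted client is treated as a new member.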


> ERROR org.apache.kafka.streams.processor.internals.StandbyTask - 
stream-thread [-StreamThread-1] standby-task [1_32] Failed to 
acquire lock while closing the state store for STANDBY task


This error (and some others about lock acquisition) happens when a 
stream thread wants to lock the state directory for a task but another 
stream thread on the same Kafka Streams client has not released the lock 
yet. And yes, Kafka Streams handles these errors.


30 or 60 stream threads is a lot for one Kafka Streams client. We 
recommend using as many stream threads as there are cores on the compute 
node where the Kafka Streams client runs. How many Kafka Streams tasks do 
you have to distribute over the clients?
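
A minimal sketch of the sizing rule above, assuming the thread count is set through the standard `num.stream.threads` config. The class and method names are hypothetical, and the core count is passed in so that it can be taken from wherever it is reliable in the deployment.

```java
import java.util.Properties;

final class StreamThreadSizing {
    // Returns a copy of base with num.stream.threads set to the number of cores
    // (at least 1). availableCores is supplied by the caller.
    static Properties withOneThreadPerCore(Properties base, int availableCores) {
        Properties props = new Properties();
        props.putAll(base);
        props.put("num.stream.threads", Integer.toString(Math.max(1, availableCores)));
        return props;
    }

    public static void main(String[] args) {
        // Assumption: the JVM reports the cores actually available to this process.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(withOneThreadPerCore(new Properties(), cores));
    }
}
```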


> Would you consider this level of rebalancing to be normal?

The rate of rebalance events seems high indeed. However, the log 
messages you posted in one of your last e-mails are normal during a 
rebalance and they have nothing to do with METADATA_MAX_AGE_CONFIG.


I do not know the metric SumOffsetLag. Judging from a quick search on 
the internet, I think it is an MSK-specific metric.

https://repost.aws/questions/QUthnU3gycT-qj3Mtb-ekmRA/msk-metric-sumoffsetlag-how-it-works
Under the link you can also find some other metrics that you can use.
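
As a rough illustration of what an offset-lag figure usually means, the sketch below sums, per partition, the log end offset minus the committed offset. This is the common definition of consumer lag and not necessarily how MSK computes SumOffsetLag. The class name, the partition labels, and the choice to treat a partition without a committed offset as committed at 0 are all assumptions.

```java
import java.util.Map;

final class OffsetLag {
    // Sums per-partition lag: log end offset minus committed offset, floored at zero.
    // Assumption: a partition with no committed offset counts as committed at 0.
    static long totalLag(Map<String, Long> endOffsets, Map<String, Long> committedOffsets) {
        long total = 0L;
        for (Map.Entry<String, Long> entry : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += Math.max(0L, entry.getValue() - committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> end = Map.of("events-0", 100L, "events-1", 50L);
        Map<String, Long> committed = Map.of("events-0", 90L);
        System.out.println(totalLag(end, committed));
    }
}
```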

The following talk might help you debug your rebalance issues:

https://www.confluent.io/events/kafka-summit-london-2023/kafka-streams-rebalances-and-assignments-the-whole-story/


Best,
Bruno

On 3/14/24 11:11 PM, Venkatesh Nagarajan wrote:

Just want to make a correction, Bruno - My understanding is that Kafka Streams 
3.5.1 uses Incremental Cooperative Rebalancing which seems to help reduce the 
impact of rebalancing caused by autoscaling etc.:

https://www.confluent.io/blog/incremental-cooperative-rebalancing-in-kafka/

Static group membership may also have a role to play especially if the ECS 
tasks get restarted for some reason.


I also want to mention this error, which occurred 759 times during the 
13-hour load test:

ERROR org.apache.kafka.streams.processor.internals.StandbyTask - stream-thread 
[-StreamThread-1] standby-task [1_32] Failed to acquire lock while 
closing the state store for STANDBY task

I think Kafka Streams automatically recovers from this. Also, I have seen this 
error increase when the number of stream threads is high (30 or 60 
threads). So I use just 10 threads per ECS task.

Kind regards,
Venkatesh

From: Venkatesh Nagarajan 
Date: Friday, 15 March 2024 at 8:30 AM
To: users@kafka.apache.org 
Subject: Re: [EXTERNAL] Re: Kafka Streams 3.5.1 based app seems to get stalled
Apologies for the delay in responding to you, Bruno. Thank you very much for 
your important inputs.

Just searched for log messages in the MSK broker logs pertaining to rebalancing 
and updating of metadata for the consumer group and found 412 occurrences in a 
13 hour period. During this time, a load test was run and around 270k events 
were processed. Would you consider this level of rebalancing to be normal?

Also, I need to mention that when offset lags increase, autoscaling creates 
additional ECS tasks to help with faster processing. A lot of rebalancing 
happens for a few hours before the consumer group becomes stable.

By stop-the-world rebalancing, I meant a rebalancing that would cause the 
processing to completely stop when it happens. To avoid this, we use static 
group membership as explained by Matthias in this presentation:

https://www.confluent.io/kafka-summit-lon19/everything-you-wanted-to-know-kafka-afraid/

Static group membership seems to help reduce the impact of the rebalancing 
caused by scaling out of consumers.

On a separate note, when rebalancing happens, we lose the SumOffsetLag metric 
emitted by MSK for the consumer group. The AWS Support team said that the 
metric will only be available when the consumer group is stable or empty. I am 
not sure if this metric is specific to MSK or if it is related to Apache Kafka. 
If there is another metric I can use which can make offset lags observable even 
during rebalancing, can you please let me know?

Thank you very much.

Kind regards,
Venkatesh

From: Bruno Cadonna 
Date: Wednesday, 13 March 2024 at 8:29 PM
To: users@kafka.apache.org 
Subject: Re: [EXTERNAL] Re: Kafka Streams 3.5.1 based app seems to get stalled
Hi Venkatesh,

Expanding on what Matthias replied, a metadata refresh might trigger a
rebalance if the metadata changed. However, a metadata refresh that does
not show a change in the metadata will not trigger a rebalance. In this
context, i.e., config METADATA_MAX_AGE_CONFIG, the metadata is the
metadata about the cluster received by the client.
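
The distinction can be sketched as a simple comparison of metadata snapshots. This is a conceptual model only, not the actual client code: the snapshot shape (topic name mapped to partition count) and all names below are assumptions made for illustration.

```java
import java.util.Map;

final class MetadataRefresh {
    // Hypothetical snapshot: topic name -> partition count.
    // A refresh matters for rebalancing only if the snapshot differs from the last one.
    static boolean triggersRebalance(Map<String, Integer> previous, Map<String, Integer> refreshed) {
        return !previous.equals(refreshed);
    }

    public static void main(String[] args) {
        Map<String, Integer> before = Map.of("orders", 6);
        Map<String, Integer> unchanged = Map.of("orders", 6);
        Map<String, Integer> expanded = Map.of("orders", 12);
        System.out.println(triggersRebalance(before, unchanged));
        System.out.println(triggersRebalance(before, expanded));
    }
}
```

In this model, lowering `metadata.max.age.ms` only makes the comparison happen more often; it does not by itself cause rebalances when nothing has changed.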

The metadata mentioned in the log messages you posted is metadata of the
group to which the member (a.k.a. consumer, a.k.a. client)