Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread David Jacot
Congrats!

On Thu, Dec 28, 2023 at 05:13, Ismael Juma wrote:

> Congratulations Divij!
>
> Ismael
>
> On Wed, Dec 27, 2023 at 3:46 AM Luke Chen  wrote:
>
> > Hi, Everyone,
> >
> > Divij has been a Kafka committer since June, 2023. He has remained very
> > active and instructive in the community since becoming a committer. It's my
> > pleasure to announce that Divij is now a member of Kafka PMC.
> >
> > Congratulations Divij!
> >
> > Luke
> > on behalf of Apache Kafka PMC
> >
>


[jira] [Created] (KAFKA-16056) Worker poll timeout expiry can lead to Duplicate task assignments.

2023-12-27 Thread Sagar Rao (Jira)
Sagar Rao created KAFKA-16056:
-

 Summary: Worker poll timeout expiry can lead to Duplicate task 
assignments.
 Key: KAFKA-16056
 URL: https://issues.apache.org/jira/browse/KAFKA-16056
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Sagar Rao
Assignee: Sagar Rao


When a worker's poll timeout expires, the worker proactively leaves the group,
which triggers a rebalance. Under normal circumstances, this departure starts a
scheduled rebalance delay. Note, however, that the worker that temporarily left
the group does not give up its assignments, so whatever tasks were running on
it keep running. When the scheduled rebalance delay elapses, the worker simply
gets its assignments back, and because there are no revocations, everything
works out fine.

But there is an edge case. Assume that a scheduled rebalance delay is already
active on a group and that, just before the follow-up rebalance that happens
when the delay elapses, one worker's poll timeout expires. At this point a
rebalance is imminent, and the leader tracks the assignments of the transiently
departed worker as lost
[here|https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java#L255].
When
[handleLostAssignments|https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java#L441]
is triggered, the scheduled rebalance delay has not been reset yet, so if
[this|https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java#L473]
condition passes, the leader assumes that it needs to reassign all the lost
assignments, and it does so.

But because the worker whose poll timeout expired doesn't rescind its
assignments, we end up with duplicate assignments: one set on the original
worker, which is still running the tasks and connectors, and another set on the
remaining workers, which received the redistributed work. This could lead to
task failures if a connector has been written in a way that expects no
duplicate tasks running across the cluster.

Also, this edge case can be encountered more frequently if the 
`rebalance.timeout.ms` config is set to a lower value. 

One approach could be to do something similar to
https://issues.apache.org/jira/browse/KAFKA-9184, where upon coordinator
discovery failure the worker gives up all its assignments and joins with an
empty assignment. We could do the same in this case.
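
The scenario can be illustrated with a toy model. The sketch below is not Kafka Connect code: the class, the worker and task names, and the revokeOnPollTimeout flag are all hypothetical, and the "leader" is reduced to a single reassignment step. It only shows why duplicates appear when the departed worker keeps its tasks, and why they disappear if that worker gives up its assignments first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; none of these names exist in Kafka Connect. */
final class DuplicateAssignmentSketch {

    /** Returns the tasks that are running on more than one worker. */
    static Set<String> duplicates(Map<String, Set<String>> runningByWorker) {
        Set<String> seen = new HashSet<>();
        Set<String> dupes = new TreeSet<>();
        for (Set<String> tasks : runningByWorker.values()) {
            for (String task : tasks) {
                if (!seen.add(task)) {
                    dupes.add(task);
                }
            }
        }
        return dupes;
    }

    /**
     * Simulates the leader reassigning the tasks of a worker whose poll timeout expired.
     * If revokeOnPollTimeout is false, the departed worker keeps running its tasks.
     */
    static Map<String, Set<String>> simulate(boolean revokeOnPollTimeout) {
        Map<String, Set<String>> running = new HashMap<>();
        running.put("worker-a", new HashSet<>(Set.of("task-0", "task-1")));
        running.put("worker-b", new HashSet<>(Set.of("task-2")));

        // worker-a's poll timeout expires; the leader treats its tasks as lost.
        Set<String> lost = new HashSet<>(running.get("worker-a"));
        if (revokeOnPollTimeout) {
            // Proposed behaviour: the worker gives up everything before rejoining.
            running.get("worker-a").clear();
        }
        // The leader redistributes the lost tasks to the remaining worker.
        running.get("worker-b").addAll(lost);
        return running;
    }

    public static void main(String[] args) {
        System.out.println("without revocation: " + duplicates(simulate(false)));
        System.out.println("with revocation:    " + duplicates(simulate(true)));
    }
}
```

Without revocation the model reports task-0 and task-1 as duplicated; with revocation it reports none.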




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] MINOR: Add docker in list of templates that get rendered [kafka-site]

2023-12-27 Thread via GitHub


VedarthConfluent opened a new pull request, #574:
URL: https://github.com/apache/kafka-site/pull/574

   Adds support for docker in documentation from 3.7.0 onwards


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Ismael Juma
Congratulations Divij!

Ismael

On Wed, Dec 27, 2023 at 3:46 AM Luke Chen  wrote:

> Hi, Everyone,
>
> Divij has been a Kafka committer since June, 2023. He has remained very
> active and instructive in the community since becoming a committer. It's my
> pleasure to announce that Divij is now a member of Kafka PMC.
>
> Congratulations Divij!
>
> Luke
> on behalf of Apache Kafka PMC
>


[jira] [Resolved] (KAFKA-15948) Refactor AsyncKafkaConsumer shutdown

2023-12-27 Thread Philip Nee (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Nee resolved KAFKA-15948.

  Assignee: Philip Nee  (was: Phuc Hong Tran)
Resolution: Fixed

> Refactor AsyncKafkaConsumer shutdown
> 
>
> Key: KAFKA-15948
> URL: https://issues.apache.org/jira/browse/KAFKA-15948
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Philip Nee
>Priority: Major
>  Labels: consumer-threading-refactor
> Fix For: 3.8.0
>
>
> Upon closing we need a round trip from the network thread to the application 
> thread and then back to the network thread to complete the callback 
> invocation.  Currently, we don't have any of that.  I think we need to 
> refactor our closing mechanism.  There are a few points to the refactor:
>  # The network thread should know if there's a custom user callback to 
> trigger or not.  If there is, it should wait for the callback completion to 
> send a leave group.  If not, it should proceed with the shutdown.
>  # The application thread sends a closing signal to the network thread and 
> continuously polls the background event handler until time runs out.
>  





Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread ziming deng
Congrats Divij!


> On Dec 28, 2023, at 10:59, Satish Duggana  wrote:
> 
> Congratulations Divij!
> 
> On Thu, 28 Dec 2023 at 07:21, Kamal Chandraprakash
>  wrote:
>> 
>> Congrats Divij!
>> 
>> On Thu, Dec 28, 2023, 07:09 Kirk True  wrote:
>> 
>>> Congrats Divij!!!
>>> 
>>> On Wed, Dec 27, 2023, at 1:44 PM, Jorge Esteban Quilcate Otoya wrote:
>>>> Congratulations Divij!!
>>>> 
>>>> On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:
>>>> 
>>>>> Congratulations!
>>>>> 
>>>>> On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
>>>>> 
>>>>>> congrats divij!
>>>>>> 
>>>>>> On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan wrote:
>>>>>> 
>>>>>>> Congratulations Divij!
>>>>>>> 
>>>>>>> On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula wrote:
>>>>>>> 
>>>>>>>> Congratulations Divij!
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Gaurav
>>>>>>>> 
>>>>>>>>> On 27-Dec-2023, at 17:44, Mickael Maison <mickael.mai...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Congratulations Divij!
>>>>>>>>> 
>>>>>>>>>> On Wed, Dec 27, 2023 at 1:05 PM Sagar <sagarmeansoc...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Congrats Divij! Absolutely well deserved !
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> Sagar.
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi, Everyone,
>>>>>>>>>>> 
>>>>>>>>>>> Divij has been a Kafka committer since June, 2023. He has remained very
>>>>>>>>>>> active and instructive in the community since becoming a committer. It's my
>>>>>>>>>>> pleasure to announce that Divij is now a member of Kafka PMC.
>>>>>>>>>>> 
>>>>>>>>>>> Congratulations Divij!
>>>>>>>>>>> 
>>>>>>>>>>> Luke
>>>>>>>>>>> on behalf of Apache Kafka PMC
>>>>>>>>>>> 



Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Satish Duggana
Congratulations Divij!

On Thu, 28 Dec 2023 at 07:21, Kamal Chandraprakash
 wrote:
>
> Congrats Divij!
>
> On Thu, Dec 28, 2023, 07:09 Kirk True  wrote:
>
> > Congrats Divij!!!
> >
> > On Wed, Dec 27, 2023, at 1:44 PM, Jorge Esteban Quilcate Otoya wrote:
> > > Congratulations Divij!!
> > >
> > > On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:
> > >
> > > > Congratulations!
> > > >
> > > > On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
> > > >
> > > > > congrats divij!
> > > > >
> > > > > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Congratulations Divij!
> > > > > >
> > > > > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> > > > wrote:
> > > > > >
> > > > > > > Congratulations Divij!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Gaurav
> > > > > > >
> > > > > > > > On 27-Dec-2023, at 17:44, Mickael Maison <
> > mickael.mai...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Congratulations Divij!
> > > > > > > >
> > > > > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar <
> > sagarmeansoc...@gmail.com>
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >> Congrats Divij! Absolutely well deserved !
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >> Sagar.
> > > > > > > >>
> > > > > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  > >
> > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> Hi, Everyone,
> > > > > > > >>>
> > > > > > > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > > > > > > >>> active and instructive in the community since becoming a committer. It's my
> > > > > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > > > > >>>
> > > > > > > >>> Congratulations Divij!
> > > > > > > >>>
> > > > > > > >>> Luke
> > > > > > > >>> on behalf of Apache Kafka PMC
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Kamal Chandraprakash
Congrats Divij!

On Thu, Dec 28, 2023, 07:09 Kirk True  wrote:

> Congrats Divij!!!
>
> On Wed, Dec 27, 2023, at 1:44 PM, Jorge Esteban Quilcate Otoya wrote:
> > Congratulations Divij!!
> >
> > On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:
> >
> > > Congratulations!
> > >
> > > On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
> > >
> > > > congrats divij!
> > > >
> > > > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Congratulations Divij!
> > > > >
> > > > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> > > wrote:
> > > > >
> > > > > > Congratulations Divij!
> > > > > >
> > > > > > Regards,
> > > > > > Gaurav
> > > > > >
> > > > > > > On 27-Dec-2023, at 17:44, Mickael Maison <
> mickael.mai...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > Congratulations Divij!
> > > > > > >
> > > > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar <
> sagarmeansoc...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Congrats Divij! Absolutely well deserved !
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >> Sagar.
> > > > > > >>
> > > > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  >
> > > > wrote:
> > > > > > >>>
> > > > > > >>> Hi, Everyone,
> > > > > > >>>
> > > > > > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > > > > > >>> active and instructive in the community since becoming a committer. It's my
> > > > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > > > >>>
> > > > > > >>> Congratulations Divij!
> > > > > > >>>
> > > > > > >>> Luke
> > > > > > >>> on behalf of Apache Kafka PMC
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Kirk True
Congrats Divij!!!

On Wed, Dec 27, 2023, at 1:44 PM, Jorge Esteban Quilcate Otoya wrote:
> Congratulations Divij!!
> 
> On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:
> 
> > Congratulations!
> >
> > On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
> >
> > > congrats divij!
> > >
> > > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Congratulations Divij!
> > > >
> > > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> > wrote:
> > > >
> > > > > Congratulations Divij!
> > > > >
> > > > > Regards,
> > > > > Gaurav
> > > > >
> > > > > > On 27-Dec-2023, at 17:44, Mickael Maison  > >
> > > > > wrote:
> > > > > >
> > > > > > Congratulations Divij!
> > > > > >
> > > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > > > > wrote:
> > > > > >>
> > > > > >> Congrats Divij! Absolutely well deserved !
> > > > > >>
> > > > > >> Thanks!
> > > > > >> Sagar.
> > > > > >>
> > > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen 
> > > wrote:
> > > > > >>>
> > > > > >>> Hi, Everyone,
> > > > > >>>
> > > > > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > > > > >>> active and instructive in the community since becoming a committer. It's my
> > > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > > >>>
> > > > > >>> Congratulations Divij!
> > > > > >>>
> > > > > >>> Luke
> > > > > >>> on behalf of Apache Kafka PMC
> > > > > >>>
> > > > >
> > > >
> > >
> >
> 


[jira] [Created] (KAFKA-16055) Thread unsafe use of HashMap stored in QueryableStoreProvider#storeProviders

2023-12-27 Thread Kohei Nozaki (Jira)
Kohei Nozaki created KAFKA-16055:


 Summary: Thread unsafe use of HashMap stored in 
QueryableStoreProvider#storeProviders
 Key: KAFKA-16055
 URL: https://issues.apache.org/jira/browse/KAFKA-16055
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.1
Reporter: Kohei Nozaki


This was originally raised in [a kafka-users 
post|https://lists.apache.org/thread/gpct1275bfqovlckptn3lvf683qpoxol].

There is a HashMap stored in QueryableStoreProvider#storeProviders ([code 
link|https://github.com/apache/kafka/blob/3.6.1/streams/src/main/java/org/apache/kafka/streams/state/internals/QueryableStoreProvider.java#L39])
 which can be mutated by a KafkaStreams#removeStreamThread() call. This can be 
problematic when KafkaStreams#store is called from a separate thread.

We need to make this part of the code thread-safe, for example by replacing the
HashMap with a ConcurrentHashMap and/or by using an existing locking mechanism.
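
As an illustration of the ConcurrentHashMap option, here is a minimal sketch. It is not the real QueryableStoreProvider: the class name, the String-typed values and the stress() helper are assumptions made for the example. One thread keeps adding and removing entries (standing in for removeStreamThread()) while another keeps iterating (standing in for a store lookup).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a provider registry; not the real QueryableStoreProvider. */
final class ProviderRegistrySketch {
    // ConcurrentHashMap tolerates concurrent put/remove/iteration without external locking.
    private final Map<String, String> providers = new ConcurrentHashMap<>();

    void add(String threadName, String provider) {
        providers.put(threadName, provider);
    }

    void remove(String threadName) {
        providers.remove(threadName);
    }

    /** Iterates the current providers; the iteration is weakly consistent. */
    int countProviders() {
        int count = 0;
        for (String ignored : providers.values()) {
            count++;
        }
        return count;
    }

    /** Mutates and reads the registry from two threads; returns the reader's failure, or null. */
    static Throwable stress(int iterations) {
        ProviderRegistrySketch registry = new ProviderRegistrySketch();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread mutator = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                registry.add("thread-" + (i % 16), "provider-" + i);
                registry.remove("thread-" + ((i + 8) % 16));
            }
        });
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < iterations; i++) {
                    registry.countProviders();
                }
            } catch (RuntimeException e) {
                failure.set(e);
            }
        });
        mutator.start();
        reader.start();
        try {
            mutator.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println("reader failure: " + stress(100_000));
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same reader could fail with a ConcurrentModificationException or observe inconsistent state, depending on timing. Whether a concurrent map alone is sufficient for the real class, or whether the existing locking should be reused, is a decision for the actual fix.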





Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Jorge Esteban Quilcate Otoya
Congratulations Divij!!

On Wed 27. Dec 2023 at 14.56, Tom Bentley  wrote:

> Congratulations!
>
> On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:
>
> > congrats divij!
> >
> > On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> > 
> > wrote:
> >
> > > Congratulations Divij!
> > >
> > > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula 
> wrote:
> > >
> > > > Congratulations Divij!
> > > >
> > > > Regards,
> > > > Gaurav
> > > >
> > > > > On 27-Dec-2023, at 17:44, Mickael Maison  >
> > > > wrote:
> > > > >
> > > > > Congratulations Divij!
> > > > >
> > > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > > > wrote:
> > > > >>
> > > > >> Congrats Divij! Absolutely well deserved !
> > > > >>
> > > > >> Thanks!
> > > > >> Sagar.
> > > > >>
> > > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen 
> > wrote:
> > > > >>>
> > > > >>> Hi, Everyone,
> > > > >>>
> > > > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > > > >>> active and instructive in the community since becoming a committer. It's my
> > > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > > >>>
> > > > >>> Congratulations Divij!
> > > > >>>
> > > > >>> Luke
> > > > >>> on behalf of Apache Kafka PMC
> > > > >>>
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Tom Bentley
Congratulations!

On Thu, 28 Dec 2023 at 06:17, Philip Nee  wrote:

> congrats divij!
>
> On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan
> 
> wrote:
>
> > Congratulations Divij!
> >
> > On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula  wrote:
> >
> > > Congratulations Divij!
> > >
> > > Regards,
> > > Gaurav
> > >
> > > > On 27-Dec-2023, at 17:44, Mickael Maison 
> > > wrote:
> > > >
> > > > Congratulations Divij!
> > > >
> > > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > > wrote:
> > > >>
> > > >> Congrats Divij! Absolutely well deserved !
> > > >>
> > > >> Thanks!
> > > >> Sagar.
> > > >>
> > > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen 
> wrote:
> > > >>>
> > > >>> Hi, Everyone,
> > > >>>
> > > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > > >>> active and instructive in the community since becoming a committer. It's my
> > > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > > >>>
> > > >>> Congratulations Divij!
> > > >>>
> > > >>> Luke
> > > >>> on behalf of Apache Kafka PMC
> > > >>>
> > >
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Philip Nee
congrats divij!

On Wed, Dec 27, 2023 at 8:55 AM Justine Olshan 
wrote:

> Congratulations Divij!
>
> On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula  wrote:
>
> > Congratulations Divij!
> >
> > Regards,
> > Gaurav
> >
> > > On 27-Dec-2023, at 17:44, Mickael Maison 
> > wrote:
> > >
> > > Congratulations Divij!
> > >
> > >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> > wrote:
> > >>
> > >> Congrats Divij! Absolutely well deserved !
> > >>
> > >> Thanks!
> > >> Sagar.
> > >>
> > >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:
> > >>>
> > >>> Hi, Everyone,
> > >>>
> > >>> Divij has been a Kafka committer since June, 2023. He has remained very
> > >>> active and instructive in the community since becoming a committer. It's my
> > >>> pleasure to announce that Divij is now a member of Kafka PMC.
> > >>>
> > >>> Congratulations Divij!
> > >>>
> > >>> Luke
> > >>> on behalf of Apache Kafka PMC
> > >>>
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Justine Olshan
Congratulations Divij!

On Wed, Dec 27, 2023 at 4:20 AM Gaurav Narula  wrote:

> Congratulations Divij!
>
> Regards,
> Gaurav
>
> > On 27-Dec-2023, at 17:44, Mickael Maison 
> wrote:
> >
> > Congratulations Divij!
> >
> >> On Wed, Dec 27, 2023 at 1:05 PM Sagar 
> wrote:
> >>
> >> Congrats Divij! Absolutely well deserved !
> >>
> >> Thanks!
> >> Sagar.
> >>
> >>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:
> >>>
> >>> Hi, Everyone,
> >>>
> >>> Divij has been a Kafka committer since June, 2023. He has remained very
> >>> active and instructive in the community since becoming a committer. It's my
> >>> pleasure to announce that Divij is now a member of Kafka PMC.
> >>>
> >>> Congratulations Divij!
> >>>
> >>> Luke
> >>> on behalf of Apache Kafka PMC
> >>>
>


[jira] [Resolved] (KAFKA-15904) Downgrade tests are failing with directory.id 

2023-12-27 Thread Proven Provenzano (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Proven Provenzano resolved KAFKA-15904.
---
Resolution: Fixed

This was merged into trunk a month ago, long before the 3.7 branch was cut. I 
just forgot to close the ticket.

> Downgrade tests are failing with directory.id 
> --
>
> Key: KAFKA-15904
> URL: https://issues.apache.org/jira/browse/KAFKA-15904
> Project: Kafka
>  Issue Type: Bug
>Reporter: Manikumar
>Assignee: Proven Provenzano
>Priority: Major
> Fix For: 3.7.0
>
>
> {{kafkatest.tests.core.downgrade_test.TestDowngrade}} tests are failing after
> [https://github.com/apache/kafka/pull/14628].
> We have added {{directory.id}} to metadata.properties. This means
> {{metadata.properties}} will be different for different log directories.
> Cluster downgrades will fail with the below error if we have multiple log
> directories. This looks like a blocker or requires additional downgrade steps
> from AK 3.7.





[jira] [Created] (KAFKA-16054) Sudden 100% CPU on a broker

2023-12-27 Thread Oleksandr Shulgin (Jira)
Oleksandr Shulgin created KAFKA-16054:
-

 Summary: Sudden 100% CPU on a broker
 Key: KAFKA-16054
 URL: https://issues.apache.org/jira/browse/KAFKA-16054
 Project: Kafka
  Issue Type: Bug
  Components: network
Affects Versions: 3.6.1, 3.3.2
 Environment: Amazon AWS, c6g.4xlarge arm64 16 vCPUs + 30 GB,  Amazon 
Linux
Reporter: Oleksandr Shulgin


For the third time in production, we have observed an issue where a Kafka
broker suddenly jumps to 100% CPU usage and does not recover on its own until
it is manually restarted.

After a deeper investigation, we now believe that this is an instance of the 
infamous epoll bug. See:
[https://github.com/netty/netty/issues/327]
[https://github.com/netty/netty/pull/565] (original workaround)
[https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/nio/NioEventLoop.java#L624-L632]
 (same workaround in the current Netty code)

The first occurrence in our production environment was on 2023-08-26, and the
other two were on 2023-12-10 and 2023-12-20.

Each time, the high CPU issue also results in another issue (misplaced
messages) that I asked about on the users mailing list in September.
Unfortunately, I have not received a single reply to date:
[https://lists.apache.org/thread/x1thr4r0vbzjzq5sokqgrxqpsbnnd3yy]

We still do not know how this other issue is happening.

When the high CPU happens, "top -H" reports a number of
"data-plane-kafka-..." threads consuming ~60% user and ~40% system CPU, and
the thread dump contains a lot of stack traces like the following one:

"data-plane-kafka-network-thread-67111914-ListenerName(PLAINTEXT)-PLAINTEXT-10" #76 prio=5 os_prio=0 cpu=346710.78ms elapsed=243315.54s tid=0xa12d7690 nid=0x20c runnable [0xfffed87fe000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(java.base@17.0.9/Native Method)
    at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@17.0.9/EPollSelectorImpl.java:118)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@17.0.9/SelectorImpl.java:129)
    - locked <0x0006c1246410> (a sun.nio.ch.Util$2)
    - locked <0x0006c1246318> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(java.base@17.0.9/SelectorImpl.java:141)
    at org.apache.kafka.common.network.Selector.select(Selector.java:874)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:465)
    at kafka.network.Processor.poll(SocketServer.scala:1107)
    at kafka.network.Processor.run(SocketServer.scala:1011)
    at java.lang.Thread.run(java.base@17.0.9/Thread.java:840)

At the same time the Linux kernel reports repeatedly "TCP: out of memory – 
consider tuning tcp_mem".

We are running relatively big machines in production (c6g.4xlarge with 30 GB
RAM), and the auto-configured setting is "net.ipv4.tcp_mem = 376608 502145
753216", which corresponds to ~3 GB for the "high" parameter, assuming 4 KB
memory pages.
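
The conversion behind the "~3 GB" figure can be checked with a few lines of arithmetic. The sketch below assumes, as stated above, that tcp_mem values are counted in memory pages of 4 KB (4096 bytes); the class and method names are made up for the example.

```java
/** Converts net.ipv4.tcp_mem page counts to bytes; the 4 KB page size is an assumption. */
final class TcpMemSketch {

    static long pagesToBytes(long pages, long pageSizeBytes) {
        return pages * pageSizeBytes;
    }

    static double bytesToGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String[] names = {"low", "pressure", "high"};
        long[] tcpMemPages = {376608L, 502145L, 753216L}; // values quoted in the report
        long pageSize = 4096L;
        for (int i = 0; i < names.length; i++) {
            long bytes = pagesToBytes(tcpMemPages[i], pageSize);
            System.out.printf("%-8s %d pages = %d bytes = %.2f GiB%n",
                    names[i], tcpMemPages[i], bytes, bytesToGiB(bytes));
        }
    }
}
```

For the "high" value, 753216 pages x 4096 bytes = 3,085,172,736 bytes, which is about 2.87 GiB (about 3.09 GB in decimal units).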

We were able to reproduce the issue in our test environment (which is using 4x 
smaller machines), simply by tuning the tcp_mem down by a factor of 10: "sudo 
sysctl -w net.ipv4.tcp_mem='9234 12313 18469'". The strace of one of the busy 
Kafka threads shows the following syscalls repeating constantly:

epoll_pwait(15558, [{events=EPOLLOUT, data={u32=12286, u64=468381628432382}}], 1024, 300, NULL, 8) = 1
fstat(12019, {st_mode=S_IFREG|0644, st_size=414428357, ...}) = 0
fstat(12019, {st_mode=S_IFREG|0644, st_size=414428357, ...}) = 0
sendfile(12286, 12019, [174899834], 947517) = -1 EAGAIN (Resource temporarily unavailable)

Resetting the "tcp_mem" parameters back to the auto-configured values in the 
test environment removes the pressure from the broker and it can continue 
normally without restart.

We have found a bug report that suggests the issue may be partially due to a
kernel bug:
[https://bugs.launchpad.net/ubuntu/+source/linux-meta-aws-6.2/+bug/2037335]
(they are using version 5.15)

We have updated our kernel from 6.1.29 to 6.1.66, which made the issue harder
to reproduce, but we can still do it by reducing all of the "tcp_mem"
parameters by a factor of 1,000. The JVM behavior is the same under these
conditions.

A similar issue is reported here, affecting Kafka Connect:
https://issues.apache.org/jira/browse/KAFKA-4739

Our production Kafka runs version 3.3.2 and our test environment runs 3.6.1.
The issue is present on both systems.





Re: Kafka trunk test & build stability

2023-12-27 Thread Divij Vaidya
I have started to perform an analysis of the OOM at
https://issues.apache.org/jira/browse/KAFKA-16052. Please feel free to
contribute to the investigation.

--
Divij Vaidya



On Wed, Dec 27, 2023 at 1:23 AM Justine Olshan 
wrote:

> I am still seeing quite a few OOM errors in the builds and I was curious if
> folks had any ideas on how to identify the cause and fix the issue. I was
> looking in gradle enterprise and found some info about memory usage, but
> nothing detailed enough to help figure the issue out.
>
> OOMs sometimes fail the build immediately and in other cases I see it get
> stuck for 8 hours. (See
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/2508/pipeline/12
> )
>
> I appreciate all the work folks are doing here and I will continue to try
> to help as best as I can.
>
> Justine
>
> On Tue, Dec 26, 2023 at 1:04 PM David Arthur
>  wrote:
>
> > S2. We’ve looked into this before, and it wasn’t possible at the time with
> > JUnit. We commonly set a timeout on each test class (especially integration
> > tests). It is probably worth looking at this again and seeing if something
> > has changed with JUnit (or our usage of it) that would allow a global
> > timeout.
> >
> >
> > S3. Dedicated infra sounds nice, if we can get it. It would at least remove
> > some variability between the builds, and hopefully eliminate the
> > infra/setup class of failures.
> >
> >
> > S4. Running tests for what has changed sounds nice, but I think it is risky
> > to implement broadly. As Sophie mentioned, there are probably some lines we
> > could draw where we feel confident that only running a subset of tests is
> > safe. As a start, we could probably work towards skipping CI for non-code
> > PRs.
> >
> >
> > ---
> >
> >
> > As an aside, I experimented with build caching and running affected tests a
> > few months ago. I used the opportunity to play with Github Actions, and I
> > quite liked it. Here’s the workflow I used:
> > https://github.com/mumrah/kafka/blob/trunk/.github/workflows/push.yml. I
> > was trying to see if we could use a build cache to reduce the compilation
> > time on PRs. A nightly/periodic job would build trunk and populate a Gradle
> > build cache. PR builds would read from that cache which would enable them
> > to only compile changed code. The same idea could be extended to tests, but
> > I didn’t get that far.
> >
> >
> > As for Github Actions, the idea there is that ASF would provide generic
> > Action “runners” that would pick up jobs from the Github Action build queue
> > and run them. It is also possible to self-host runners to expand the build
> > capacity of the project (i.e., other organizations could donate
> > build capacity). The advantage of this is that we would have more control
> > over our build/reports and not be “stuck” with whatever ASF Jenkins offers.
> > The Actions workflows are very customizable and it would let us create our
> > own custom plugins. There is also a substantial marketplace of plugins. I
> > think it’s worth exploring this more, I just haven’t had time lately.
> >
> > On Tue, Dec 26, 2023 at 3:24 PM Sophie Blee-Goldman <sop...@responsive.dev>
> > wrote:
> >
> > > Regarding:
> > >
> > > S-4. Separate tests ran depending on what module is changed.
> > > >
> > > > - This makes sense although is tricky to implement successfully, as
> > > > unrelated tests may expose problems in an unrelated change (e.g changing
> > > > core stuff like clients, the server, etc)
> > >
> > >
> > > Imo this avenue could provide a massive improvement to dev productivity
> > > with very little effort or investment, and if we do it right, without even
> > > any risk. We should be able to draft a simple dependency graph between
> > > modules and then skip the tests for anything that is clearly, provably
> > > unrelated and/or upstream of the target changes. This has the potential to
> > > substantially speed up and improve the developer experience in modules at
> > > the end of the dependency graph, which I believe is worth doing even if it
> > > unfortunately would not benefit everyone equally.
> > >
> > > For example, we can save a lot of grief with just a simple set of rules
> > > that are easy to check. I'll throw out a few to start with:
> > >
> > >    1. A pure docs PR (ie that only touches files under the docs/ directory)
> > >    should be allowed to skip the tests of all modules
> > >    2. Connect PRs (that only touch connect/) only need to run the Connect
> > >    tests -- ie they can skip the tests for core, clients, streams, etc
> > >    3. Similarly, Streams PRs should only need to run the Streams tests --
> > >    but again, only if all the changes are contained within streams/
> > >
> > > I'll let others chime in on how or if we can construct some safe rules as
> > > to which modules can or can't be skipped between the core, clients, raft,
> > > storage, etc
> > >
> 
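
The three rules quoted above (docs-only, Connect-only, Streams-only) could be checked mechanically against the list of files a PR touches. The following is only a sketch of that idea: the class, the Scope enum and the select() method are hypothetical and not part of Kafka's build, and the only inputs taken from the thread are the docs/, connect/ and streams/ directory prefixes. Anything that does not match a rule falls back to running everything.

```java
import java.util.List;

/** Hypothetical CI helper; names are illustrative and not part of Kafka's build. */
final class TestSelectionSketch {

    enum Scope { NONE, CONNECT_ONLY, STREAMS_ONLY, ALL }

    /** Picks which tests to run from the repository-relative paths changed by a PR. */
    static Scope select(List<String> changedPaths) {
        if (changedPaths.isEmpty()) {
            return Scope.ALL; // nothing to reason about, so stay conservative
        }
        if (allUnder(changedPaths, "docs/")) {
            return Scope.NONE; // rule 1: pure docs PR
        }
        if (allUnder(changedPaths, "connect/")) {
            return Scope.CONNECT_ONLY; // rule 2: only connect/ touched
        }
        if (allUnder(changedPaths, "streams/")) {
            return Scope.STREAMS_ONLY; // rule 3: only streams/ touched
        }
        return Scope.ALL; // mixed or core changes: run the full suite
    }

    private static boolean allUnder(List<String> paths, String prefix) {
        return paths.stream().allMatch(p -> p.startsWith(prefix));
    }

    public static void main(String[] args) {
        System.out.println(select(List.of("docs/upgrade.html")));
        System.out.println(select(List.of("connect/runtime/Foo.java")));
        System.out.println(select(List.of("streams/Bar.java", "clients/Baz.java")));
    }
}
```

The conservative default matters here: a PR that touches both streams/ and clients/ runs the full suite, which matches the "only if all the changes are contained within streams/" condition.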

[jira] [Created] (KAFKA-16053) Fix leaked Default DirectoryService

2023-12-27 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-16053:


 Summary: Fix leaked Default DirectoryService
 Key: KAFKA-16053
 URL: https://issues.apache.org/jira/browse/KAFKA-16053
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya
Assignee: Divij Vaidya


Heap dump hinted towards a leaked DefaultDirectoryService while running 
:core:test. It used 240MB of retained memory.

This Jira fixes the leak.





[jira] [Created] (KAFKA-16052) OOM in Kafka test suite

2023-12-27 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-16052:


 Summary: OOM in Kafka test suite
 Key: KAFKA-16052
 URL: https://issues.apache.org/jira/browse/KAFKA-16052
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Divij Vaidya


Our test suite is frequently failing with OOM errors. The discussion on the
mailing list is here:
[https://lists.apache.org/thread/d5js0xpsrsvhgjb10mbzo9cwsy8087x4]

To find the source of the leaks, I ran the :core:test build target with a
single thread (see below for how to do it) and attached a profiler to it. This
Jira tracks the list of action items identified from the analysis.

How to run tests using a single thread:


{code:java}
diff --git a/build.gradle b/build.gradle
index f7abbf4f0b..81df03f1ee 100644
--- a/build.gradle
+++ b/build.gradle
@@ -74,9 +74,8 @@ ext {
       "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"
     )
 
-  maxTestForks = project.hasProperty('maxParallelForks') ? maxParallelForks.toInteger() : Runtime.runtime.availableProcessors()
-  maxScalacThreads = project.hasProperty('maxScalacThreads') ? maxScalacThreads.toInteger() :
-      Math.min(Runtime.runtime.availableProcessors(), 8)
+  maxTestForks = 1
+  maxScalacThreads = 1
   userIgnoreFailures = project.hasProperty('ignoreFailures') ? ignoreFailures : false
 
   userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0
diff --git a/gradle.properties b/gradle.properties
index 4880248cac..ee4b6e3bc1 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -30,4 +30,4 @@ scalaVersion=2.13.12
 swaggerVersion=2.2.8
 task=build
 org.gradle.jvmargs=-Xmx2g -Xss4m -XX:+UseParallelGC
-org.gradle.parallel=true
+org.gradle.parallel=false
{code}





Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Gaurav Narula
Congratulations Divij!

Regards,
Gaurav

> On 27-Dec-2023, at 17:44, Mickael Maison  wrote:
> 
> Congratulations Divij!
> 
>> On Wed, Dec 27, 2023 at 1:05 PM Sagar  wrote:
>> 
>> Congrats Divij! Absolutely well deserved !
>> 
>> Thanks!
>> Sagar.
>> 
>>> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:
>>> 
>>> Hi, Everyone,
>>> 
>>> Divij has been a Kafka committer since June, 2023. He has remained very
>>> active and instructive in the community since becoming a committer. It's my
>>> pleasure to announce that Divij is now a member of Kafka PMC.
>>> 
>>> Congratulations Divij!
>>> 
>>> Luke
>>> on behalf of Apache Kafka PMC
>>> 


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Mickael Maison
Congratulations Divij!

On Wed, Dec 27, 2023 at 1:05 PM Sagar  wrote:
>
> Congrats Divij! Absolutely well deserved !
>
> Thanks!
> Sagar.
>
> On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:
>
> > Hi, Everyone,
> >
> > Divij has been a Kafka committer since June, 2023. He has remained very
> > active and instructive in the community since becoming a committer. It's my
> > pleasure to announce that Divij is now a member of Kafka PMC.
> >
> > Congratulations Divij!
> >
> > Luke
> > on behalf of Apache Kafka PMC
> >


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Sagar
Congrats Divij! Absolutely well deserved !

Thanks!
Sagar.

On Wed, Dec 27, 2023 at 5:15 PM Luke Chen  wrote:

> Hi, Everyone,
>
> Divij has been a Kafka committer since June, 2023. He has remained very
> active and instructive in the community since becoming a committer. It's my
> pleasure to announce that Divij is now a member of Kafka PMC.
>
> Congratulations Divij!
>
> Luke
> on behalf of Apache Kafka PMC
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Boudjelda Mohamed Said
Congratulations Divij!


On Wed 27 Dec 2023 at 12:45, Luke Chen  wrote:

> Hi, Everyone,
>
> Divij has been a Kafka committer since June, 2023. He has remained very
> active and instructive in the community since becoming a committer. It's my
> pleasure to announce that Divij is now a member of Kafka PMC.
>
> Congratulations Divij!
>
> Luke
> on behalf of Apache Kafka PMC
>


[ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread Luke Chen
Hi, Everyone,

Divij has been a Kafka committer since June, 2023. He has remained very
active and instructive in the community since becoming a committer. It's my
pleasure to announce that Divij is now a member of Kafka PMC.

Congratulations Divij!

Luke
on behalf of Apache Kafka PMC


Re: [DISCUSS] KIP-971 Expose replication-offset-lag MirrorMaker2 metric

2023-12-27 Thread Elxan Eminov
Hi all,
I've updated the KIP with the details we discussed in this thread.
I'll call in a vote after the holidays if everything looks good.
Thanks!

On Sat, 26 Aug 2023 at 15:49, Elxan Eminov  wrote:

> Relatively minor change with a new metric for MM2
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric
>