Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-09-19 Thread David Jacot
Hi,

Thanks for the update. I have a few nits:

> If the member ID is null or empty, the server will reject the request
with an InvalidRequestException.
We should clarify that this should only apply to version >= 1.
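For what it's worth, the version gate could look roughly like this (a minimal Python sketch of the server-side check; the function name and error string are illustrative, not the actual coordinator code):

```python
def validate_member_id(member_id, request_version):
    """Hypothetical sketch: reject a null/empty member id only for
    ConsumerGroupHeartbeat versions >= 1, since version 0 clients
    legitimately send an empty id to have the coordinator assign one."""
    if request_version >= 1 and not member_id:
        return "INVALID_REQUEST"
    return None  # request is acceptable
```

A version 0 request with an empty id would still pass, preserving today's server-side generation path.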

> The consumer instance must generate a member ID, and this ID should
remain consistent for the duration of the consumer's session. Here, a
"session" is defined as the period from the consumer's first heartbeat
until it leaves the group, either through a graceful shutdown, a heartbeat
timeout, or the process stopping or dying. The consumer instance should
reuse the same member ID for all heartbeats and rejoin attempts to maintain
continuity within the group.

This part is not clear to me. When the member leaves the group, it should
not reset the member id. I would rather say that the member must generate
its member id when it starts and it must keep it until the process stops.
It is basically an incarnation of the process.
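A toy model of that lifecycle (Python sketch; the class and field names are illustrative): the id is generated once when the process starts and survives fencing and rejoin cycles.

```python
import uuid

class ConsumerProcessSketch:
    """Illustrative model of the member id as an incarnation id:
    generated once at process startup and never reset afterwards."""

    def __init__(self):
        self.member_id = str(uuid.uuid4())  # generated exactly once
        self.member_epoch = 0

    def on_fenced_member_epoch(self):
        # Rejoin with the SAME member id; only the epoch resets to zero.
        self.member_epoch = 0

    def heartbeat(self):
        return {"memberId": self.member_id, "memberEpoch": self.member_epoch}
```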

> If a conflict arises where the member ID generated by the client is
detected to be a duplicate within the same group (for example, the same
member ID is associated with another active member in the group), the
server will handle this by comparing the memberEpoch values of the
conflicting members. The member with the lower memberEpoch is considered
outdated and will be fenced off by the server. When this occurs, the server
responds with a FENCED_MEMBER_EPOCH error to the client, signaling it to
rejoin the group with the same member ID while resetting the memberEpoch to
zero. This ensures that the client properly resynchronizes and maintains
the continuity and consistency of the group membership.

This part is not clear either. It basically says that if a member joins
with an existing member id but a different epoch, it will be fenced. Then
it must rejoin with the same member id and epoch zero. This is already the
current behavior and it does not help with detecting duplicates, right?
Should we just remove the paragraph?

> A member ID mismatch occurs within a session: If the server detects a
mismatch between the provided member ID and the expected member ID for an
ongoing session, it should return an UNKNOWN_MEMBER_ID error.

How could we detect a mismatch between the provided and the expected member
id? My understanding is that we can only know whether the provided member
id exists or not. This is already implemented.
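In other words, the only check the coordinator can make is a lookup, sketched below in Python; the data structures and error strings are illustrative of the behavior described, not the actual implementation:

```python
def handle_heartbeat(group_members, member_id, member_epoch):
    """group_members: dict mapping member id -> {"epoch": int}.
    The coordinator can only tell whether the provided id exists;
    there is no separate 'expected' id to compare it against."""
    member = group_members.get(member_id)
    if member is None:
        return "UNKNOWN_MEMBER_ID"
    if member_epoch != member["epoch"]:
        return "FENCED_MEMBER_EPOCH"  # rejoin with the same id, epoch 0
    return "NONE"
```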

Thanks,
David

On Sat, Sep 14, 2024 at 9:31 AM TengYao Chi  wrote:

> Hello everyone,
>
> Since this KIP has been fully discussed, I will initiate a vote for it next
> Monday.
> Thank you and have a nice weekend.
>
> Best regards,
> TengYao
>
> TengYao Chi  於 2024年9月5日 週四 下午2:19寫道:
>
> > Hello everyone,
> >
> > KT2: It looks like everyone who has expressed an opinion supports the
> > second option: “Document a recommendation for clients to use UUIDs as
> > member IDs, without strictly enforcing it.”
> > I have updated the KIP accordingly.
> > Please take a look, and let me know if you have any thoughts or feedback.
> >
> > Thank you!
> >
> > Best regards,
> > TengYao
> >
> > Chia-Ping Tsai  於 2024年8月30日 週五 下午9:56寫道:
> >
> >> hi TengYao
> >>
> >> KT2: +1 to second approach
> >>
> >> Best,
> >> Chia-Ping
> >>
> >>
> >> David Jacot  於 2024年8月30日 週五 下午9:15寫道:
> >>
> >> > Hi TengYao,
> >> >
> >> > KT2: I don't think that we can realistically validate the UUID on the
> >> > server. It is basically a string of chars. So I lean towards having a
> >> > good recommendation in the KIP and in the documentation of the field
> >> > in the RPC definition.
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi 
> >> wrote:
> >> >
> >> > > Hello Kirk !
> >> > >
> >> > > Thank you for your comments !
> >> > >
> >> > > KT1: Yes, you are correct. The issue is not unique to the initial
> >> > > heartbeat; there can always be cases where the broker might lose
> >> > connection
> >> > > with a member.
> >> > >
> >> > > KT2: Currently, if the client doesn't have a member ID and the
> >> > memberEpoch
> >> > > equals 0, the coordinator will generate a UUID as the member ID for
> >> the
> >> > > client. However, at the RPC level, the member ID is sent as a
> literal
> >> > > string, meaning there are no restrictions on the format at this
> level.
> >> > > This also reminds me that we haven

[jira] [Created] (KAFKA-17571) Revert #17219

2024-09-17 Thread David Jacot (Jira)
David Jacot created KAFKA-17571:
---

 Summary: Revert #17219
 Key: KAFKA-17571
 URL: https://issues.apache.org/jira/browse/KAFKA-17571
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: David Jacot
Assignee: David Jacot


Revert https://github.com/apache/kafka/pull/17219



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-17306) Soften the validation when replaying tombstones

2024-09-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17306.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Soften the validation when replaying tombstones
> ---
>
> Key: KAFKA-17306
> URL: https://issues.apache.org/jira/browse/KAFKA-17306
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>    Assignee: David Jacot
>Priority: Major
> Fix For: 4.0.0
>
>
> At present, replaying a tombstone requires the deleted entity to exist. 
> However, the record that created the entity can be removed by compaction, 
> leaving only the tombstone.
> This can cause errors when a group coordinator loads a __consumer_offsets 
> partition, as some entities can then be deleted without ever having been 
> created during replay. As a result, we should soften the validation when 
> replaying tombstones.





[jira] [Resolved] (KAFKA-15756) Migrate existing integration tests to run old protocol in new coordinator

2024-09-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15756.
-
Resolution: Won't Do

The new group coordinator is now used by default so all the tests use it by 
default too unless specified otherwise.

> Migrate existing integration tests to run old protocol in new coordinator
> -
>
> Key: KAFKA-15756
> URL: https://issues.apache.org/jira/browse/KAFKA-15756
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
>
> There is one flaky test left; we need to figure out how to reduce its 
> flakiness.
> {code:java}
> testConsumptionWithBrokerFailures{code}





[jira] [Resolved] (KAFKA-15621) Add histogram metrics to GroupCoordinatorRuntimeMetrics

2024-09-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15621.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Add histogram metrics to GroupCoordinatorRuntimeMetrics
> ---
>
> Key: KAFKA-15621
> URL: https://issues.apache.org/jira/browse/KAFKA-15621
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 4.0.0
>
>
> We will add new histograms to the Kafka Metrics library soon. Once available, 
> we can start using them in GroupCoordinatorRuntimeMetrics.





[ANNOUNCE] New committer: Jeff Kim

2024-09-08 Thread David Jacot
Hi all,

The PMC of Apache Kafka is pleased to announce a new Kafka committer, Jeff Kim.

Jeff has been a Kafka contributor since May 2020. In addition to being
a regular contributor and reviewer, he has made significant
contributions to the next generation of the consumer rebalance
protocol (KIP-848) and to the new group coordinator. He authored
KIP-915 which improved how coordinators can be downgraded. He also
contributed multiple fixes/improvements to the fetch from follower
feature.

Congratulations, Jeff!

Thanks,
David (on behalf of the Apache Kafka PMC)


Re: [ANNOUNCE] New Kafka PMC Member: Josep Prat

2024-09-06 Thread David Jacot
Congrats!

Le sam. 7 sept. 2024 à 05:27, Yash Mayya  a écrit :

> Congratulations Josep!
>
> On Fri, 6 Sept, 2024, 21:55 Chris Egerton,  wrote:
>
> > Hi all,
> >
> > Josep has been a Kafka committer since December 2022. He has remained
> very
> > active and instructive in the community since then, and it's my pleasure
> to
> > announce that he has accepted our invitation to become a member of the
> > Kafka PMC.
> >
> > Congratulations Josep! Enjoy voting on those release candidates :)
> >
> > Chris, on behalf of the Apache Kafka PMC
> >
>


Re: New Group Coordinator (KIP-848)

2024-09-06 Thread David Jacot
Hi Chris,

Thanks for raising this. We will definitely take a look into it.

Cheers,
David

On Fri, Sep 6, 2024 at 4:44 PM Chris Egerton 
wrote:

> Hi David,
>
> CCing here for visibility: I think this change has caused an uptick in
> flakiness for the Connect OffsetsApiIntegrationTest suite. Gradle
> Enterprise shows that in the week before this email was sent, the test
> suite had a flakiness rate of about 4% [1], and in the week and a half
> since it was sent, the flakiness rate has jumped to 17% [2].
>
> I've filed KAFKA-17493 [3] to track. Hoping someone familiar with KIP-848
> can take a look; happy to be a point of contact for any Connect-specific
> information if that helps.
>
> [1] -
>
> https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=172455840&search.startTimeMin=172395360&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY
> [2] -
>
> https://ge.apache.org/scans/tests?search.rootProjectNames=kafka&search.startTimeMax=172568159&search.startTimeMin=172473120&search.tags=trunk&search.timeZoneId=America%2FNew_York&tests.container=org.apache.kafka.connect.integration.*&tests.sortField=FLAKY
> [3] - https://issues.apache.org/jira/browse/KAFKA-17493
>
> Cheers,
>
> Chris
>
> On Mon, Aug 26, 2024 at 4:21 AM David Jacot 
> wrote:
>
> > Hi folks,
> >
> > I wanted to let you know that the new group coordinator that we developed
> > as part of KIP-848 is now the default group coordinator in trunk (kraft
> > only). Hence all the integration tests, all the system tests and all the
> > kraft clusters created based on trunk use it by default unless specified
> > otherwise. If you encounter any issues with it, please, file a Jira and
> let
> > me know.
> >
> > Best,
> > David
> >
>


Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-08-30 Thread David Jacot
Hi TengYao,

KT2: I don't think that we can realistically validate the UUID on the
server. It is basically a string of chars. So I lean towards having a good
recommendation in the KIP and in the documentation of the field in the
RPC definition.

Best,
David

On Fri, Aug 30, 2024 at 3:02 PM TengYao Chi  wrote:

> Hello Kirk !
>
> Thank you for your comments !
>
> KT1: Yes, you are correct. The issue is not unique to the initial
> heartbeat; there can always be cases where the broker might lose connection
> with a member.
>
> KT2: Currently, if the client doesn't have a member ID and the memberEpoch
> equals 0, the coordinator will generate a UUID as the member ID for the
> client. However, at the RPC level, the member ID is sent as a literal
> string, meaning there are no restrictions on the format at this level.
> This also reminds me that we haven't reached a final conclusion on how to
> enforce the use of UUIDs.
> From our previous discussions, I recall two possible approaches:
> The first is to validate the UUID on the server side, and if it's not
> valid, throw an exception to the client.
> The second is to document a recommendation for clients to use UUIDs as
> member IDs, without strictly enforcing it.
> I think it's time to decide on the approach we want to take.
>
> KT3: Yes, "session" can be considered synonymous with "membership" in this
> context.
>
> KT4: Thank you for pointing that out. I will update the wording to
> specifically say this behavior is for consumers.
>
> Thanks again for your comments.
>
> Best regards,
> TengYao
>
> Kirk True  於 2024年8月30日 週五 上午12:39寫道:
>
> > Hi TengYao!
> >
> > Sorry for being late to the discussion...
> >
> > After reading the thread and then the KIP, I had a few
> questions/comments:
> >
> > KT1: In Motivation, it states: "This scenario can result in the broker
> > registering a new member for which it will never receive a proper leave
> > request.” Just to be clear, the broker will always have cases where it
> > might lose connection with a member. That’s not unique to the initial
> > heartbeat, right?
> >
> > KT2: There was a bit of back and forth about format of the member ID.
> From
> > what I gathered in the thread, the member ID is still defined in the RPC
> as
> > a string and not a UUID, right? The KIP states that the “client must
> > generate a UUID as the member ID” and that the “server will validate
> that a
> > valid UUID is provided.” Is that a change for the server, or is it
> already
> > enforced as a UUID?
> >
> > KT3: Lianet mentioned some confusion over the use of the word “session.”
> > Isn’t “session” synonymous with “membership?”
> >
> > KT4: Under “Member ID Lifecycle,” it states: "The client should reuse the
> > same UUID as the member ID for all heartbeats and rejoin attempts to
> > maintain continuity within the group.” Could we change the first part of
> > that to “The Consumer instance should…” We do have lifetimes that extend
> > past the lifetime of a client instance (such as the transaction ID).
> >
> > Thanks,
> > Kirk
> >
> > > On Aug 29, 2024, at 1:28 AM, TengYao Chi  wrote:
> > >
> > > Hi David,
> > >
> > > Thank you for pointing that out.
> > > I have updated the content of the KIP based on Lianet's and your
> > feedback.
> > > Please take a look and let me know your thoughts.
> > >
> > > Best regards,
> > > TengYao
> > >
> > > David Jacot  於 2024年8月29日 週四 下午3:20寫道:
> > >
> > >> Hi TengYao,
> > >>
> > >> Thanks for the update. I haven't fully read it yet but I will soon.
> > >>
> > >> LM4: This is incorrect. The consumer must keep its member id during
> its
> > >> entire lifetime (until the process stops or dies). The protocol
> > stipulates
> > >> that a member must rejoin with the same member id and the member epoch
> > set
> > >> to zero when an FENCED_MEMBER_EPOCH occurs. This allows the member to
> > >> resynchronize itself. We should not change this behavior. I think that
> > we
> > >> should see the client side generation id as an incarnation id of the
> > >> application. It is generated once and kept until it stops or dies.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Thu, Aug 29, 2024 at 6:21 AM TengYao Chi 
> > wrote:
> > >>
> > >>> Hello Lianet !
> > >>>
&g

[jira] [Resolved] (KAFKA-17413) Re-introduce `group.version` feature flag

2024-08-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17413.
-
Resolution: Fixed

> Re-introduce `group.version` feature flag
> -
>
> Key: KAFKA-17413
> URL: https://issues.apache.org/jira/browse/KAFKA-17413
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 4.0.0
>
>






Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-08-29 Thread David Jacot
> yet.(So
> > > >> >> far according to this thread)
> > > >> >> However, I will review the current implementation in the Kafka
> > `Uuid`
> > > >> >> class and include a brief specification in the KIP.
> > > >> >>
> > > >> >> Once again, thank you so much for your help.
> > > >> >>
> > > >> >> Best regards,
> > > >> >> TengYao
> > > >> >>
> > > >> >> Chia-Ping Tsai  於 2024年8月14日 週三 下午11:14寫道:
> > > >> >>
> > > >> >>> hi Apoorv
> > > >> >>>
> > > >> >>>> As the memberId is now known to the client, and client might
> send
> > > the
> > > >> >>> leave
> > > >> >>> group heartbeat on shutdown prior to receiving the initial
> > heartbeat
> > > >> >>> response. If that's true then how do we guarantee that the 2
> > > requests
> > > >> to
> > > >> >>> join and leave will be processed in order, which could still
> leave
> > > >> stale
> > > >> >>> members or throw unknown member id exceptions?
> > > >> >>>
> > > >> >>> This is definitely a good question. the short answer: no
> guarantee
> > > but
> > > >> >>> best
> > > >> >>> efforts
> > > >> >>>
> > > >> >>> Please notice the root cause is "we don't have enough time to wait
> > > >> >>> for the member id (response) when closing the consumer". Sadly, we
> > > >> >>> can't guarantee the request order for the same reason.
> > > >> >>>
> > > >> >>> However, in contrast to previous behavior, there is one big
> > benefit
> > > >> of new
> > > >> >>> approach - we can try STONITH because we know the member id
> > > >> >>>
> > > >> >>> Best,
> > > >> >>> Chia-Ping
> > > >> >>>
> > > >> >>>
> > > >> >>> Apoorv Mittal  於 2024年8月14日 週三
> > 下午8:55寫道:
> > > >> >>>
> > > >> >>>> Hi TengYao,
> > > >> >>>> Thanks for the KIP. Continuing on the point which Andrew
> > mentioned
> > > as
> > > >> >>> AS1.
> > > >> >>>>
> > > >> >>>> As the memberId is now known to the client, and client might
> send
> > > the
> > > >> >>> leave
> > > >> >>>> group heartbeat on shutdown prior to receiving the initial
> > > heartbeat
> > > >> >>>> response. If that's true then how do we guarantee that the 2
> > > >> requests to
> > > >> >>>> join and leave will be processed in order, which could still
> > leave
> > > >> stale
> > > >> >>>> members or throw unknown member id exceptions?
> > > >> >>>>
> > > >> >>>> Though the client side member id generation is helpful which
> will
> > > >> >>> represent
> > > >> >>>> the same group perspective as from client and broker's end.
> But I
> > > >> think
> > > >> >>> the
> > > >> >>>> major concern we want to solve here is Stale Partition
> > Assignments
> > > >> which
> > > >> >>>> might still exist with the new approach. I am leaning towards
> the
> > > >> >>>> suggestion mentioned by Andrew where partition assignment
> > triggers
> > > on
> > > >> >>>> subsequent heartbeat when client acknowledges the initial
> > > heartbeat,
> > > >> >>>> delayed partition assignment.
> > > >> >>>>
> > > >> >>>> Though on a separate note, I have a different question. What
> > > happens
> > > >> >>> when
> > > >> >>>> there is an issue with the client which sends the initial
> > heartbeat
> > > >> >>> without
> > > >> >>>> memberId, th

[ANNOUNCE] New committer: Lianet Magrans

2024-08-28 Thread David Jacot
Hi all,

The PMC of Apache Kafka is pleased to announce a new Kafka committer,
Lianet Magrans.

Lianet has been a Kafka contributor since June 2023. In addition to
being a regular contributor and reviewer, she has made significant
contributions to the next generation of the consumer rebalance
protocol (KIP-848) and to the new consumer. She has also contributed
to discussing and reviewing many KIPs.

Congratulations, Lianet!

Thanks,
David (on behalf of the Apache Kafka PMC)


[jira] [Resolved] (KAFKA-17327) Add support of group in kafka-configs.sh

2024-08-27 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17327.
-
Resolution: Fixed

> Add support of group in kafka-configs.sh
> 
>
> Key: KAFKA-17327
> URL: https://issues.apache.org/jira/browse/KAFKA-17327
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Lan Ding
>Assignee: Lan Ding
>Priority: Major
> Fix For: 4.0.0
>
>
> Add support of group in kafka-configs.sh





[jira] [Resolved] (KAFKA-17376) Use the new group coordinator by default in 4.0

2024-08-26 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17376.
-
Resolution: Fixed

> Use the new group coordinator by default in 4.0
> ---
>
> Key: KAFKA-17376
> URL: https://issues.apache.org/jira/browse/KAFKA-17376
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 4.0.0
>
>






[jira] [Reopened] (KAFKA-14048) The Next Generation of the Consumer Rebalance Protocol

2024-08-26 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot reopened KAFKA-14048:
-

> The Next Generation of the Consumer Rebalance Protocol
> --
>
> Key: KAFKA-14048
> URL: https://issues.apache.org/jira/browse/KAFKA-14048
> Project: Kafka
>  Issue Type: Improvement
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 4.0.0
>
>
> This Jira tracks the development of KIP-848: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol.





New Group Coordinator (KIP-848)

2024-08-26 Thread David Jacot
Hi folks,

I wanted to let you know that the new group coordinator that we developed
as part of KIP-848 is now the default group coordinator in trunk (kraft
only). Hence all the integration tests, all the system tests and all the
kraft clusters created based on trunk use it by default unless specified
otherwise. If you encounter any issues with it, please, file a Jira and let
me know.

Best,
David


[jira] [Resolved] (KAFKA-14048) The Next Generation of the Consumer Rebalance Protocol

2024-08-26 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14048.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> The Next Generation of the Consumer Rebalance Protocol
> --
>
> Key: KAFKA-14048
> URL: https://issues.apache.org/jira/browse/KAFKA-14048
> Project: Kafka
>  Issue Type: Improvement
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 4.0.0
>
>
> This Jira tracks the development of KIP-848: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol.





[jira] [Created] (KAFKA-17413) Re-introduce `group.version` feature flag

2024-08-23 Thread David Jacot (Jira)
David Jacot created KAFKA-17413:
---

 Summary: Re-introduce `group.version` feature flag
 Key: KAFKA-17413
 URL: https://issues.apache.org/jira/browse/KAFKA-17413
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 4.0.0








[jira] [Resolved] (KAFKA-16379) Coordinator flush time and event purgatory time metrics

2024-08-23 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16379.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Coordinator flush time and event purgatory time metrics
> ---
>
> Key: KAFKA-16379
> URL: https://issues.apache.org/jira/browse/KAFKA-16379
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Resolved] (KAFKA-17279) Handle retriable errors from offset fetches in ConsumerCoordinator

2024-08-21 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17279.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Handle retriable errors from offset fetches in ConsumerCoordinator
> --
>
> Key: KAFKA-17279
> URL: https://issues.apache.org/jira/browse/KAFKA-17279
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Sean Quah
>Assignee: Sean Quah
>Priority: Minor
> Fix For: 3.9.0
>
>
> Currently {{ConsumerCoordinator}}'s {{OffsetFetchResponseHandler}} only 
> retries on {{COORDINATOR_LOAD_IN_PROGRESS}} and {{NOT_COORDINATOR}} errors.
> The error handling should be expanded to retry on all retriable errors.





[jira] [Created] (KAFKA-17383) Update upgrade notes about removal of `offsets.commit.required.acks`

2024-08-20 Thread David Jacot (Jira)
David Jacot created KAFKA-17383:
---

 Summary: Update upgrade notes about removal of 
`offsets.commit.required.acks`
 Key: KAFKA-17383
 URL: https://issues.apache.org/jira/browse/KAFKA-17383
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 4.0.0








[jira] [Resolved] (KAFKA-16503) getOrMaybeCreateClassicGroup should not thrown GroupIdNotFoundException

2024-08-20 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16503.
-
Resolution: Fixed

Addressed by https://github.com/apache/kafka/pull/16919.

> getOrMaybeCreateClassicGroup should not thrown GroupIdNotFoundException
> ---
>
> Key: KAFKA-16503
> URL: https://issues.apache.org/jira/browse/KAFKA-16503
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
>
> It looks like the `getOrMaybeCreateClassicGroup` method throws a 
> `GroupIdNotFoundException` when the group exists but has the wrong 
> type. As `getOrMaybeCreateClassicGroup` is mainly used by the 
> join-group/sync-group APIs, this seems incorrect. We need to double-check 
> and fix it.





[jira] [Created] (KAFKA-17376) Use the new group coordinator by default in 4.0

2024-08-20 Thread David Jacot (Jira)
David Jacot created KAFKA-17376:
---

 Summary: Use the new group coordinator by default in 4.0
 Key: KAFKA-17376
 URL: https://issues.apache.org/jira/browse/KAFKA-17376
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 4.0.0








Re: [DISCUSS] GitHub CI

2024-08-16 Thread David Jacot
Hi David,

Thanks for working on this. Overall, I am supportive. I have two
questions/comments.

1. I wonder if we should discuss with the infra team in order to ensure
that they have enough capacity for us to use the action runners. Our CI is
pretty greedy in general. We could also discuss with them whether they
could move the capacity that we used in Jenkins to the runners. I think
that Kafka was one of the most, if not the most, heavy users of the shared
Jenkins infra. I think that they will appreciate the heads up.

2. Would it be possible to improve how failed tests are reported? For
instance, the tests in your PR failed with `1448 tests completed, 2
failed`. First, that is quite hard to see because the logs are long. Second,
it is almost impossible to find those two failed tests. In my opinion, we
cannot use it in its current state to merge pull requests. Do you know if
there are ways to improve this?

Best,
David

On Fri, Aug 16, 2024 at 2:44 PM 黃竣陽  wrote:

> Hello David,
>
> I find the Jenkins UI to be quite unfriendly for developers, and the
> Apache Jenkins instance is often unreliable.
> On the other hand, the new GitHub Actions UI is much more appealing to me.
> If GitHub Actions proves to be more
> stable than Jenkins, I believe it would be a worthwhile change to switch
> to GitHub Actions.
>
> Thank you.
>
> Best Regards,
> Jiunn Yang
> > Josep Prat  於 2024年8月16日 下午4:57 寫道:
> >
> > Hi David,
> > One of the enhancements we can have with this change (it's easier to do
> > with GH actions) is to write back the result of the CI run as a comment
> on
> > the PR itself. I believe not needing to periodically check CI to see if
> the
> > run finished would be a great win. By having CI commenting on the PR
> > everyone watching the PR (author and reviewers) will get notified when
> it's
> > done.
>
>


[jira] [Resolved] (KAFKA-17295) New consumer fails with assert in consumer_test.py’s test_fencing_static_consumer system test

2024-08-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17295.
-
  Assignee: Dongnuo Lyu
Resolution: Fixed

Fixed by https://github.com/apache/kafka/pull/16845.

> New consumer fails with assert in consumer_test.py’s 
> test_fencing_static_consumer system test
> -
>
> Key: KAFKA-17295
> URL: https://issues.apache.org/jira/browse/KAFKA-17295
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.8.0
>Reporter: Kirk True
>Assignee: Dongnuo Lyu
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 4.0.0
>
>
> I'm occasionally seeing this error in {{test_fencing_static_consumer}}:
> {code}
> AssertionError('Static consumers attempt to join with instance id in use 
> should not cause a rebalance')
> Traceback (most recent call last):
>   File 
> "/home/semaphore/kafka-overlay/kafka/venv/lib/python3.8/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/semaphore/kafka-overlay/kafka/venv/lib/python3.8/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/semaphore/kafka-overlay/kafka/venv/lib/python3.8/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/semaphore/kafka-overlay/kafka/tests/kafkatest/tests/client/consumer_test.py",
>  line 366, in test_fencing_static_consumer
> assert num_rebalances == consumer.num_rebalances(), "Static consumers 
> attempt to join with instance id in use should not cause a rebalance"
> AssertionError: Static consumers attempt to join with instance id in use 
> should not cause a rebalance{code}
> The parameters to the test were:
> * {{num_conflict_consumers}}: {{1}}
> * {{fencing_stage}}: {{stable}}
> * {{metadata_quorum}}: {{ISOLATED_KRAFT}}
> * {{use_new_coordinator}}: {{True}}
> * {{group_protocol}}: {{consumer}}





[jira] [Resolved] (KAFKA-17219) Adjust system test framework for new protocol consumer

2024-08-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17219.
-
  Assignee: Dongnuo Lyu
Resolution: Fixed

Fixed by https://github.com/apache/kafka/pull/16845.

> Adjust system test framework for new protocol consumer
> --
>
> Key: KAFKA-17219
> URL: https://issues.apache.org/jira/browse/KAFKA-17219
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Blocker
>  Labels: kip-848-client-support, system-tests
> Fix For: 4.0.0
>
>
> The current test framework doesn't work well with the existing tests using 
> the new consumer protocol. There are two main issues I've seen.
>  
> First, we sometimes assume there is no rebalance triggered, for instance in 
> {{consumer_test.py::test_consumer_failure}}
> {code:java}
> verify that there were no rebalances on failover
> assert num_rebalances == consumer.num_rebalances(), "Broker failure should 
> not cause a rebalance"{code}
> The current framework calculates {{num_rebalances}} by incrementing it by one 
> every time a new assignment is received, so if a reconciliation happened 
> during the failover, {{num_rebalances}} will also be incremented. For the new 
> protocol we need a new way to update {{num_rebalances}}.
>  
> Second, for the new protocol, we need a way to make sure all members have 
> joined {*}and stabilized{*}. Currently we only make sure all members have 
> joined (the event handlers are all in the Joined state), while some partitions 
> haven't been assigned yet and more time is needed for reconciliation. This 
> issue can cause failures in assertions such as a timeout waiting for consumption, and
> {code:java}
> partition_owner = consumer.owner(partition)
> assert partition_owner is not None {code}
>  
> As a short-term solution, we can make the tests pass by adding 
> {{time.sleep}} calls or by skipping the {{num_rebalances}} check. To truly fix 
> them, we should adjust 
> {{tools/src/main/java/org/apache/kafka/tools/VerifiableConsumer.java}} to 
> work well with the new protocol.
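The "joined and stabilized" condition described in the ticket can be sketched as a pure predicate over the members' current assignments. This is a hypothetical helper (plain Python tuples standing in for topic-partitions), not part of the actual ducktape framework:

```python
def group_is_stable(assignments, expected_partitions):
    """Return True once every expected partition is owned by exactly one member.

    assignments: dict mapping member id -> set of (topic, partition) tuples
    expected_partitions: set of all partitions of the subscribed topics
    """
    owned = [p for parts in assignments.values() for p in parts]
    # Stable means: every partition assigned exactly once, nothing extra.
    return sorted(owned) == sorted(expected_partitions)

# Example: two members, four partitions; reconciliation still in progress.
pending = {"member-a": {("t1", 0), ("t1", 1)}, "member-b": {("t1", 2)}}
stable = {"member-a": {("t1", 0), ("t1", 1)}, "member-b": {("t1", 2), ("t1", 3)}}
all_parts = {("t1", 0), ("t1", 1), ("t1", 2), ("t1", 3)}

print(group_is_stable(pending, all_parts))  # False: ("t1", 3) not yet assigned
print(group_is_stable(stable, all_parts))   # True
```

A test framework could poll such a predicate (e.g. with ducktape's `wait_until`) instead of only checking that all event handlers reached the Joined state.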





[jira] [Resolved] (KAFKA-16576) New consumer fails with assert in consumer_test.py’s test_consumer_failure system test

2024-08-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16576.
-
  Assignee: Dongnuo Lyu
Resolution: Fixed

Fixed by https://github.com/apache/kafka/pull/16845.

> New consumer fails with assert in consumer_test.py’s test_consumer_failure 
> system test
> --
>
> Key: KAFKA-16576
> URL: https://issues.apache.org/jira/browse/KAFKA-16576
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, system tests
>Affects Versions: 3.7.0
>Reporter: Kirk True
>Assignee: Dongnuo Lyu
>Priority: Blocker
>  Labels: flaky-test, kip-848-client-support, system-tests
> Fix For: 4.0.0
>
>
> The {{consumer_test.py}} system test intermittently fails with the following 
> error:
> {code}
> test_id:
> kafkatest.tests.client.consumer_test.OffsetValidationTest.test_consumer_failure.clean_shutdown=True.enable_autocommit=True.metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer
> status: FAIL
> run time:   42.582 seconds
> AssertionError()
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 184, in _do_run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/tests/runner_client.py",
>  line 262, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/venv/lib/python3.7/site-packages/ducktape/mark/_mark.py",
>  line 433, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-branch-builder/kafka/tests/kafkatest/tests/client/consumer_test.py",
>  line 399, in test_consumer_failure
> assert partition_owner is not None
> AssertionError
> Notify
> {code}
> Affected tests:
>  * {{test_consumer_failure}}





[jira] [Resolved] (KAFKA-14510) Extend DescribeConfigs API to support group configs

2024-08-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14510.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Extend DescribeConfigs API to support group configs
> ---
>
> Key: KAFKA-14510
> URL: https://issues.apache.org/jira/browse/KAFKA-14510
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Lan Ding
>Priority: Major
> Fix For: 4.0.0
>
>






Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-08-14 Thread David Jacot
Hi Andrew,

Personally, I don't like the lobby approach. It makes things more
complicated and it would require changing the records on the server too.
This is why I initially suggested the rejected alternative #2 which is
pretty close but also not perfect.

I'd like to clarify one thing. The ConsumerGroupHeartbeat API already
supports generating the member id on the client so we don't need any
conditional logic on the client side. This is actually what we wanted to do
in the first place but the idea got pushed back by Magnus back then because
generating uuid from librdkafka required a new dependency. It turns out
that librdkafka has that dependency today. In retrospect, we should have
pushed back on this. Long story short, we can just do it. The proposal in
this KIP is to make the member id required in future versions. We could
also decide not to do it and to keep supporting both approaches. I would
also be fine with this.

Best,
David

On Wed, Aug 14, 2024 at 12:30 PM Andrew Schofield 
wrote:

> Hi TengYao,
> Thanks for your response. I’ll have just one more try to persuade.
> I feel that I will need to follow the approach with KIP-932 when we’ve
> made a decision, so I do have more than a passing interest in this.
>
> A group member in the lobby is in the group, but it does not have any
> assignments. A member of a consumer group can have no assigned
> partitions (such as 5 CG members subscribed to a topic with 4 partitions),
> so it’s a situation that consumer group members already expect.
>
> One of Kafka’s strengths is the way that we handle API versioning.
> But, there is a cost - the behaviour is different depending on the RPC
> version. KIP-848 is on the cusp of completion, but we’re already adding
> conditional logic for v0/v1 for ConsumerGroupHeartbeat. That’s a pity.
> Only a minor issue, but it’s unfortunate.
>
> Thanks,
> Andrew
>
> > On 14 Aug 2024, at 08:47, TengYao Chi  wrote:
> >
> > Hello Andrew
> > Thank you for your thoughtful suggestions and getting the discussion
> going.
> >
> > To AS1:
> > In the current scenario where the server generates the UUID, if the
> client
> > shuts down before receiving the memberId generated by the GC (regardless
> of
> > whether it’s a graceful shutdown or not), the GC will still have to wait
> > for the heartbeat timeout because the client doesn’t know its memberId.
> > This KIP indeed cannot completely resolve the idempotency issue, but it
> can
> > better handle shutdown scenarios under normal circumstances because the
> > client always knows its memberId. Even if the client shuts down
> immediately
> > after the initial heartbeat, as long as it performs a graceful shutdown
> and
> > sends a leave heartbeat, the GC can manage the situation and remove the
> > member. Therefore, the goal of this KIP is to address the issue where the
> > GC has to wait for the heartbeat timeout due to the client leaving
> without
> > knowing its memberId, which leads to reduced throughput and limited
> > scalability.
> >
> > The solution you suggest has also been proposed by David. The concern
> with
> > this approach is that it introduces additional complexity for
> > compatibility, as the new server would not immediately add the member to
> > the group, while the old server would. This requires clients to
> > differentiate whether their memberId has been added to the group or not,
> > which could result in unexpected logs.
> >
> > Best Regards,
> > TengYao
> >
> > Andrew Schofield  於 2024年8月14日 週三 上午12:29寫道:
> >
> >> Hi TengYao,
> >> Thanks for the KIP. I wonder if there’s a different way to close what
> >> is quite a small window.
> >>
> >> AS1: It is true that the initial heartbeat is not idempotent, but this
> >> remains
> >> true with this KIP. It’s just differently not idempotent. If the client
> >> makes its
> >> own member ID, sends a request and dies, the GC will still have added
> >> the member to the group and it will hang around until the session
> expires.
> >>
> >> I wonder if the GC could still generate the member ID in response to the
> >> first
> >> heartbeat, and put the member in a special PENDING state with no
> >> assignments until the client sends the next heartbeat, thus confirming
> it
> >> has received the member ID. This would not be a protocol change at all,
> >> just
> >> a change to the GC to keep a new member in the lobby until it’s
> >> confirmed
> >> it knows its member ID.
> >>
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 13 Aug 2024, at 15:59, TengYao Chi  wrote:
> >>>
> >>> Hi Chia-Ping,
> >>>
> >>> Thanks for review and suggestions.
> >>> I have updated the content of KIP accordingly.
> >>> Please take a look.
> >>>
> >>> Best regards,
> >>> TengYao
> >>>
> >>> Chia-Ping Tsai  於 2024年8月13日 週二 下午9:45寫道:
> >>>
>  hi TengYao
> 
>  thanks for this KIP.
> 
>  1) could you please describe the before/after behavior in the
> "Proposed
>  Changes" section? IIRC, current RPC allows HB having member id
> >> generated by
> >>>

Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-08-14 Thread David Jacot
> I don't want to be a hater of idempotent HB, but having a "RPC" used to
generate UUID is unnecessary to me.

I actually agree with you. I am just trying to argue for it.

> I'm not sure whether it is worth requiring the UUID format for member id.
In the protocol, we declare the field "memberId" as "String" rather than
"uuid". The scope of member id is in "a group", so I guess the collision
won't be a big issue.

I think that this should be defined. Otherwise, folks may start using
incorrect things. In the java client, we will generate a Kafka UUID.
The javadoc of the Uuid class has a good definition for it.
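To make the "member id is just a string" point concrete: the most a server could do is a best-effort parse. The sketch below uses the RFC 4122 textual form from Python's standard library; note that Kafka's own Uuid class defines its own string encoding, so this is only an illustration of the validation problem, not broker logic:

```python
import uuid

def looks_like_uuid(member_id: str) -> bool:
    """Best-effort check that a member id string parses as an RFC 4122 UUID.

    Illustrative only: the protocol field is declared as a plain string, so
    any server-side validation can at most be this kind of loose sanity check.
    """
    try:
        uuid.UUID(member_id)
        return True
    except ValueError:
        return False

print(looks_like_uuid(str(uuid.uuid4())))  # True
print(looks_like_uuid("not-a-uuid"))       # False
```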

> For another, I always wonder why we trust clients to generate "unique"
transaction id but we worry about generating "unique" member id on client
side?

This is perhaps not the best comparison. The transactional id is the most
misused thing in Apache Kafka. However, there is a notable difference. The
transactional id is provided by the end user, whereas the member id is generated
by the client itself.


On Wed, Aug 14, 2024 at 11:56 AM Chia-Ping Tsai  wrote:

> In my opinion, the main downside of this
> approach is that if you leave after receiving the first HB and the member
> id, the server will respond with an unknown member id error because the
> member is not really in the group yet.
>
> I don't want to be a hater of idempotent HB, but having a "RPC" used to
> generate UUID is unnecessary to me.
>
> In the proposed changes section, we should elaborate the uuid generation
> part. Do we have recommendations there? Do we have requirements for the
> uuid (version, uniqueness, etc.)?
>
>
> I'm not sure whether it is worth requiring the UUID format for member id.
> In the protocol, we declare the field "memberId" as "String" rather than
> "uuid". The scope of member id is in "a group", so I guess the collision
> won't be a big issue.
>
> For another, I always wonder why we trust clients to generate "unique"
> transaction id but we worry about generating "unique" member id on client
> side?
>
> Best,
> Chia-Ping
>


Re: [DISCUSS] KIP-1082: Enable ID Generation for Clients over the ConsumerGroupHeartbeat RPC

2024-08-14 Thread David Jacot
Hi TengYao,

Thanks for the KIP! I have a couple of comments.

1. In the motivation section, I would really start from the fundamental
issue which is that the initial heartbeat is not idempotent. Then, we could
describe the undesired side effects (e.g. ghost members, cannot leave
without receiving the member id, etc.). This is the core issue that we want
to solve with this KIP, I think.

2. In the public interface section, we need to explain that we will bump
the version of the API and that we will require the member id to be
provided from the new version on. I would also mention that we will return
invalid request if the member id is not provided.

3. In the proposed changes section, we should elaborate the uuid generation
part. Do we have recommendations there? Do we have requirements for the
uuid (version, uniqueness, etc.)?

4. In the compatibility section, I wonder if we could simplify it a bit. In
the end, the change is backward compatible because the version 1 of the RPC
already supports a member id provided by the client.

5. In KIP-848, we actually rejected the proposed option. It would be great
if we could explain why we changed our mind in the motivation section. I
think that the main argument was that generating uuid was requiring extra
libraries in some languages (e.g. for librdkafka).

6. Regarding the second rejected alternative, I feel like the arguments for
rejecting it are quite weak. Backporting is not really an issue,
compatibility neither. KIP-848 is still in preview so we could live with a
few misleading logs, if any. In my opinion, the main downside of this
approach is that if you leave after receiving the first HB and the member
id, the server will respond with an unknown member id error because the
member is not really in the group yet. We could think of having a real
lobby as suggested by Andrew. I am not a fan of this because we have to
bookkeep more state on the server. The client side generated member id is
simpler and more reliable overall.

7. Another argument for the client side generated member id is that the
member id will be equivalent to an incarnation id in the sense that it
won't change at all during the lifetime of the client.

8. It seems tricky to validate the uuid on the server because it is just a
string. Could you elaborate on this?
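The incarnation-id semantics in point 7 can be sketched as follows. This is a hypothetical client-side state holder, not the actual Java consumer code: the member id is generated once per process and survives fencing; only the epoch resets.

```python
import uuid

class HeartbeatState:
    """Sketch of the client-side member id lifecycle (incarnation id).

    The id is generated once when the client starts and reused for every
    heartbeat and rejoin attempt for the lifetime of the process.
    """
    def __init__(self):
        self.member_id = str(uuid.uuid4())  # fixed for the process lifetime
        self.member_epoch = 0

    def on_heartbeat_response(self, new_epoch: int):
        self.member_epoch = new_epoch

    def on_fenced(self):
        # Rejoin with the SAME member id; only the epoch resets to zero.
        self.member_epoch = 0

state = HeartbeatState()
original_id = state.member_id
state.on_heartbeat_response(5)
state.on_fenced()
assert state.member_id == original_id and state.member_epoch == 0
```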

Best,
David

On Wed, Aug 14, 2024 at 9:48 AM TengYao Chi  wrote:

> Hello Andrew
> Thank you for your thoughtful suggestions and getting the discussion going.
>
> To AS1:
> In the current scenario where the server generates the UUID, if the client
> shuts down before receiving the memberId generated by the GC (regardless of
> whether it’s a graceful shutdown or not), the GC will still have to wait
> for the heartbeat timeout because the client doesn’t know its memberId.
> This KIP indeed cannot completely resolve the idempotency issue, but it can
> better handle shutdown scenarios under normal circumstances because the
> client always knows its memberId. Even if the client shuts down immediately
> after the initial heartbeat, as long as it performs a graceful shutdown and
> sends a leave heartbeat, the GC can manage the situation and remove the
> member. Therefore, the goal of this KIP is to address the issue where the
> GC has to wait for the heartbeat timeout due to the client leaving without
> knowing its memberId, which leads to reduced throughput and limited
> scalability.
>
> The solution you suggest has also been proposed by David. The concern with
> this approach is that it introduces additional complexity for
> compatibility, as the new server would not immediately add the member to
> the group, while the old server would. This requires clients to
> differentiate whether their memberId has been added to the group or not,
> which could result in unexpected logs.
>
> Best Regards,
> TengYao
>
> Andrew Schofield  於 2024年8月14日 週三 上午12:29寫道:
>
> > Hi TengYao,
> > Thanks for the KIP. I wonder if there’s a different way to close what
> > is quite a small window.
> >
> > AS1: It is true that the initial heartbeat is not idempotent, but this
> > remains
> > true with this KIP. It’s just differently not idempotent. If the client
> > makes its
> > own member ID, sends a request and dies, the GC will still have added
> > the member to the group and it will hang around until the session
> expires.
> >
> > I wonder if the GC could still generate the member ID in response to the
> > first
> > heartbeat, and put the member in a special PENDING state with no
> > assignments until the client sends the next heartbeat, thus confirming it
> > has received the member ID. This would not be a protocol change at all,
> > just
> > a change to the GC to keep a new member in the lobby until it’s confirmed
> > it knows its member ID.
> >
> >
> > Thanks,
> > Andrew
> >
> > > On 13 Aug 2024, at 15:59, TengYao Chi  wrote:
> > >
> > > Hi Chia-Ping,
> > >
> > > Thanks for review and suggestions.
> > > I have updated the content of KIP accordingly.
> > > Plea

[jira] [Resolved] (KAFKA-14511) Extend AlterIncrementalConfigs API to support group config

2024-08-12 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14511.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Extend AlterIncrementalConfigs API to support group config
> --
>
> Key: KAFKA-14511
> URL: https://issues.apache.org/jira/browse/KAFKA-14511
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Lan Ding
>Priority: Major
> Fix For: 4.0.0
>
>






[jira] [Resolved] (KAFKA-17228) Static member using new protocol should always replace the one using the old protocol

2024-08-09 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17228.
-
Fix Version/s: 4.0.0
 Reviewer: David Jacot
   Resolution: Fixed

> Static member using new protocol should always replace the one using the old 
> protocol
> -
>
> Key: KAFKA-17228
> URL: https://issues.apache.org/jira/browse/KAFKA-17228
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 4.0.0
>
>
> {color:#172b4d}In the old protocol, when a static consumer shuts down, it 
> [won't send explicit LeaveGroup 
> request|https://github.com/apache/kafka/blob/010ab19b724ae011e85686ce47320f4f85d9a11f/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L1158-L1164].
>  It's okay because the old protocol replaces the existing member whenever a 
> new member with the same instance id joins.{color}
> {color:#172b4d}However, in the new protocol, we [require the existing member 
> to send a leave 
> group|https://github.com/apache/kafka/blob/trunk/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java#L2236-L2238]
>  request for a new static member to replace the existing one. This gap prevents 
> the upgraded consumer from joining the group in both online and offline 
> upgrades.{color}
> {color:#172b4d}We should make the static member using new protocol replace 
> the static member using old protocol regardless of whether the latter has 
> left the group.{color}
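The proposed replacement rule can be sketched as a small predicate mirroring the coordinator's decision (a hypothetical helper, not the actual GroupMetadataManager code):

```python
def should_replace_existing(existing_uses_old_protocol: bool,
                            existing_has_left: bool) -> bool:
    """Whether a joining static member may take over an instance id in use.

    A member on the new protocol always replaces an existing member still on
    the old protocol (which never sends LeaveGroup on shutdown); otherwise
    the existing member must have left the group first.
    """
    return existing_uses_old_protocol or existing_has_left

assert should_replace_existing(True, False)     # old-protocol member: replace
assert should_replace_existing(False, True)     # new-protocol member that left
assert not should_replace_existing(False, False)
```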





[jira] [Resolved] (KAFKA-17267) New group coordinator can return REQUEST_TIMED_OUT for OFFSET_FETCHes

2024-08-09 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17267.
-
Fix Version/s: 4.0.0
 Reviewer: David Jacot
 Assignee: Sean Quah
   Resolution: Fixed

> New group coordinator can return REQUEST_TIMED_OUT for OFFSET_FETCHes
> -
>
> Key: KAFKA-17267
> URL: https://issues.apache.org/jira/browse/KAFKA-17267
> Project: Kafka
>  Issue Type: Bug
>  Components: group-coordinator
>Reporter: Sean Quah
>Assignee: Sean Quah
>Priority: Minor
> Fix For: 4.0.0
>
>
> Under some circumstances, the new group coordinator can return 
> {{REQUEST_TIMED_OUT}} errors in response to {{OFFSET_FETCH}} requests.
> However, the client (ConsumerCoordinator) does not handle this error code and 
> treats it as non-retryable. For compatibility with older clients, we can map 
> {{REQUEST_TIMED_OUT}} to {{NOT_COORDINATOR}}.
>  
> Similar to KAFKA-16386.
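A minimal sketch of the proposed compatibility mapping (the numeric values follow the Kafka protocol error-code table; the function itself is hypothetical, not the coordinator's code):

```python
# Kafka protocol error codes (see the protocol error-code table).
REQUEST_TIMED_OUT = 7   # not retried by older ConsumerCoordinator clients
NOT_COORDINATOR = 16    # retriable: client rediscovers the coordinator

def offset_fetch_error_for_client(error_code: int) -> int:
    """Translate coordinator-internal timeouts into a retriable error.

    Older clients treat REQUEST_TIMED_OUT on OFFSET_FETCH as fatal, so the
    server maps it to NOT_COORDINATOR, which all client versions retry.
    """
    if error_code == REQUEST_TIMED_OUT:
        return NOT_COORDINATOR
    return error_code

assert offset_fetch_error_for_client(REQUEST_TIMED_OUT) == NOT_COORDINATOR
```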





[jira] [Created] (KAFKA-17298) Update upgrade notes for 4.0

2024-08-07 Thread David Jacot (Jira)
David Jacot created KAFKA-17298:
---

 Summary: Update upgrade notes for 4.0
 Key: KAFKA-17298
 URL: https://issues.apache.org/jira/browse/KAFKA-17298
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 4.0.0








Re: [DISCUSS] Graduation steps for Features

2024-07-31 Thread David Jacot
We could also use a hierarchy: KIP parent Jira > milestone Jiras > tasks.

Best,
David

On Wed, Jul 31, 2024 at 4:22 PM Josep Prat 
wrote:

> Hi David,
>
> One of the problems I see is that the KIP index page has a 1-to-many
> relationship between KIP and release. I guess we might want to turn this to
> a many-to-many qualified relationship. Otherwise it might be complicated
> for the release manager or the KIP driver(s) to keep the Release Plan page
> up-to-date for the different steps.
>
> Another alternative would be to have special sub-tasks in JIRA that would
> indicate the state of the KIP, then using the "fixed version" label they'll
> be included in the release notes and the Release Manager can look for these
> special ones when writing announcements or making sure the release notes
> are up-to-date.
>
> Best,
>
> On Wed, Jul 31, 2024 at 3:54 PM David Jacot 
> wrote:
>
> > Hi Josep,
> >
> > Thanks for starting the discussion.
> >
> > We used Early Access, Preview and GA (or Production Ready) for KIP-848
> and
> > I find it pretty nice. We could add the tentative release plan to the
> KIP's
> > header and it could be used as the source of truth.
> >
> > Best,
> > David
> >
> > On Wed, Jul 31, 2024 at 11:53 AM Josep Prat  >
> > wrote:
> >
> > > Hi Andrew,
> > > I can definitely write a KIP, but before doing so I'd like to gather
> some
> > > feedback from the community around these steps and how they are
> perceived
> > > by different groups of people.
> > >
> > > On Wed, Jul 31, 2024 at 11:50 AM Andrew Schofield <
> > > andrew_schofi...@live.com>
> > > wrote:
> > >
> > > > Hi Josep,
> > > > I think it’s high time that this was tackled. I suggest that it would
> > be
> > > > best
> > > > handled as a KIP because then we have a document which can be
> discussed
> > > > and improved, followed by a formal vote.
> > > >
> > > > A standard set of terms with agreed meanings would be very helpful
> for
> > > > some of the larger KIPs which take many releases to be properly ready
> > for
> > > > prime time. Most KIPs don’t need this, but a handful definitely do.
> > > >
> > > > Personally, I like the sequence that KIP-848 has taken, moving from
> > Early
> > > > Access, to Preview, and finally complete. I intend to follow the same
> > > > sequence
> > > > for KIP-932.
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > > On 31 Jul 2024, at 10:15, Josep Prat 
> > > > wrote:
> > > > >
> > > > > Also as part of this discussion I would like to flag that we need
> to
> > be
> > > > > able to know how we can flag this properly so it's known for the
> > > Release
> > > > > Manager.
> > > > > For example, a KIP is approved, the Jira associated with it is
> being
> > > > worked
> > > > > on. Release happens, Jira is still open, how can we flag that this
> > KIP
> > > is
> > > > > in early access, or preview?
> > > > >
> > > > > Best,
> > > > >
> > > > > On Wed, Jul 31, 2024 at 11:03 AM Josep Prat 
> > > wrote:
> > > > >
> > > > >> Hi Kafka devs,
> > > > >>
> > > > >> Lately we started using "early access", "production ready" and
> also
> > > > >> "preview" to determine the grade of "production readiness" of the
> > > > features
> > > > >> we deliver to our community.
> > > > >> However, as far as I know, there is no official definition from
> the
> > > > Apache
> > > > >> Kafka side on which are the graduation steps for features and what
> > > type
> > > > of
> > > > >> "guarantees" each of these offer.
> > > > >>
> > > > >> I think we should agree on which terms we should use and what each
> > of
> > > > >> these exactly mean in terms of reliability. So far it seems we
> have
> > > this
> > > > >> graduation steps:
> > > > >> - Early Access: Feature is just complete but not yet fully
> polished
> > > and
> > > > >> maybe not used in production in many environm

Re: [DISCUSS] Graduation steps for Features

2024-07-31 Thread David Jacot
Hi Josep,

Thanks for starting the discussion.

We used Early Access, Preview and GA (or Production Ready) for KIP-848 and
I find it pretty nice. We could add the tentative release plan to the KIP's
header and it could be used as the source of truth.

Best,
David

On Wed, Jul 31, 2024 at 11:53 AM Josep Prat 
wrote:

> Hi Andrew,
> I can definitely write a KIP, but before doing so I'd like to gather some
> feedback from the community around these steps and how they are perceived
> by different groups of people.
>
> On Wed, Jul 31, 2024 at 11:50 AM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
> > Hi Josep,
> > I think it’s high time that this was tackled. I suggest that it would be
> > best
> > handled as a KIP because then we have a document which can be discussed
> > and improved, followed by a formal vote.
> >
> > A standard set of terms with agreed meanings would be very helpful for
> > some of the larger KIPs which take many releases to be properly ready for
> > prime time. Most KIPs don’t need this, but a handful definitely do.
> >
> > Personally, I like the sequence that KIP-848 has taken, moving from Early
> > Access, to Preview, and finally complete. I intend to follow the same
> > sequence
> > for KIP-932.
> >
> > Thanks,
> > Andrew
> >
> > > On 31 Jul 2024, at 10:15, Josep Prat 
> > wrote:
> > >
> > > Also as part of this discussion I would like to flag that we need to be
> > > able to know how we can flag this properly so it's known for the
> Release
> > > Manager.
> > > For example, a KIP is approved, the Jira associated with it is being
> > worked
> > > on. Release happens, Jira is still open, how can we flag that this KIP
> is
> > > in early access, or preview?
> > >
> > > Best,
> > >
> > > On Wed, Jul 31, 2024 at 11:03 AM Josep Prat 
> wrote:
> > >
> > >> Hi Kafka devs,
> > >>
> > >> Lately we started using "early access", "production ready" and also
> > >> "preview" to determine the grade of "production readiness" of the
> > features
> > >> we deliver to our community.
> > >> However, as far as I know, there is no official definition from the
> > Apache
> > >> Kafka side on which are the graduation steps for features and what
> type
> > of
> > >> "guarantees" each of these offer.
> > >>
> > >> I think we should agree on which terms we should use and what each of
> > >> these exactly mean in terms of reliability. So far it seems we have
> this
> > >> graduation steps:
> > >> - Early Access: Feature is just complete but not yet fully polished
> and
> > >> maybe not used in production in many environments
> > >> - Preview: Feature was early access before and it underwent at least a
> > >> cycle of improvements and fixes and it's used in some production
> > >> environments maybe
> > >> - Production ready: Feature is officially released and it fulfills the
> > >> expected initial needs
> > >>
> > >> Note that we don't offer any guarantees or SLA/SLO in the classical
> > term.
> > >>
> > >> Is this something we can agree on? What do those terms mean to you? Do
> > we
> > >> need more steps? Or do we need less steps?
> > >>
> > >> Best,
> > >> --
> > >> [image: Aiven] 
> > >>
> > >> *Josep Prat*
> > >> Open Source Engineering Director, *Aiven*
> > >> josep.p...@aiven.io   |   +491715557497
> > >> aiven.io    |
> > >> 
> > >>    <
> > https://twitter.com/aiven_io>
> > >> *Aiven Deutschland GmbH*
> > >> Alexanderufer 3-7, 10117 Berlin
> > >> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > >> Anna Richardson, Kenneth Chen
> > >> Amtsgericht Charlottenburg, HRB 209739 B
> > >>
> > >
> > >
> >
> >
>


[jira] [Resolved] (KAFKA-16944) Range assignor doesn't co-partition with stickiness

2024-07-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16944.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Range assignor doesn't co-partition with stickiness
> ---
>
> Key: KAFKA-16944
> URL: https://issues.apache.org/jira/browse/KAFKA-16944
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.9.0
>
>
> When stickiness is considered during range assignments, it is possible that 
> in certain cases where co-partitioning is guaranteed we fail. 
> An example would be:
> Consider two topics T1, T2 with 3 partitions each and three members A, B, C.
> Let's say the existing assignment (for whatever reason) is:
> {quote}A -> T1P0  ||  B -> T1P1, T2P0, T2P1, T2P2 || C -> T1P2
> {quote}
> Now we trigger a rebalance with the following subscriptions where all members 
> are subscribed to both topics everything else is the same
> {quote}A -> T1, T2 || B -> T1, T2 || C -> T1, T2
> {quote}
> Since all the topics have an equal number of partitions and all the members 
> are subscribed to the same set of topics we would expect co-partitioning 
> right so would we want the final assignment returned to be
> {quote}A -> T1P0, T2P0  ||  B -> T1P1, T2P1 || C -> T1P2, T2P2
> {quote}
> So currently the client-side assignor returns the following, but only because 
> it doesn't assign sticky partitions:
> {{C=[topic1-2, topic2-2], B=[topic1-1, topic2-1], A=[topic1-0, topic2-0]}}
>  
> Our server-side assignor returns (the partitions in bold are the sticky 
> partitions):
> A=MemberAssignment(targetPartitions={topic2=[1], *topic1=[0]*}), 
> B=MemberAssignment(targetPartitions={*topic2=[0]*, *topic1=[1]*}), 
> C=MemberAssignment(targetPartitions={topic2=[2], *topic1=[2]*})
> *As seen above, co-partitioning is expected but not returned.*
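For reference, the co-partitioned result expected in this ticket is the one produced by a plain index-aligned range assignment. This toy function is a sketch of the property being asked for, not the server-side assignor:

```python
def co_partitioned_range_assign(members, topics):
    """Index-aligned range assignment: member i gets partition i of every topic.

    With equal partition counts and identical subscriptions, partitions with
    the same index land on the same member, i.e. they are co-partitioned.
    """
    assignment = {m: [] for m in members}
    for topic, num_partitions in topics.items():
        for p in range(num_partitions):
            assignment[members[p % len(members)]].append((topic, p))
    return assignment

result = co_partitioned_range_assign(["A", "B", "C"], {"T1": 3, "T2": 3})
assert result["A"] == [("T1", 0), ("T2", 0)]
assert result["B"] == [("T1", 1), ("T2", 1)]
assert result["C"] == [("T1", 2), ("T2", 2)]
```

The hard part the ticket describes is preserving this property while also honoring stickiness, which the sketch deliberately ignores.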





[jira] [Resolved] (KAFKA-17058) Extend CoordinatorRuntime to support non-atomic writes

2024-07-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17058.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Extend CoordinatorRuntime to support non-atomic writes
> --
>
> Key: KAFKA-17058
> URL: https://issues.apache.org/jira/browse/KAFKA-17058
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.9.0
>
>






Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-07-03 Thread David Jacot
Hi Jun, Colin,

Thanks for your replies.

If the FeatureCommand relies on version 0 too, my suggestion does not work.
Omitting the features for old clients as suggested by Colin seems fine for
me. In practice, administrators will usually use a version of
FeatureCommand matching the cluster version so the impact is not too bad
knowing that the first features will be introduced from 3.9 on.

Best,
David
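
A rough sketch of the two server-side workarounds discussed in this thread may help; the function names and data shapes here are assumptions for illustration, not Kafka's actual code. One omits features whose supported range starts at 0 from responses to pre-v4 ApiVersions clients; the other translates a request to set a finalized level to 0 into setting it to 1 ("off").

```python
def advertised_features(supported, request_version):
    """Sketch (assumed shape): for ApiVersionsRequest v3 and below, leave
    out any feature whose supported range starts at 0, since older clients
    mis-handle such ranges; v4+ clients see everything."""
    if request_version >= 4:
        return dict(supported)
    return {name: (lo, hi) for name, (lo, hi) in supported.items()
            if lo != 0}

def normalize_update(requested_level):
    """Sketch of the other workaround: the server treats a request to set
    a feature's finalized level to 0 as setting it to 1 ("off")."""
    return 1 if requested_level == 0 else requested_level
```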

On Tue, Jul 2, 2024 at 2:15 AM Colin McCabe  wrote:

> Hi David,
>
> In the ApiVersionsResponse, we really don't have an easy way of mapping
> finalizedVersion = 1 to "off" in older releases such as 3.7.0. For example,
> if a 3.9.0 broker advertises that it has finalized group.version = 1, that
> will be treated by 3.7.0 as a brand new feature, not as "KIP-848 is off."
> However, I suppose we could work around this by not setting a
> finalizedVersion at all for group.version (or any other feature) if its
> finalized level was 1. We could also work around the "deletion = set to 0"
> issue on the server side. The server can translate requests to set the
> finalized level to 0, into requests to set it to 1.
>
> So maybe this solution is worth considering, although it's unfortunate to
> lose 0. I suppose we'd have to special case metadata.version being set to
> 1, since that was NOT equivalent to it being "off"
>
> best,
> Colin
>
>
> On Mon, Jul 1, 2024, at 10:11, Jun Rao wrote:
> > Hi, David,
> >
> > Yes, that's another option. It probably has its own challenges. For
> > example, the FeatureCommand tool currently treats disabling a feature as
> > setting the version to 0. It would be useful to get Jose's opinion on
> this
> > since he introduced version 0 in the kraft.version feature.
> >
> > Thanks,
> >
> > Jun
> >
> > On Sun, Jun 30, 2024 at 11:48 PM David Jacot  >
> > wrote:
> >
> >> Hi Jun, Colin,
> >>
> >> Have we considered sticking with the range going from version 1 to N
> where
> >> version 1 would be the equivalent of "disabled"? In the group.version
> case,
> >> we could introduce group.version=1 that does basically nothing and
> >> group.version=2 that enables the new protocol. I suppose that we could
> do
> >> the same for the other features. I agree that it is less elegant but it
> >> would avoid all the backward compatibility issues.
> >>
> >> Best,
> >> David
> >>
> >> On Fri, Jun 28, 2024 at 6:02 PM Jun Rao 
> wrote:
> >>
> >> > Hi, Colin,
> >> >
> >> > Yes, #3 is the scenario that I was thinking about.
> >> >
> >> > In either approach, there will be some information missing in the old
> >> > client. It seems that we should just pick the one that's less wrong.
> In
> >> the
> >> > more common case when a feature is finalized on the server,
> presenting a
> >> > supported feature with a range of 1-1 seems less wrong than omitting
> it
> >> in
> >> > the output of "kafka-features describe".
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Thu, Jun 27, 2024 at 9:52 PM Colin McCabe 
> wrote:
> >> >
> >> > > Hi Jun,
> >> > >
> >> > > This is a fair question. I think there's a few different scenarios
> to
> >> > > consider:
> >> > >
> >> > > 1. mixed server software versions in a single cluster
> >> > >
> >> > > 2. new client software + old server software
> >> > >
> >> > > 3. old client software + new server software
> >> > >
> >> > > In scenario #1 and #2, we have old (pre-3.9) server software in the
> >> mix.
> >> > > This old software won't support features like group.version and
> >> > > kraft.version. As we know, there are no features supported in 3.8
> and
> >> > older
> >> > > except metadata.version itself. So the fact that we leave out some
> >> stuff
> >> > > from the ApiVersionResponse isn't terribly significant. We weren't
> >> going
> >> > to
> >> > > be able to enable those post-3.8 features anyway, since enabling a
> >> > feature
> >> > > requires ALL server nodes to support it.
> >> > >
> >> > > Scenario #3 is more interesting. With new server software, features
> >> like
> >> > 

[jira] [Resolved] (KAFKA-17047) Refactor Consumer group and shared classes with Share to modern package

2024-07-03 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17047.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Refactor Consumer group and shared classes with Share to modern package
> ---
>
> Key: KAFKA-17047
> URL: https://issues.apache.org/jira/browse/KAFKA-17047
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apoorv Mittal
>Assignee: Apoorv Mittal
>Priority: Major
> Fix For: 3.9.0
>
>






[jira] [Resolved] (KAFKA-17050) Revert group.version for 3.8 and 3.9

2024-07-02 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-17050.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Revert group.version for 3.8 and 3.9
> 
>
> Key: KAFKA-17050
> URL: https://issues.apache.org/jira/browse/KAFKA-17050
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.8.0, 3.9.0
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Major
> Fix For: 3.8.0
>
>
> After much discussion for KAFKA-17011, we decided it would be best for 3.8 to 
> just remove the group version feature for 3.8. 
> As for 3.9, [~dajac] said it would be easier for EA users of the group 
> coordinator to have a single way to configure. For 4.0 we can reintroduce it.





Re: [DISCUSS] KIP-1062: Introduce Pagination for some requests used by Admin API

2024-07-02 Thread David Jacot
Hi Omnia,

Thanks for the KIP. I agree that we should migrate admin APIs to the new
pattern.

DJ1: Why do we want to migrate only a subset of the APIs vs migrating all
of them? For instance, there are DescribeGroups, ConsumerGroupDescribe,
etc. Do we have reasons not to migrate them too? I think that it would be
great to have a KIP that establishes the pattern for all the admin APIs.

DJ2: I am not a fan of all the new parameters passed to the tools
(e.g. --partition-size-limit-per-response). I wonder if we could rather
have a config in the admin client to set the default page size for all
requests.

DJ3: I assume that the Admin client will transparently handle the change.
It would be great to call it out in the compatibility section.

Thanks,
David
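
To illustrate DJ2/DJ3, here is a hypothetical sketch of an admin client transparently walking a cursor-based paginated API; `fetch_page` and its return shape are assumptions, not the actual AdminClient interface.

```python
def list_all_groups(fetch_page, page_size=500):
    """Hypothetical sketch: transparently page through a cursor-based
    ListGroups-style API. `fetch_page(cursor, limit)` is an assumed
    callable returning (groups, next_cursor), with next_cursor None on
    the last page. `page_size` plays the role of the admin-client-level
    default page size suggested in DJ2."""
    groups, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor, page_size)
        groups.extend(page)
        if cursor is None:
            return groups
```

The caller never sees the cursor, which is the transparent handling DJ3 asks to be called out in the compatibility section.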

On Tue, Jul 2, 2024 at 11:17 AM Andrew Schofield 
wrote:

> Hi,
> Thanks for the response. Makes sense to me. Just one additional comment:
>
> AS5: The cursor for ListGroupsResponse is called `TransactionalCursor`
> which
> seems like a copy-paste mistake.
>
> Thanks,
> Andrew
>
> > On 30 Jun 2024, at 22:28, Omnia Ibrahim  wrote:
> >
> > Hi Andrew thanks for having a look into the KIP
> >
> >> AS1: Besides topics, the most numerous resources in Kafka clusters in
> my experience
> >> are consumer groups. Would it be possible to extend the KIP to cover
> ListGroups while
> >> you’re in here? I’ve heard of clusters with truly vast numbers of
> groups. This is also
> >> potentially a sign of a misbehaving or poorly written clients. Getting
> a page of groups
> >> with a massive ItemsLeftToFetch would be nice.
> > Yes, I have also had a few experiences with large clusters where listing
> > consumer groups can take up to 5 min. I updated the KIP to include this as
> > well.
> >
> >> AS2: A tiny nit: The versions for the added fields are incorrect in
> some cases.
> > I believe I fixed all of them now
> >
> >> AS3: I don’t quite understand the cursor for
> OffsetFetchRequest/Response.
> >> It looks like the cursor is (topic, partition), but not group ID. Does
> the cursor
> >> apply to all groups in the request, or is group ID missing?
> >
> > I was thinking that the last one in the response would be the one that
> > has the cursor while the rest would have null. But if we move
> > NextCursor to the top level of the response, then the cursor will need the
> > groupID.
> >> AS4: For the remaining request/response pairs, the cursor makes sense
> to me,
> >> but I do wonder whether `NextCursor` should be at the top level of the
> responses
> >> instead, like DescribeTopicPartitionsResponse.
> >
> > Updated the KIP to reflect this now.
> >
> > Let me know if you have any more feedback on this.
> >
> > Best
> > Omnia
> >
> >> On 27 Jun 2024, at 17:53, Andrew Schofield 
> wrote:
> >>
> >> Hi Omnia,
> >> Thanks for the KIP. This is a really nice improvement for administering
> large clusters.
> >>
> >> AS1: Besides topics, the most numerous resources in Kafka clusters in
> my experience
> >> are consumer groups. Would it be possible to extend the KIP to cover
> ListGroups while
> >> you’re in here? I’ve heard of clusters with truly vast numbers of
> groups. This is also
> >> potentially a sign of a misbehaving or poorly written clients. Getting
> a page of groups
> >> with a massive ItemsLeftToFetch would be nice.
> >>
> >> AS2: A tiny nit: The versions for the added fields are incorrect in
> some cases.
> >>
> >> AS3: I don’t quite understand the cursor for
> OffsetFetchRequest/Response.
> >> It looks like the cursor is (topic, partition), but not group ID. Does
> the cursor
> >> apply to all groups in the request, or is group ID missing?
> >>
> >> AS4: For the remaining request/response pairs, the cursor makes sense
> to me,
> >> but I do wonder whether `NextCursor` should be at the top level of the
> responses
> >> instead, like DescribeTopicPartitionsResponse.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 27 Jun 2024, at 14:05, Omnia Ibrahim 
> wrote:
> >>>
> >>> Hi everyone, I would like to start a discussion thread for KIP-1062
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1062%3A+Introduce+Pagination+for+some+requests+used+by+Admin+API
> >>>
> >>>
> >>> Thanks
> >>> Omnia
>
>
>


[jira] [Created] (KAFKA-17058) Extend CoordinatorRuntime to support non-atomic writes

2024-07-01 Thread David Jacot (Jira)
David Jacot created KAFKA-17058:
---

 Summary: Extend CoordinatorRuntime to support non-atomic writes
 Key: KAFKA-17058
 URL: https://issues.apache.org/jira/browse/KAFKA-17058
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot








Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-06-30 Thread David Jacot
Justine Olshan wrote:
> > >>> >> > > > > Thanks Colin,
> > >>> >> > > > >
> > >>> >> > > > > This makes sense to me. Namely in the case where we
> perhaps
> > >>> don't
> > >>> >> > want to
> > >>> >> > > > > support version 0 anymore, we need the range to be able to
> > not
> > >>> >> > include 0.
> > >>> >> > > > > (In other words, we can't assume 0 is supported)
> > >>> >> > > > > It is unfortunate that this change is a bit tricky, but I
> > think
> > >>> >> it's
> > >>> >> > the
> > >>> >> > > > > best option.
> > >>> >> > > > >
> > >>> >> > > > > Can you clarify
> > >>> >> > > > >> The server will simply leave out the features whose
> minimum
> > >>> >> > supported
> > >>> >> > > > > value is 0 for clients that send v3
> > >>> >> > > > >
> > >>> >> > > > > For 3.8, I planned to set the 0s in the response to 1. Is
> it
> > >>> better
> > >>> >> > to
> > >>> >> > > > > suppress the zero version features in the response so we
> are
> > >>> >> > consistent
> > >>> >> > > > > between trunk and 3.8?
> > >>> >> > > > >
> > >>> >> > > > > Thanks,
> > >>> >> > > > > Justine
> > >>> >> > > > >
> > >>> >> > > > > On Fri, Jun 21, 2024 at 4:34 PM Colin McCabe <
> > >>> cmcc...@apache.org>
> > >>> >> > wrote:
> > >>> >> > > > >
> > >>> >> > > > >> Hi all,
> > >>> >> > > > >>
> > >>> >> > > > >> It seems that there was a bug in older versions of Kafka
> > which
> > >>> >> > caused
> > >>> >> > > > >> deserialization problems when a supported feature range
> > >>> included
> > >>> >> 0.
> > >>> >> > For
> > >>> >> > > > >> example, the range for group.version of [0, 1] would be a
> > >>> problem
> > >>> >> in
> > >>> >> > > > this
> > >>> >> > > > >> situation.
> > >>> >> > > > >>
> > >>> >> > > > >> This obviously makes supportedVersions kind of useless.
> Any
> > >>> >> feature
> > >>> >> > that
> > >>> >> > > > >> doesn't exist today is effectively at v0 today (v0 is
> > >>> equivalent
> > >>> >> to
> > >>> >> > > > "off").
> > >>> >> > > > >> But if we can't declare that the server supports [0, 1]
> or
> > >>> >> similar,
> > >>> >> > we
> > >>> >> > > > >> can't declare that it supports the feature being off.
> > >>> Therefore,
> > >>> >> no
> > >>> >> > > > rolling
> > >>> >> > > > >> upgrades are possible.
> > >>> >> > > > >>
> > >>> >> > > > >> We noticed this bug during the 3.8 release when we
> noticed
> > >>> >> problems
> > >>> >> > in
> > >>> >> > > > >> upgrade tests. As an addendum to KIP-1022, we're adding
> the
> > >>> >> > following
> > >>> >> > > > >> solution:
> > >>> >> > > > >>
> > >>> >> > > > >> - There will be a new v4 for ApiVersionsRequest
> > >>> >> > > > >>
> > >>> >> > > > >> - Clients that sent v4 will promise to correctly handle
> > ranges
> > >>> >> that
> > >>> >> > > > start
> > >>> >> > > > >> with 0, such as [0, 1]
> > >>> >> > > > >>
> > >>> >> > > > >> - The server will simply leave out the features whose
> > minimum
> > >>> >> > supported
> > >>> >> > > > >> value is 0 for clients that send v3
> > >>> >> > > > >>
> > >>> >> > > > >> - ApiVersionsRequest v4 will be supported in AK 3.9 and
> > >>> above. AK
> > >>> >> > 3.8
> > >>> >> > > > will
> > >>> >> > > > >> ship with ApiVersionsRequest v3 just as today.
> > >>> >> > > > >>
> > >>> >> > > > >> thanks,
> > >>> >> > > > >> Colin
> > >>> >> > > > >>
> > >>> >> > > > >>
> > >>> >> > > > >> On Mon, Apr 15, 2024, at 11:01, Justine Olshan wrote:
> > >>> >> > > > >> > Hey folks,
> > >>> >> > > > >> >
> > >>> >> > > > >> > Thanks everyone! I will go ahead and call it.
> > >>> >> > > > >> > The KIP passes with the following +1 votes:
> > >>> >> > > > >> >
> > >>> >> > > > >> > - Andrew Schofield (non-binding)
> > >>> >> > > > >> > - David Jacot (binding)
> > >>> >> > > > >> > - José Armando García Sancio (binding)
> > >>> >> > > > >> > - Jun Rao (binding)
> > >>> >> > > > >> >
> > >>> >> > > > >> > Thanks again,
> > >>> >> > > > >> > Justine
> > >>> >> > > > >> >
> > >>> >> > > > >> > On Fri, Apr 12, 2024 at 11:16 AM Jun Rao
> > >>> >>  > >>> >> > >
> > >>> >> > > > >> wrote:
> > >>> >> > > > >> >
> > >>> >> > > > >> >> Hi, Justine,
> > >>> >> > > > >> >>
> > >>> >> > > > >> >> Thanks for the KIP. +1
> > >>> >> > > > >> >>
> > >>> >> > > > >> >> Jun
> > >>> >> > > > >> >>
> > >>> >> > > > >> >> On Wed, Apr 10, 2024 at 9:13 AM José Armando García
> > Sancio
> > >>> >> > > > >> >>  wrote:
> > >>> >> > > > >> >>
> > >>> >> > > > >> >> > Hi Justine,
> > >>> >> > > > >> >> >
> > >>> >> > > > >> >> > +1 (binding)
> > >>> >> > > > >> >> >
> > >>> >> > > > >> >> > Thanks for the improvement.
> > >>> >> > > > >> >> > --
> > >>> >> > > > >> >> > -José
> > >>> >> > > > >> >> >
> > >>> >> > > > >> >>
> > >>> >> > > > >>
> > >>> >> > > >
> > >>> >> >
> > >>> >> >
> > >>> >> >
> > >>> >> > --
> > >>> >> > -José
> > >>> >> >
> > >>> >>
> > >>>
> > >>
> >
>


Re: [DISCUSS] KIP-1014: Managing Unstable Features in Apache Kafka

2024-06-28 Thread David Jacot
Hi Colin,

I agree that we try hard to avoid breaking compatibility. I am not
questioning this at all. I also agree with your concern.

My point was about requiring users to opt-in for a feature vs having it
enabled by default during the EA and the Preview phases. With KIP-1022, the
only way to use a non-default version of a feature is to set the unstable
config. I think that it would be nice to have the default feature version,
new stable version(s) that users could opt-in to enable during EA/Preview,
and the development one(s) gated by the unstable flag. This is perhaps more
a discussion for KIP-1022.

Best,
David

On Fri, Jun 28, 2024 at 7:50 AM Chia-Ping Tsai  wrote:

> On Wed, Jun 26, 2024, at 13:16, Jun Rao wrote:
>
> > > Hi, Colin,
> > >
> > > Thanks for the reply.
> > >
> > > 1.
> >
> https://kafka.apache.org/protocol.html#The_Messages_ConsumerGroupDescribe
> > > lists ConsumerGroupDescribeRequest, whose latest version is unstable.
> > >
> >
> > Hi Jun,
> >
> > I think that is a bug.
> >
>
> I file a jira for this (https://issues.apache.org/jira/browse/KAFKA-17051)
>


[jira] [Resolved] (KAFKA-16822) Abstract consumer group in coordinator to share functionality with share group

2024-06-27 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16822.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Abstract consumer group in coordinator to share functionality with share group
> --
>
> Key: KAFKA-16822
> URL: https://issues.apache.org/jira/browse/KAFKA-16822
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Apoorv Mittal
>Assignee: Apoorv Mittal
>Priority: Major
> Fix For: 3.9.0
>
>






Re: [DISCUSS] KIP-1014: Managing Unstable Features in Apache Kafka

2024-06-27 Thread David Jacot
Hi all,

I think that it would be nice to have an official way to enable
non-production-ready features in order to have a way to test them in
development/soak clusters. For instance, I would like the new consumer
protocol to be disabled by default but users should be able to enable it if
they want to test it in their environment. Diverging a bit here, but for
group.version=1, it should be the default in 4.0 and it should be usable in
3.8/3.9 by, for instance, setting it with `kafka-features`. I don't think
that we allow this with KIP-1022 unless the unstable/unreleased flag is set,
because group.version=1 is attached to MV_4_0.

Best,
David
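
The gating being debated here could look roughly like the following; all names (the config flag, the test-JVM check, the version bounds) are assumptions for illustration, not the actual KafkaRaftServer logic.

```python
def effective_feature_version(feature, requested, stable_max, unstable_max,
                              unstable_enabled, in_test_jvm):
    """Sketch (assumed names): versions above the stable maximum are only
    usable when the unstable flag is set, and per Colin's proposal the
    flag itself is only honored inside the test JVM (the real broker and
    controller startup paths would unset it)."""
    limit = unstable_max if (unstable_enabled and in_test_jvm) else stable_max
    if requested > limit:
        raise ValueError(
            "version %d of %s is unstable and not enabled" % (requested, feature))
    return requested
```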

On Thu, Jun 27, 2024 at 12:50 AM Colin McCabe  wrote:

> Hi Jun,
>
> KIP-1014 is explicitly NOT for EA features, since EA features need to be
> usable by non-developers.
>
> I think it's important to be clear about this. Maybe "unreleased" would be
> a better name than "unstable" since people seem to have lots of different
> ideas about what "unstable" means, which are very different than the
> intention of this KIP. What do you think?
>
> For a developer making changes to the code, making an additional change to
> enable testing outside of JUnit should be fine. I think this is actually
> critical to making this work, since if we allow non-developers to use
> KIP-1014 features, we'll get bogged down in compatibility discussions (no
> matter what the KIP says).
>
> best,
> Colin
>
> On Wed, Jun 26, 2024, at 14:49, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the reply.
> >
> > 4. "A developer could modify the code to allow unstable features outside
> of
> > JUnit, and then run whatever they want."
> > Hmm, it's inconvenient for a developer to make some temporary change just
> > to test an unstable feature outside of junit, right?
> >
> > Also, how does a user test an EA feature in a release? It's inconvenient
> > for a user to change code and recompile the binary.
> >
> > Jun
> >
> >
> > On Wed, Jun 26, 2024 at 1:38 PM Colin McCabe  wrote:
> >
> >> On Wed, Jun 26, 2024, at 13:16, Jun Rao wrote:
> >> > Hi, Colin,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > 1.
> >>
> https://kafka.apache.org/protocol.html#The_Messages_ConsumerGroupDescribe
> >> > lists ConsumerGroupDescribeRequest, whose latest version is unstable.
> >> >
> >>
> >> Hi Jun,
> >>
> >> I think that is a bug.
> >>
> >> >
> >> > 4. "As devlopers, they can change the code to do this if they want."
> >> > Just to be clear. A developer could be able to test unstable MV/RPCs
> by
> >> > enabling unstable.features.enable in a real cluster, right?
> >>
> >> A developer could modify the code to allow unstable features outside of
> >> JUnit, and then run whatever they want.
> >>
> >> >
> >> > "But I think it's important that this should NOT work in our actual
> Kafka
> >> > releases"
> >> > Are you saying unstable MV/RPCs can't be enabled in Kafka releases
> with
> >> > unstable.features.enable set to true? How do we plan to enforce that?
> >> >
> >>
> >> We can just unset the configuration key in KafkaRaftServer.scala, which
> is
> >> not used by JUnit, but which is used by the normal broker and controller
> >> startup processes.
> >>
> >> best,
> >> Colin
> >>
> >> > Jun
> >> >
> >> > On Wed, Jun 26, 2024 at 12:52 PM Colin McCabe 
> >> wrote:
> >> >
> >> >> On Wed, Jun 26, 2024, at 12:09, Jun Rao wrote:
> >> >> > Hi, Colin,
> >> >> >
> >> >> > Thanks for restarting the discussion. A few comments.
> >> >> >
> >> >> > 1. "An unstable RPC version can be changed at any time, until it
> >> becomes
> >> >> > stable."
> >> >> >
> >> >> > What's our recommendation to non-java developers? Should they start
> >> >> > building a new version of an RPC until it is stable?
> >> >> >
> >> >>
> >> >> Hi Jun,
> >> >>
> >> >> Non-Java developers will always be using only stable APIs. Unstable
> APIs
> >> >> are only available to JUnit tests (that run inside the JUnit JVM).
> >> >>
> >> >> > Should we explicitly mark unstable versions of PRC in
> >> >> > https://kafka.apache.org/protocol.html? Currently, it's not clear
> >> which
> >> >> > versions are unstable.
> >> >> >
> >> >>
> >> >> Hmm, I don't think the unstable APIs should be documented at all in
> our
> >> >> public docs. Since they're just "possibilities for the future" that
> >> haven't
> >> >> actually been released.
> >> >>
> >> >> > 2. enable.unstable.features: Our current convention is to put
> enable
> >> in
> >> >> the
> >> >> > suffix in config names.
> >> >> >
> >> >>
> >> >> OK. I changed it to "unstable.features.enable"
> >> >>
> >> >> > 3. It would be useful to explicitly mention the removal of the
> >> following
> >> >> > two configs in the public interfaces section.
> >> >> > unstable.api.versions.enable
> >> >> > unstable.feature.versions.enable
> >> >> >
> >> >>
> >> >> OK. I added this to that section.
> >> >>
> >> >> > 4. "Clusters can be created with unstable MVs, but only in JUnit
> >> tests."
> >> >> > Hmm, we should allow developers to test unstabl

[jira] [Resolved] (KAFKA-16973) Fix caught-up condition

2024-06-20 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16973.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Fix caught-up condition
> ---
>
> Key: KAFKA-16973
> URL: https://issues.apache.org/jira/browse/KAFKA-16973
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.9.0
>
>
> When a write operation does not have any records, the coordinator runtime 
> checks whether the state machine is caught up, to decide whether the 
> operation should wait until the state machine is committed up to the 
> operation point or whether the operation should be completed. The current 
> implementation assumes that there will always be a pending write operation 
> waiting in the deferred queue when the state machine is not fully caught up 
> yet. This is true except when the state machine has just been loaded and is 
> not caught up yet.
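
The bug and its fix described above can be sketched as follows; the function and parameter names are illustrative, not the actual CoordinatorRuntime code.

```python
def is_caught_up_buggy(deferred_queue, last_committed_offset, op_offset):
    """Old assumption (names are mine, not Kafka's): an empty deferred
    queue was taken to mean the state machine is caught up. That breaks
    right after the state machine is loaded, when nothing is queued yet
    but replay has not committed up to the operation point. The offset
    parameters are unused here, kept only for a parallel signature."""
    return len(deferred_queue) == 0

def is_caught_up_fixed(deferred_queue, last_committed_offset, op_offset):
    """Corrected check: compare commit progress with the operation's
    point directly instead of inferring it from the queue."""
    return last_committed_offset >= op_offset
```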





Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-17 Thread David Jacot
I meant it from a time perspective, not from a branching point perspective.
Sorry for the confusion. As said in the other thread, doing it four months
after 3.9 is desirable for KIP-848 as I expect that we will need time to
stabilize everything after switching all the default configs once 3.9 is
cut.

David

Le lun. 17 juin 2024 à 19:33, Matthias J. Sax  a écrit :

> Why would 4.0 be based on 3.8? My understanding is, that it will be
> based on 3.9.
>
> -Matthias
>
> On 6/14/24 11:22 PM, David Jacot wrote:
> > I agree that we should keep 4.0 time-based. My question is based on which
> > release. If I understand you, you would like to keep it based on 3.8.
> This
> > means that 4.0 would be released in October. It would be helpful to fix
> the
> > dates so we can plan accordingly. I will start a separate thread on
> Monday.
> >
> > David
> >
> > Le sam. 15 juin 2024 à 00:44, Colin McCabe  a écrit
> :
> >
> >> +1. I think it would be good to keep 4.0 time-based. Most of the
> refactors
> >> we want to do are optional in some sense and can be descoped if time
> runs
> >> out. For example, we can drop support for JDK 8 without immediately
> >> refactoring everything that could benefit from the improvements in
> JDK9+.
> >>
> >> best,
> >> Colin
> >>
> >>
> >> On Fri, Jun 14, 2024, at 15:37, Matthias J. Sax wrote:
> >>> That's my understanding, and I would advocate strongly to keep the 4.0
> >>> release schedule as planed originally.
> >>>
> >>> The 3.9 one should really be an additional "out of schedule" release
> not
> >>> impacting any other releases.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 6/14/24 1:29 PM, David Jacot wrote:
> >>>> The plan sounds good to me. I suppose that we will follow our regular
> >>>> cadence for 4.0 and release it four months after 3.9 (in November?).
> Is
> >>>> this correct?
> >>>>
> >>>> Best,
> >>>> David
> >>>>
> >>>> Le ven. 14 juin 2024 à 21:57, José Armando García Sancio
> >>>>  a écrit :
> >>>>
> >>>>> +1 on the proposed release plan for 3.8.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> On Fri, Jun 14, 2024 at 3:33 PM Ismael Juma 
> wrote:
> >>>>>>
> >>>>>> +1 to the plan we converged on in this thread.
> >>>>>>
> >>>>>> Ismael
> >>>>>>
> >>>>>> On Fri, Jun 14, 2024 at 10:46 AM Josep Prat
> >>  >>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Thanks Colin, yes go ahead.
> >>>>>>>
> >>>>>>> As we are now past code freeze I would like to ask everyone
> involved
> >>>>> in a
> >>>>>>> KIP that is not yet complete, to verify if what landed on the 3.8
> >>>>> branch
> >>>>>>> needs to be reverted or if it can stay. Additionally, I'll be
> pinging
> >>>>> KIPs
> >>>>>>> and Jira reporters asking for their status as some Jiras seem to
> have
> >>>>> all
> >>>>>>> related GitHub PRs merged but their status is still Open or In
> >>>>> Progress.
> >>>>>>> I'll be checking all the open blockers and check if they are
> really a
> >>>>>>> blocker or can be pushed.
> >>>>>>>
> >>>>>>>
> >>>>>>> Regarding timeline, I'll attempt to generate the first RC on
> >> Wednesday
> >>>>> or
> >>>>>>> Thursday, so please revert any changes you deem necessary by then. If
> >>>>> you
> >>>>>>> need more time, please ping me.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> -
> >>>>>>> Josep Prat
> >>>>>>> Open Source Engineering Director, Aiven
> >>>>>>> josep.p...@aiven.io   |   +491715557497 | aiven.io
>

Re: [DISCUSS] Apache Kafka 3.9.0 release

2024-06-17 Thread David Jacot
+1 for the release plan. Thanks!

+1 for releasing 4.0 four months after 3.9. 4.0 is actually a pretty big
release as we will GA KIP-848, including new group coordinator and new
consumer rebalance protocol. This is a pretty big change :).

Best,
David

Le lun. 17 juin 2024 à 20:36, Colin McCabe  a écrit :

> Hi all,
>
> Thanks, everyone.
>
> Quick update: on the release plan page, I moved feature freeze forward and
> code freeze by one week to make sure we can hit that. No other dates
> changed.
>
> With regard to 4.0, I was assuming that we'd do it 4 months after 3.9 was
> released. I think for that release, we should front-load all the "remove
> deprecated stuff" work that has piled up, and try to really hit all the
> release milestone dates. I don't think 4.0 needs to be a very ambitious
> release, it really is mostly about the removals and the new JDK (which are
> a big deal, but I hope should fit within the normal timeframes). That being
> said, if people want a shorter 4.0, I'm open to that, as long as we're
> confident we could actually do that :)
>
> best,
> Colin
>
>
> On Fri, Jun 14, 2024, at 20:56, Sophie Blee-Goldman wrote:
> > +1, thank you Colin
> >
> > Given the July freeze deadlines, I take it we are going with the "short
> > 3.9.0 release" option and that the existence of this release will impact
> > the 4.0.0 deadlines which will still follow the usual schedule -- in
> other
> > words, this is an "additional release" outside of the regular timeline.
> Is
> > this understanding correct?
> >
> > On Fri, Jun 14, 2024 at 5:57 PM Chia-Ping Tsai 
> wrote:
> >
> >> +1 thanks Colin!
> >>
>


[jira] [Resolved] (KAFKA-16673) Optimize toTopicPartitions with ConsumerProtocolSubscription

2024-06-17 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16673.
-
Fix Version/s: 3.9.0
   Resolution: Fixed

> Optimize toTopicPartitions with ConsumerProtocolSubscription
> 
>
> Key: KAFKA-16673
> URL: https://issues.apache.org/jira/browse/KAFKA-16673
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.9.0
>
>
> https://github.com/apache/kafka/pull/15798#discussion_r1582981154





[jira] [Created] (KAFKA-16973) Fix caught-up condition

2024-06-17 Thread David Jacot (Jira)
David Jacot created KAFKA-16973:
---

 Summary: Fix caught-up condition
 Key: KAFKA-16973
 URL: https://issues.apache.org/jira/browse/KAFKA-16973
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot


When a write operation does not have any records, the coordinator runtime 
checks whether the state machine is caught up, to decide whether the operation 
should wait until the state machine is committed up to the operation point or 
whether the operation should be completed. The current implementation assumes 
that there will always be a pending write operation waiting in the deferred 
queue when the state machine is not fully caught up yet. This is true except 
when the state machine has just been loaded and is not caught up yet.





Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-14 Thread David Jacot
I agree that we should keep 4.0 time-based. My question is based on which
release. If I understand you, you would like to keep it based on 3.8. This
means that 4.0 would be released in October. It would be helpful to fix the
dates so we can plan accordingly. I will start a separate thread on Monday.

David

Le sam. 15 juin 2024 à 00:44, Colin McCabe  a écrit :

> +1. I think it would be good to keep 4.0 time-based. Most of the refactors
> we want to do are optional in some sense and can be descoped if time runs
> out. For example, we can drop support for JDK 8 without immediately
> refactoring everything that could benefit from the improvements in JDK9+.
>
> best,
> Colin
>
>
> On Fri, Jun 14, 2024, at 15:37, Matthias J. Sax wrote:
> > That's my understanding, and I would advocate strongly to keep the 4.0
> > release schedule as planed originally.
> >
> > The 3.9 one should really be an additional "out of schedule" release not
> > impacting any other releases.
> >
> >
> > -Matthias
> >
> > On 6/14/24 1:29 PM, David Jacot wrote:
> >> The plan sounds good to me. I suppose that we will follow our regular
> >> cadence for 4.0 and release it four months after 3.9 (in November?). Is
> >> this correct?
> >>
> >> Best,
> >> David
> >>
> >> Le ven. 14 juin 2024 à 21:57, José Armando García Sancio
> >>  a écrit :
> >>
> >>> +1 on the proposed release plan for 3.8.
> >>>
> >>> Thanks!
> >>>
> >>> On Fri, Jun 14, 2024 at 3:33 PM Ismael Juma  wrote:
> >>>>
> >>>> +1 to the plan we converged on in this thread.
> >>>>
> >>>> Ismael
> >>>>
> >>>> On Fri, Jun 14, 2024 at 10:46 AM Josep Prat
>  >>>>
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Thanks Colin, yes go ahead.
> >>>>>
> >>>>> As we are now past code freeze I would like to ask everyone involved
> >>> in a
> >>>>> KIP that is not yet complete, to verify if what landed on the 3.8
> >>> branch
> >>>>> needs to be reverted or if it can stay. Additionally, I'll be pinging
> >>> KIPs
> >>>>> and Jira reporters asking for their status as some Jiras seem to have
> >>> all
> >>>>> related GitHub PRs merged but their status is still Open or In
> >>> Progress.
> >>>>> I'll be checking all the open blockers and check if they are really a
> >>>>> blocker or can be pushed.
> >>>>>
> >>>>>
> >>>>> Regarding timeline, I'll attempt to generate the first RC on
> Wednesday
> >>> or
> >>>>> Thursday, so please revert any changes you deem necessary by then. If
> >>> you
> >>>>> need more time, please ping me.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> -
> >>>>> Josep Prat
> >>>>> Open Source Engineering Director, Aiven
> >>>>> josep.p...@aiven.io   |   +491715557497 | aiven.io
> >>>>> Aiven Deutschland GmbH
> >>>>> Alexanderufer 3-7, 10117 Berlin
> >>>>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>>>> Amtsgericht Charlottenburg, HRB 209739 B
> >>>>>
> >>>>> On Fri, Jun 14, 2024, 19:25 Colin McCabe  wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> We have had many delays with releases this year. We should try to
> get
> >>>>> back
> >>>>>> on schedule.
> >>>>>>
> >>>>>> I agree with the idea that was proposed a few times in this thread
> of
> >>>>>> drawing a line under 3.8 now, and doing a short 3.9 release. I
> >>> posted a
> >>>>> 3.9
> >>>>>> release plan here:
> >>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.9.0
> >>>>>>
> >>>>>> I think we could start doing RCs for 3.8.0 as early as next week.
> >>> There
> >>>>>> are a few things that need to be reverted first (anything related to
> >>>>>> KIP-853 or KIP-966).
> >>>>>>
> >>>>>> 

Re: [DISCUSS] Apache Kafka 3.8.0 release

2024-06-14 Thread David Jacot
The plan sounds good to me. I suppose that we will follow our regular
cadence for 4.0 and release it four months after 3.9 (in November?). Is
this correct?

Best,
David

Le ven. 14 juin 2024 à 21:57, José Armando García Sancio
 a écrit :

> +1 on the proposed release plan for 3.8.
>
> Thanks!
>
> On Fri, Jun 14, 2024 at 3:33 PM Ismael Juma  wrote:
> >
> > +1 to the plan we converged on in this thread.
> >
> > Ismael
> >
> > On Fri, Jun 14, 2024 at 10:46 AM Josep Prat  >
> > wrote:
> >
> > > Hi all,
> > >
> > > Thanks Colin, yes go ahead.
> > >
> > > As we are now past code freeze I would like to ask everyone involved
> in a
> > > KIP that is not yet complete, to verify if what landed on the 3.8
> branch
> > > needs to be reverted or if it can stay. Additionally, I'll be pinging
> KIPs
> > > and Jira reporters asking for their status as some Jiras seem to have
> all
> > > related GitHub PRs merged but their status is still Open or In
> Progress.
> > > I'll be checking all the open blockers and check if they are really a
> > > blocker or can be pushed.
> > >
> > >
> > > Regarding timeline, I'll attempt to generate the first RC on Wednesday
> or
> > > Thursday, so please revert any changes you deem necessary by then. If
> you
> > > need more time, please ping me.
> > >
> > > Best,
> > >
> > > -
> > > Josep Prat
> > > Open Source Engineering Director, Aiven
> > > josep.p...@aiven.io   |   +491715557497 | aiven.io
> > > Aiven Deutschland GmbH
> > > Alexanderufer 3-7, 10117 Berlin
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> > > On Fri, Jun 14, 2024, 19:25 Colin McCabe  wrote:
> > >
> > > > Hi all,
> > > >
> > > > We have had many delays with releases this year. We should try to get
> > > back
> > > > on schedule.
> > > >
> > > > I agree with the idea that was proposed a few times in this thread of
> > > > drawing a line under 3.8 now, and doing a short 3.9 release. I
> posted a
> > > 3.9
> > > > release plan here:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.9.0
> > > >
> > > > I think we could start doing RCs for 3.8.0 as early as next week.
> There
> > > > are a few things that need to be reverted first (anything related to
> > > > KIP-853 or KIP-966).
> > > >
> > > > Josep, if you agree, I will update KIP-1012 to reflect that these
> things
> > > > are landing in 3.9 rather than 3.8. And we can start doing all the
> normal
> > > > release stuff. The main blocker JIRA I'm aware of is KAFKA-16946,
> which
> > > is
> > > > a very simple fix.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Jun 14, 2024, at 03:48, Satish Duggana wrote:
> > > > > +1 on going with 3.8 release with the existing plan and targeting
> the
> > > > > required features in 3.9 timelines. 4.0 will be targeted in the
> usual
> > > > > cycle(4 months) after 3.9 is released.
> > > > >
> > > > >
> > > > > On Fri, 14 Jun 2024 at 15:19, Edoardo Comar  >
> > > > wrote:
> > > > >>
> > > > >> Josep,
> > > > >> past the deadline sorry but I can't see reasons not to cherry-pick
> > > this
> > > > >> https://github.com/apache/kafka/pull/16326
> > > > >>
> > > > >> On Wed, 12 Jun 2024 at 17:14, Josep Prat
>  > > >
> > > > wrote:
> > > > >> >
> > > > >> > Hi Edoardo,
> > > > >> >
> > > > >> > Correct, you can still cherry-pick this one.
> > > > >> >
> > > > >> > I'll send an email tomorrow morning (CEST) asking maintainers to
> > > stop
> > > > >> > cherry picking commits unless we discuss it beforehand.
> > > > >> >
> > > > >> > Best,
> > > > >> >
> > > > >> > On Wed, Jun 12, 2024 at 6:09 PM Edoardo Comar <
> > > edoardli...@gmail.com>
> > > > wrote:
> > > > >> >
> > > > >> > > Hi Josep, I understand I am still in time to cherry-pick on
> 3.8.0
> > > > >> > > https://github.com/apache/kafka/pull/16229
> > > > >> > >
> > > > >> > > right?
> > > > >> > > thanks
> > > > >> > >
> > > > >> > > On Wed, 12 Jun 2024 at 11:34, Ivan Yurchenko 
> > > > wrote:
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > I'll try to do all the fixes and changes for KIP-899 [1]
> sooner
> > > > today,
> > > > >> > > but please proceed with the release if I don't manage.
> > > > >> > > >
> > > > >> > > > Ivan
> > > > >> > > >
> > > > >> > > > [1] https://github.com/apache/kafka/pull/13277
> > > > >> > > >
> > > > >> > > > On Wed, Jun 12, 2024, at 12:54, Josep Prat wrote:
> > > > >> > > > > Hi Luke,
> > > > >> > > > > I think Jose, also mentioned that it won't be ready for
> v3.8.0
> > > > (but he
> > > > >> > > can
> > > > >> > > > > confirm this). My question now would be, given that it
> seems
> > > we
> > > > would
> > > > >> > > need
> > > > >> > > > > a v3.9.0, do you think it's important to include
> > > > >> > > > > https://github.com/apache/kafka/pull/16284 in v3.8.0?
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > >
> > > > >> > > > > On Wed, Jun 12, 2024 at 11:40 AM Luke Chen <
> show...@gmail.com
> > > >

[jira] [Resolved] (KAFKA-16317) Add event rate in GroupCoordinatorRuntimeMetrics

2024-06-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16317.
-
Resolution: Won't Do

> Add event rate in GroupCoordinatorRuntimeMetrics
> 
>
> Key: KAFKA-16317
> URL: https://issues.apache.org/jira/browse/KAFKA-16317
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
>
> We want a sensor to record every time we process a new event in the 
> coordinator runtime.





[jira] [Resolved] (KAFKA-16674) Adjust classicGroupJoinToConsumerGroup to add subscription model

2024-06-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16674.
-
Resolution: Fixed

This was done in https://github.com/apache/kafka/pull/15785.

> Adjust classicGroupJoinToConsumerGroup to add subscription model
> 
>
> Key: KAFKA-16674
> URL: https://issues.apache.org/jira/browse/KAFKA-16674
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
>
> [https://github.com/apache/kafka/pull/15785] adds subscription model to the 
> group state that affects `classicGroupJoinToConsumerGroup`. We'll need to 
> make adjustments to comply with the change once #15785 is merged.





[jira] [Reopened] (KAFKA-16673) Optimize toTopicPartitions with ConsumerProtocolSubscription

2024-06-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot reopened KAFKA-16673:
-

> Optimize toTopicPartitions with ConsumerProtocolSubscription
> 
>
> Key: KAFKA-16673
> URL: https://issues.apache.org/jira/browse/KAFKA-16673
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>
> https://github.com/apache/kafka/pull/15798#discussion_r1582981154





[jira] [Resolved] (KAFKA-16673) Optimize toTopicPartitions with ConsumerProtocolSubscription

2024-06-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16673.
-
Fix Version/s: 3.8.0
   Resolution: Duplicate

This was done as part of https://github.com/apache/kafka/pull/15785.

> Optimize toTopicPartitions with ConsumerProtocolSubscription
> 
>
> Key: KAFKA-16673
> URL: https://issues.apache.org/jira/browse/KAFKA-16673
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>
> https://github.com/apache/kafka/pull/15798#discussion_r1582981154





[jira] [Resolved] (KAFKA-16770) Coalesce records into bigger batches

2024-06-11 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16770.
-
Resolution: Fixed

> Coalesce records into bigger batches
> 
>
> Key: KAFKA-16770
> URL: https://issues.apache.org/jira/browse/KAFKA-16770
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>






[jira] [Resolved] (KAFKA-16930) UniformHeterogeneousAssignmentBuilder throws NPE when member has no subscriptions

2024-06-11 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16930.
-
Resolution: Fixed

> UniformHeterogeneousAssignmentBuilder throws NPE when member has no 
> subscriptions
> --
>
> Key: KAFKA-16930
> URL: https://issues.apache.org/jira/browse/KAFKA-16930
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.7.0
>    Reporter: David Jacot
>Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>
> {code:java}
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.kafka.coordinator.group.assignor.MemberAssignment.targetPartitions()"
>  because the return value of "java.util.Map.get(Object)" is null
>   at 
> org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.canMemberParticipateInReassignment(GeneralUniformAssignmentBuilder.java:248)
>   at 
> org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.balance(GeneralUniformAssignmentBuilder.java:336)
>   at 
> org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.buildAssignment(GeneralUniformAssignmentBuilder.java:157)
>   at 
> org.apache.kafka.coordinator.group.assignor.UniformAssignor.assign(UniformAssignor.java:84)
>   at 
> org.apache.kafka.coordinator.group.consumer.TargetAssignmentBuilder.build(TargetAssignmentBuilder.java:302)
>   at 
> org.apache.kafka.coordinator.group.GroupMetadataManager.updateTargetAssignment(GroupMetadataManager.java:1913)
>   at 
> org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:1518)
>   at 
> org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:2254)
>   at 
> org.apache.kafka.coordinator.group.GroupCoordinatorShard.consumerGroupHeartbeat(GroupCoordinatorShard.java:308)
>   at 
> org.apache.kafka.coordinator.group.GroupCoordinatorService.lambda$consumerGroupHeartbeat$0(GroupCoordinatorService.java:298)
>   at 
> org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.lambda$run$0(CoordinatorRuntime.java:769)
>   at 
> org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.withActiveContextOrThrow(CoordinatorRuntime.java:1582)
>   at 
> org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.access$1400(CoordinatorRuntime.java:96)
>   at 
> org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.run(CoordinatorRuntime.java:767)
>   at 
> org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.handleEvents(MultiThreadedEventProcessor.java:144)
>   at 
> org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.run(MultiThreadedEventProcessor.java:176)
>  {code}
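The stack trace above boils down to `Map.get` returning null for a member that has no target partitions. The snippet below is a minimal reproduction and a null-safe guard; the names are invented for illustration and are not the real `GeneralUniformAssignmentBuilder` code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal reproduction of the failure mode in the stack trace above:
// Map.get returns null for a member with no target partitions. Names are
// hypothetical; the real code lives in GeneralUniformAssignmentBuilder.
public class NullSafeAssignment {
    static final Map<String, Set<Integer>> targetAssignment = new HashMap<>();

    // Failing pattern: get() is null for a member with no subscriptions.
    static int partitionCountUnsafe(String member) {
        return targetAssignment.get(member).size(); // throws NPE
    }

    // Guarded pattern: fall back to an empty assignment instead.
    static int partitionCountSafe(String member) {
        return targetAssignment.getOrDefault(member, Set.of()).size();
    }

    public static void main(String[] args) {
        System.out.println(partitionCountSafe("no-subscription-member")); // 0
        try {
            partitionCountUnsafe("no-subscription-member");
        } catch (NullPointerException e) {
            System.out.println("caught NullPointerException");
        }
    }
}
```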





[jira] [Created] (KAFKA-16930) UniformHeterogeneousAssignmentBuilder throws NPE when member has no subscriptions

2024-06-11 Thread David Jacot (Jira)
David Jacot created KAFKA-16930:
---

 Summary: UniformHeterogeneousAssignmentBuilder throws NPE when 
member has no subscriptions
 Key: KAFKA-16930
 URL: https://issues.apache.org/jira/browse/KAFKA-16930
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot


{code:java}
java.lang.NullPointerException: Cannot invoke 
"org.apache.kafka.coordinator.group.assignor.MemberAssignment.targetPartitions()"
 because the return value of "java.util.Map.get(Object)" is null
at 
org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.canMemberParticipateInReassignment(GeneralUniformAssignmentBuilder.java:248)
at 
org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.balance(GeneralUniformAssignmentBuilder.java:336)
at 
org.apache.kafka.coordinator.group.assignor.GeneralUniformAssignmentBuilder.buildAssignment(GeneralUniformAssignmentBuilder.java:157)
at 
org.apache.kafka.coordinator.group.assignor.UniformAssignor.assign(UniformAssignor.java:84)
at 
org.apache.kafka.coordinator.group.consumer.TargetAssignmentBuilder.build(TargetAssignmentBuilder.java:302)
at 
org.apache.kafka.coordinator.group.GroupMetadataManager.updateTargetAssignment(GroupMetadataManager.java:1913)
at 
org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:1518)
at 
org.apache.kafka.coordinator.group.GroupMetadataManager.consumerGroupHeartbeat(GroupMetadataManager.java:2254)
at 
org.apache.kafka.coordinator.group.GroupCoordinatorShard.consumerGroupHeartbeat(GroupCoordinatorShard.java:308)
at 
org.apache.kafka.coordinator.group.GroupCoordinatorService.lambda$consumerGroupHeartbeat$0(GroupCoordinatorService.java:298)
at 
org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.lambda$run$0(CoordinatorRuntime.java:769)
at 
org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.withActiveContextOrThrow(CoordinatorRuntime.java:1582)
at 
org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime.access$1400(CoordinatorRuntime.java:96)
at 
org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.run(CoordinatorRuntime.java:767)
at 
org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.handleEvents(MultiThreadedEventProcessor.java:144)
at 
org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.run(MultiThreadedEventProcessor.java:176)
 {code}





[jira] [Resolved] (KAFKA-16821) Create a new interface to store member metadata

2024-06-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16821.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Create a new interface to store member metadata
> ---
>
> Key: KAFKA-16821
> URL: https://issues.apache.org/jira/browse/KAFKA-16821
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
> Attachments: Screenshot 2024-05-14 at 11.03.10 AM.png
>
>
> !Screenshot 2024-05-14 at 11.03.10 AM.png!





[jira] [Resolved] (KAFKA-14509) Add ConsumerGroupDescribe API

2024-06-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14509.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add ConsumerGroupDescribe API
> -
>
> Key: KAFKA-14509
> URL: https://issues.apache.org/jira/browse/KAFKA-14509
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Max Riedel
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.8.0
>
>
> The goal of this task is to implement the ConsumerGroupDescribe API as 
> described 
> [here|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-ConsumerGroupDescribeAPI];
>  and to implement the related changes in the admin client as described 
> [here|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-Admin#describeConsumerGroups].
> On the server side, this mainly requires the following steps:
>  # The request/response schemas must be defined (see 
> ListGroupsRequest/Response.json for an example);
>  # Request/response classes must be defined (see 
> ListGroupsRequest/Response.java for an example);
>  # The API must be defined in KafkaApis (see 
> KafkaApis#handleDescribeGroupsRequest for an example);
>  # The GroupCoordinator interface (java file) must be extended for the new 
> operations.
>  # The new operation must be implemented in GroupCoordinatorService (new 
> coordinator in Java) whereas the GroupCoordinatorAdapter (old coordinator in 
> Scala) should just reject the request.
> We could probably do 1) and 2) in one pull request and the remaining ones in 
> another.
> On the admin client side, this mainly requires the followings steps:
>  * Define all the new java classes as defined in the KIP.
>  * Add the new API to KafkaAdminClient class.
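Steps 4 and 5 above (extend the `GroupCoordinator` interface, implement in the new coordinator, reject in the old adapter) can be sketched as follows. The types are simplified stand-ins, not the real Kafka classes, and the response is modeled as a plain `String` for brevity.

```java
import java.util.concurrent.CompletableFuture;

// Hedged sketch of steps 4 and 5 above: the interface gains a new operation,
// the new coordinator implements it, and the old adapter rejects it.
public class ConsumerGroupDescribeSketch {
    public static void main(String[] args) throws Exception {
        GroupCoordinator newCoordinator = new GroupCoordinatorService();
        System.out.println(newCoordinator.consumerGroupDescribe("my-group").get());

        GroupCoordinator oldCoordinator = new GroupCoordinatorAdapter();
        try {
            oldCoordinator.consumerGroupDescribe("my-group").get();
        } catch (Exception e) {
            System.out.println("rejected by old coordinator");
        }
    }
}

interface GroupCoordinator {
    CompletableFuture<String> consumerGroupDescribe(String groupId);
}

// New coordinator (Java): actually serves the request.
class GroupCoordinatorService implements GroupCoordinator {
    @Override
    public CompletableFuture<String> consumerGroupDescribe(String groupId) {
        return CompletableFuture.completedFuture("described " + groupId);
    }
}

// Old coordinator (Scala in Kafka, modeled here in Java): just rejects.
class GroupCoordinatorAdapter implements GroupCoordinator {
    @Override
    public CompletableFuture<String> consumerGroupDescribe(String groupId) {
        CompletableFuture<String> f = new CompletableFuture<>();
        f.completeExceptionally(new UnsupportedOperationException(
            "ConsumerGroupDescribe is not supported by the old group coordinator"));
        return f;
    }
}
```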





[jira] [Resolved] (KAFKA-14701) Move broker side partition assignor to common

2024-06-06 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14701.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Move broker side partition assignor to common
> -
>
> Key: KAFKA-14701
> URL: https://issues.apache.org/jira/browse/KAFKA-14701
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.8.0
>    Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>
> Before releasing KIP-848, we need to move the server side partition assignor 
> to its final location in common.





[jira] [Resolved] (KAFKA-16664) Re-add EventAccumulator.take(timeout)

2024-06-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16664.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Re-add EventAccumulator.take(timeout)
> -
>
> Key: KAFKA-16664
> URL: https://issues.apache.org/jira/browse/KAFKA-16664
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> [https://github.com/apache/kafka/pull/15835] should be used with a timeout in 
> EventAccumulator#take. We added a commit to remove the timeout; we should 
> revert it.





[jira] [Resolved] (KAFKA-16861) Don't convert the group to classic if the size is larger than the group max size

2024-06-03 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16861.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Don't convert the group to classic if the size is larger than the group max size
> ---
>
> Key: KAFKA-16861
> URL: https://issues.apache.org/jira/browse/KAFKA-16861
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: TengYao Chi
>Priority: Major
> Fix For: 3.8.0
>
>
> It should be a one-line fix [0]
> [0] 
> https://github.com/apache/kafka/blob/2d9994e0de915037525f041ff9a9b9325f838938/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupMetadataManager.java#L810





[jira] [Resolved] (KAFKA-16864) Copy on write in the Optimized Uniform Assignor

2024-05-31 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16864.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Copy on write in the Optimized Uniform Assignor
> ---
>
> Key: KAFKA-16864
> URL: https://issues.apache.org/jira/browse/KAFKA-16864
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>
> An optimization for the uniform (homogenous) assignor by avoiding creating a 
> copy of all the assignments. Instead, the assignor creates a copy only if the 
> assignment is updated. It is a sort of copy-on-write. This change reduces the 
> overhead of the TargetAssignmentBuilder when run with the uniform 
> (homogenous) assignor.
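The copy-on-write idea described above can be illustrated with a small sketch. The names are hypothetical; this is not the actual TargetAssignmentBuilder code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration of copy-on-write for member assignments: return the existing
// map unchanged when nothing differs, copy only when an update is needed.
public class CopyOnWriteAssignment {
    static Map<String, Set<Integer>> withPartition(
            Map<String, Set<Integer>> current, String topic, int partition) {
        Set<Integer> existing = current.get(topic);
        if (existing != null && existing.contains(partition)) {
            return current; // no change: skip the copy entirely
        }
        Map<String, Set<Integer>> copy = new HashMap<>(current); // copy on write
        Set<Integer> updated = new HashSet<>(copy.getOrDefault(topic, Set.of()));
        updated.add(partition);
        copy.put(topic, updated);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> assignment = Map.of("t1", Set.of(0, 1));
        // Already assigned: the very same instance comes back, no copy made.
        System.out.println(withPartition(assignment, "t1", 0) == assignment);
        // New partition: a fresh copy is created with the update applied.
        System.out.println(withPartition(assignment, "t1", 2).get("t1").contains(2));
    }
}
```

This is what reduces the overhead when most members' assignments are unchanged between rebalances: the common no-op path allocates nothing.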





[jira] [Resolved] (KAFKA-16860) Introduce `group.version` feature flag

2024-05-31 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16860.
-
Resolution: Fixed

> Introduce `group.version` feature flag
> --
>
> Key: KAFKA-16860
> URL: https://issues.apache.org/jira/browse/KAFKA-16860
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>






Re: [VOTE] KIP-950: Tiered Storage Disablement

2024-05-30 Thread David Jacot
Hi all,

Thanks for the KIP. This is definitely a worthwhile feature. However, I am
a bit sceptical about the ZK part of the story. The 3.8 release is supposed to
be the last one supporting ZK, so I don't really see how we could bring it
to ZK, knowing that we don't plan to do a 3.9 release (current plan). I
strongly suggest clarifying this before implementing the ZK part in order
to avoid having new code [1] deleted right after 3.8 is released
:). Personally, I agree with Chia-Ping and Mickael. We could drop the ZK
part.

[1] https://github.com/apache/kafka/pull/16131

Best,
David

On Tue, May 28, 2024 at 1:31 PM Mickael Maison 
wrote:

> Hi,
>
> I agree with Chia-Ping, I think we could drop the ZK variant
> altogether, especially if this is not going to make it in 3.8.0.
> Even if we end up needing a 3.9.0 release, I wouldn't write a bunch of
> new ZooKeeper-related code in that release to delete it all right
> after in 4.0.
>
> Thanks,
> Mickael
>
> On Fri, May 24, 2024 at 5:03 PM Christo Lolov 
> wrote:
> >
> > Hello!
> >
> > I am closing this vote as ACCEPTED with 3 binding +1 (Luke, Chia-Ping and
> > Satish) and 1 non-binding +1 (Kamal) - thank you for the reviews!
> >
> > Realistically, I don't think I have the bandwidth to get this in 3.8.0.
> > Due to this, I will mark tentatively the Zookeeper part for 3.9 if the
> > community decides that they do in fact want one more 3.x release.
> > I will mark the KRaft part as ready to be started and aiming for either
> 4.0
> > or 3.9.
> >
> > Best,
> > Christo
>


[jira] [Resolved] (KAFKA-16722) Add ConsumerGroupPartitionAssignor interface

2024-05-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16722.
-
  Reviewer: David Jacot
Resolution: Fixed

> Add ConsumerGroupPartitionAssignor interface
> 
>
> Key: KAFKA-16722
> URL: https://issues.apache.org/jira/browse/KAFKA-16722
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Major
> Fix For: 3.8.0
>
>
> Adds the interface 
> `org.apache.kafka.coordinator.group.assignor.ConsumerGroupPartitionAssignor` 
> as described in KIP-932.





[jira] [Resolved] (KAFKA-16569) Target Assignment Format Change

2024-05-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16569.
-
Resolution: Won't Do

> Target Assignment Format Change
> ---
>
> Key: KAFKA-16569
> URL: https://issues.apache.org/jira/browse/KAFKA-16569
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
>
> Currently the assignment is stored as Map>; we 
> want to change it to a list
>  





[jira] [Resolved] (KAFKA-16832) LeaveGroup API for upgrading ConsumerGroup

2024-05-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16832.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> LeaveGroup API for upgrading ConsumerGroup
> --
>
> Key: KAFKA-16832
> URL: https://issues.apache.org/jira/browse/KAFKA-16832
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>






[jira] [Created] (KAFKA-16860) Introduce `group.version` feature flag

2024-05-28 Thread David Jacot (Jira)
David Jacot created KAFKA-16860:
---

 Summary: Introduce `group.version` feature flag
 Key: KAFKA-16860
 URL: https://issues.apache.org/jira/browse/KAFKA-16860
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.8.0








[jira] [Resolved] (KAFKA-16371) Unstable committed offsets after triggering commits where metadata for some partitions are over the limit

2024-05-27 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16371.
-
Fix Version/s: 3.8.0
   3.7.1
 Assignee: David Jacot
   Resolution: Fixed

> Unstable committed offsets after triggering commits where metadata for some 
> partitions are over the limit
> -
>
> Key: KAFKA-16371
> URL: https://issues.apache.org/jira/browse/KAFKA-16371
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 3.7.0
>Reporter: mlowicki
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
>
> Issue is reproducible with simple CLI tool - 
> [https://gist.github.com/mlowicki/c3b942f5545faced93dc414e01a2da70]
> {code:java}
> #!/usr/bin/env bash
> for i in {1..100}
> do
> kafka-committer --bootstrap "ADDR:9092" --topic "TOPIC" --group foo 
> --metadata-min 6000 --metadata-max 1 --partitions 72 --fetch
> done{code}
> What it does it that initially it fetches committed offsets and then tries to 
> commit for multiple partitions. If some of commits have metadata over the 
> allowed limit then:
> 1. I see errors about too large commits (expected)
> 2. Another run the tool fails at the stage of fetching commits with (this is 
> the problem):
> {code:java}
> config: ClientConfig { conf_map: { "group.id": "bar", "bootstrap.servers": 
> "ADDR:9092", }, log_level: Error, }
> fetching committed offsets..
> Error: Meta data fetch error: OperationTimedOut (Local: Timed out) Caused by: 
> OperationTimedOut (Local: Timed out){code}
> On the Kafka side I see _unstable_offset_commits_ errors reported by our 
> internal metric which is derived from:
> {noformat}
>  
> kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=X,error=Y{noformat}
> Increasing the timeout doesn't help and the only solution I've found is to 
> trigger commits for all partitions with metadata below the limit or to use: 
> {code:java}
> isolation.level=read_uncommitted{code}
>  
> I don't know that code very well but 
> [https://github.com/apache/kafka/blob/3.7/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L492-L496]
>  seems fishy:
> {code:java}
>     if (isTxnOffsetCommit) {
>       addProducerGroup(producerId, group.groupId)
>       group.prepareTxnOffsetCommit(producerId, offsetMetadata)
>     } else {
>       group.prepareOffsetCommit(offsetMetadata)
>     }{code}
> as it's using _offsetMetadata_ and not _filteredOffsetMetadata_ and I see 
> that while removing those pending commits we use filtered offset metadata 
> around 
> [https://github.com/apache/kafka/blob/3.7/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L397-L422]
>  
> {code:java}
>       val responseError = group.inLock {
>         if (status.error == Errors.NONE) {
>           if (!group.is(Dead)) {
>             filteredOffsetMetadata.forKeyValue { (topicIdPartition, 
> offsetAndMetadata) =>
>               if (isTxnOffsetCommit)
>                 group.onTxnOffsetCommitAppend(producerId, topicIdPartition, 
> CommitRecordMetadataAndOffset(Some(status.baseOffset), offsetAndMetadata))
>               else
>                 group.onOffsetCommitAppend(topicIdPartition, 
> CommitRecordMetadataAndOffset(Some(status.baseOffset), offsetAndMetadata))
>             }
>           }
>           // Record the number of offsets committed to the log
>           offsetCommitsSensor.record(records.size)
>           Errors.NONE
>         } else {
>           if (!group.is(Dead)) {
>             if (!group.hasPendingOffsetCommitsFromProducer(producerId))
>               removeProducerGroup(producerId, group.groupId)
>             filteredOffsetMetadata.forKeyValue { (topicIdPartition, 
> offsetAndMetadata) =>
>               if (isTxnOffsetCommit)
>                 group.failPendingTxnOffsetCommit(producerId, topicIdPartition)
>               else
>                 group.failPendingOffsetWrite(topicIdPartition, 
> offsetAndMetadata)
>             }
>           }
> {code}
> so the problem might be related to not cleaning up the data structure with 
> pending commits properly.
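If that reading is right, the leak can be reproduced with a toy Java model; the class and method names below are simplified stand-ins for the Scala methods linked above, not actual Kafka code:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the suspected mismatch: "prepare" registers the UNFILTERED
// offsets as pending, while the completion/failure path only clears the
// FILTERED ones, so entries dropped by validation stay pending forever.
class PendingCommitModel {
    final Set<String> pending = new HashSet<>();

    void prepareTxnOffsetCommit(Collection<String> offsets) {
        pending.addAll(offsets); // bug: called with the unfiltered metadata
    }

    void failPendingTxnOffsetCommit(Collection<String> filteredOffsets) {
        pending.removeAll(filteredOffsets); // cleanup: uses the filtered metadata
    }

    static Set<String> leaked() {
        PendingCommitModel model = new PendingCommitModel();
        List<String> offsetMetadata = List.of("t-0", "t-1", "invalid");
        List<String> filteredOffsetMetadata = List.of("t-0", "t-1"); // "invalid" rejected
        model.prepareTxnOffsetCommit(offsetMetadata);
        model.failPendingTxnOffsetCommit(filteredOffsetMetadata);
        return model.pending; // the rejected entry is left behind
    }
}
```

Either preparing with the filtered metadata or failing with the unfiltered metadata would close the gap; which one is correct depends on the intended semantics.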



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16846) Should TxnOffsetCommit API fail all the offsets if any fails the validation?

2024-05-27 Thread David Jacot (Jira)
David Jacot created KAFKA-16846:
---

 Summary: Should TxnOffsetCommit API fail all the offsets if any 
fails the validation?
 Key: KAFKA-16846
 URL: https://issues.apache.org/jira/browse/KAFKA-16846
 Project: Kafka
  Issue Type: Improvement
Reporter: David Jacot


While working on KAFKA-16371, we realized that the handling of 
INVALID_COMMIT_OFFSET_SIZE errors while committing transaction offsets is a bit 
inconsistent between the server and the client. On the server, the offsets are 
validated independently from each other. Hence, if two offsets A and B are 
committed and A fails the validation, B is still written to the log as part of 
the transaction. On the client, when INVALID_COMMIT_OFFSET_SIZE is received, 
the transaction transitions to the fatal state, so the transaction will 
eventually be aborted.

The client side API is quite limiting here because it does not return an error 
per committed offset. It is all or nothing. From this point of view, the 
current behaviour is correct. It seems that we could either change the API and 
let the user decide what to do, or we could strengthen the validation on the 
server to fail all the offsets if any of them fails (all or nothing). We could 
also leave it as it is.
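To make the two options concrete, here is a hedged sketch of both validation policies in Java; the types and the size check are illustrative stand-ins, not the coordinator's actual code:

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch (not Kafka code) of the two server-side policies discussed above for
// a batch of transactional offset commits where some offsets fail validation.
class OffsetValidationPolicy {
    static final int MAX_METADATA_SIZE = 4096; // illustrative size limit

    static boolean isValid(String metadata) {
        return metadata.length() <= MAX_METADATA_SIZE;
    }

    // Current behaviour: each offset validated independently; valid ones kept.
    static Map<String, String> independent(Map<String, String> offsets) {
        return offsets.entrySet().stream()
                .filter(e -> isValid(e.getValue()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Possible alternative: fail the whole batch if any offset is invalid,
    // matching the all-or-nothing view the client API exposes.
    static Map<String, String> allOrNothing(Map<String, String> offsets) {
        boolean anyInvalid = offsets.values().stream().anyMatch(m -> !isValid(m));
        return anyInvalid ? Map.of() : offsets;
    }
}
```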





[jira] [Resolved] (KAFKA-16625) Reverse Lookup Partition to Member in Assignors

2024-05-25 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16625.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Reverse Lookup Partition to Member in Assignors
> ---
>
> Key: KAFKA-16625
> URL: https://issues.apache.org/jira/browse/KAFKA-16625
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>
> Calculating unassigned partitions within the Uniform assignor is a costly 
> process; this can be improved by using a reverse lookup map between 
> topicIdPartition and the member.
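A minimal sketch of the reverse-lookup idea (illustrative types; the real assignor works with Uuid-based topic-partitions, not strings):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Instead of scanning every member's assignment to find unassigned partitions,
// maintain (or build once) a partition -> member map and probe it directly.
class ReverseLookup {
    static Map<String, String> invert(Map<String, List<String>> memberToPartitions) {
        Map<String, String> partitionToMember = new HashMap<>();
        memberToPartitions.forEach((member, partitions) ->
                partitions.forEach(p -> partitionToMember.put(p, member)));
        return partitionToMember;
    }

    // One pass over all partitions; each membership check is O(1).
    static List<String> unassigned(Set<String> allPartitions,
                                   Map<String, String> partitionToMember) {
        List<String> result = new ArrayList<>();
        for (String p : allPartitions)
            if (!partitionToMember.containsKey(p)) result.add(p);
        return result;
    }
}
```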





[jira] [Resolved] (KAFKA-16831) CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size write limit

2024-05-24 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16831.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> CoordinatorRuntime should initialize MemoryRecordsBuilder with max batch size 
> write limit
> -
>
> Key: KAFKA-16831
> URL: https://issues.apache.org/jira/browse/KAFKA-16831
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> Otherwise, we default to the min buffer size of 16384 for the write limit. 
> This causes the coordinator to throw RecordTooLargeException even when the 
> batch is under the 1MB max batch size limit.





[jira] [Resolved] (KAFKA-16815) Handle FencedInstanceId on heartbeat for new consumer

2024-05-24 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16815.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Handle FencedInstanceId on heartbeat for new consumer
> -
>
> Key: KAFKA-16815
> URL: https://issues.apache.org/jira/browse/KAFKA-16815
> Project: Kafka
>  Issue Type: Task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> With the new consumer group protocol, a member could receive a 
> FencedInstanceIdError in the heartbeat response. This could be the case when 
> an active member using a group instance id is removed from the group by an 
> admin client. If a second member joins with the same instance id, the first 
> member will receive a FencedInstanceId on the next heartbeat response. This 
> should be treated as a fatal error (consumer should not attempt to rejoin). 
> Currently, the FencedInstanceId is not explicitly handled by the client in 
> the HeartbeatRequestManager. It ends up being treated as a fatal error, see 
> [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java#L417]
>  (just because it lands in the "unexpected" error category). We should handle 
> it explicitly, just to make sure that we express that it is an expected 
> error: log a proper message for it and fail (handleFatalFailure). We should 
> also ensure that the error is included in the tests that cover the HB request 
> error handling 
> ([here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/clients/src/test/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManagerTest.java#L798])
>     
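A sketch of what the explicit handling could look like; the enum and method names are hypothetical, not the actual HeartbeatRequestManager API:

```java
// Hypothetical sketch: give FENCED_INSTANCE_ID its own branch with a proper
// log message and a fatal failure, instead of letting it fall into the
// generic "unexpected error" category.
class HeartbeatErrorHandling {
    enum Error { NONE, FENCED_INSTANCE_ID, UNKNOWN_MEMBER_ID, UNKNOWN }

    static String handle(Error error) {
        switch (error) {
            case NONE:
                return "ok";
            case FENCED_INSTANCE_ID:
                // Expected error: another member joined with the same
                // group.instance.id. Fatal -- the consumer must not rejoin.
                return "fatal: fenced instance id";
            case UNKNOWN_MEMBER_ID:
                return "rejoin with epoch 0";
            default:
                // Truly unexpected errors still land here.
                return "fatal: unexpected error";
        }
    }
}
```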





[jira] [Resolved] (KAFKA-16626) Uuid to String for subscribed topic names in assignment spec

2024-05-24 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16626.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Uuid to String for subscribed topic names in assignment spec
> 
>
> Key: KAFKA-16626
> URL: https://issues.apache.org/jira/browse/KAFKA-16626
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> In creating the assignment spec from the existing consumer subscription 
> metadata, quite some time is spent converting the String to a Uuid.
> Change from Uuid to String for the subscribed topics in the assignment spec 
> and convert on the fly.





[jira] [Resolved] (KAFKA-16793) Heartbeat API for upgrading ConsumerGroup

2024-05-22 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16793.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Heartbeat API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16793
> URL: https://issues.apache.org/jira/browse/KAFKA-16793
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>






[jira] [Resolved] (KAFKA-16762) SyncGroup API for upgrading ConsumerGroup

2024-05-17 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16762.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> SyncGroup API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16762
> URL: https://issues.apache.org/jira/browse/KAFKA-16762
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>






Re: [VOTE] KIP-932: Queues for Kafka

2024-05-16 Thread David Jacot
Hi Andrew,

Thanks for the KIP! This is really exciting! +1 (binding) from me.

One note regarding the partition assignor interface changes that you
proposed, it would be great to get the changes in 3.8 in order to not break
the API of KIP-848 after the preview.

Best,
David

On Wed, May 15, 2024 at 10:37 PM Jun Rao  wrote:

> Hi, Andrew,
>
> Thanks for the update. Should we mark whether those metrics are
> standard/required for KIP-714?
>
> Jun
>
> On Tue, May 14, 2024 at 7:31 AM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
> > Hi,
> > I have made a small update to the KIP as a result of testing the new
> > share consumer with client telemetry (KIP-714).
> >
> > I’ve added telemetry metric names to the table of client metrics and
> > also updated the metric group names so that the resulting client metrics
> > sent to the broker have consistent names.
> >
> > Thanks,
> > Andrew
> >
> > > On 8 May 2024, at 12:51, Manikumar  wrote:
> > >
> > > Hi Andrew,
> > >
> > > Thanks for the KIP.  Great write-up!
> > >
> > > +1 (binding)
> > >
> > > Thanks,
> > >
> > > On Wed, May 8, 2024 at 12:17 PM Satish Duggana <
> satish.dugg...@gmail.com>
> > wrote:
> > >>
> > >> Hi Andrew,
> > >> Thanks for the nice KIP, it will allow other messaging use cases to be
> > >> onboarded to Kafka.
> > >>
> > >> +1 from me.
> > >>
> > >> Satish.
> > >>
> > >> On Tue, 7 May 2024 at 03:41, Jun Rao 
> wrote:
> > >>>
> > >>> Hi, Andrew,
> > >>>
> > >>> Thanks for the KIP. +1
> > >>>
> > >>> Jun
> > >>>
> > >>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar <
> edoardli...@gmail.com>
> > >>> wrote:
> > >>>
> >  Thanks Andrew,
> > 
> >  +1 (binding)
> > 
> >  Edo
> > 
> >  On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
> >   wrote:
> > >
> > > Hi Andrew
> > >
> > > + 1 (Non-Binding)
> > >
> > > This will be great addition to Kafka
> > >
> > > On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal <
> > apoorvmitta...@gmail.com>
> > > wrote:
> > >
> > >> Hi Andrew,
> > >> Thanks for writing the KIP. This is indeed going to be a valuable
> >  addition
> > >> to the Kafka, excited to see the KIP.
> > >>
> > >> + 1 (Non-Binding)
> > >>
> > >> Regards,
> > >> Apoorv Mittal
> > >> +44 7721681581
> > >>
> > >>
> > >> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
> > >> andrew_schofield_j...@outlook.com> wrote:
> > >>
> > >>> Hi,
> > >>> I’ve been working to complete KIP-932 over the past few months
> and
> > >>> discussions have quietened down.
> > >>>
> > >>> I’d like to open the voting for KIP-932:
> > >>>
> > >>>
> > >>>
> > >>
> > 
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
> > >>>
> > >>> Thanks,
> > >>> Andrew
> > >>
> > 
> >
> >
>


[jira] [Created] (KAFKA-16770) Coalesce records into bigger batches

2024-05-15 Thread David Jacot (Jira)
David Jacot created KAFKA-16770:
---

 Summary: Coalesce records into bigger batches
 Key: KAFKA-16770
 URL: https://issues.apache.org/jira/browse/KAFKA-16770
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.8.0








[jira] [Resolved] (KAFKA-16694) Remove rack aware code in assignors temporarily due to performance

2024-05-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16694.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Remove rack aware code in assignors temporarily due to performance
> --
>
> Key: KAFKA-16694
> URL: https://issues.apache.org/jira/browse/KAFKA-16694
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Minor
> Fix For: 3.8.0
>
>






[jira] [Resolved] (KAFKA-15578) Run System Tests for Old protocol in the New Coordinator

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15578.
-
Resolution: Fixed

> Run System Tests for Old protocol in the New Coordinator
> 
>
> Key: KAFKA-15578
> URL: https://issues.apache.org/jira/browse/KAFKA-15578
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
>  Labels: kip-848-preview
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Change existing system tests related to the consumer group protocol and group 
> coordinator to test the old protocol running with the new coordinator.





[jira] [Resolved] (KAFKA-16117) Add Integration test for checking if the correct assignor is chosen

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16117.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add Integration test for checking if the correct assignor is chosen
> ---
>
> Key: KAFKA-16117
> URL: https://issues.apache.org/jira/browse/KAFKA-16117
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Priority: Minor
> Fix For: 3.8.0
>
>
> h4. We are trying to test this section of KIP-848
> h4. Assignor Selection
> The group coordinator has to determine which assignment strategy must be used 
> for the group. The group's members may not have exactly the same assignors at 
> any given point in time, e.g. they may migrate from one assignor to another. 
> The group coordinator will choose the assignor as follows:
>  * A client side assignor is used if possible. This means that a client side 
> assignor must be supported by all the members. If multiple are, it will 
> respect the precedence defined by the members when they advertise their 
> supported client side assignors.
>  * A server side assignor is used otherwise. If multiple server side 
> assignors are specified in the group, the group coordinator uses the most 
> common one. If a member does not provide an assignor, the group coordinator 
> will default to the first one in {{{}group.consumer.assignors{}}}.
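The selection rule above can be sketched as follows; this is a simplified model (member precedence is approximated by the first member's advertised order), not the group coordinator's implementation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of the two-step selection: prefer a client-side assignor supported
// by ALL members; otherwise pick the most common server-side assignor.
class AssignorSelection {
    // Each inner list is one member's client-side assignors in preference order.
    static Optional<String> selectClientSide(List<List<String>> memberPreferences) {
        if (memberPreferences.isEmpty()) return Optional.empty();
        Set<String> common = new HashSet<>(memberPreferences.get(0));
        for (List<String> prefs : memberPreferences)
            common.retainAll(prefs); // keep assignors supported by every member
        // Approximate "precedence defined by the members" with the first
        // member's ordering.
        return memberPreferences.get(0).stream().filter(common::contains).findFirst();
    }

    static String selectServerSide(List<String> memberChoices, String defaultAssignor) {
        return memberChoices.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(defaultAssignor); // no member provided one
    }
}
```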





[jira] [Resolved] (KAFKA-16735) Deprecate offsets.commit.required.acks in 3.8

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16735.
-
Resolution: Fixed

> Deprecate offsets.commit.required.acks in 3.8
> -
>
> Key: KAFKA-16735
> URL: https://issues.apache.org/jira/browse/KAFKA-16735
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>






[jira] [Created] (KAFKA-16736) Remove offsets.commit.required.acks in 4.0

2024-05-13 Thread David Jacot (Jira)
David Jacot created KAFKA-16736:
---

 Summary: Remove offsets.commit.required.acks in 4.0
 Key: KAFKA-16736
 URL: https://issues.apache.org/jira/browse/KAFKA-16736
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: David Jacot








[jira] [Created] (KAFKA-16735) Deprecate offsets.commit.required.acks in 3.8

2024-05-13 Thread David Jacot (Jira)
David Jacot created KAFKA-16735:
---

 Summary: Deprecate offsets.commit.required.acks in 3.8
 Key: KAFKA-16735
 URL: https://issues.apache.org/jira/browse/KAFKA-16735
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot








Re: [VOTE] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-13 Thread David Jacot
+1 (binding) from me too.

The KIP passes with binding votes from Justine, Manikumar and me; and
non-binding votes from Andrew and Federico.

Thanks,
David

On Mon, May 13, 2024 at 1:52 PM Manikumar  wrote:

> +1 (binding).
>
> Thanks for the KIP.
>
> Manikumar
>
> On Wed, May 8, 2024 at 9:55 PM Justine Olshan
>  wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Justine
> >
> > On Wed, May 8, 2024 at 8:36 AM Federico Valeri 
> wrote:
> >
> > > +1 non binding
> > >
> > > Thanks
> > >
> > > On Wed, May 8, 2024 at 5:27 PM Andrew Schofield
> > >  wrote:
> > > >
> > > > Hi,
> > > > Thanks for the KIP.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > > On 8 May 2024, at 15:48, David Jacot 
> > > wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > I'd like to start a voting thread for KIP-1041: Drop
> > > > > `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8).
> > > > >
> > > > > KIP: https://cwiki.apache.org/confluence/x/9YobEg
> > > > >
> > > > > Best,
> > > > > David
> > > >
> > >
>


[jira] [Resolved] (KAFKA-16587) Store subscription model for consumer group in group state

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16587.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Store subscription model for consumer group in group state
> --
>
> Key: KAFKA-16587
> URL: https://issues.apache.org/jira/browse/KAFKA-16587
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>
> Currently we iterate through all the subscribed topics for each member in the 
> consumer group to determine whether all the members are subscribed to the 
> same set of topics, i.e. whether the group has a homogeneous subscription 
> model.
> Instead of iterating and comparing the topicIds on every rebalance, we want 
> to maintain this information in the group state.
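One way to keep this information in the group state is a reference count per distinct subscription set, which makes the homogeneity check O(1) per rebalance; this is a toy sketch with hypothetical names, not the actual group state:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Maintain subscription bookkeeping incrementally as members join, leave, or
// change subscriptions, instead of rescanning every member on each rebalance.
class SubscriptionModelTracker {
    // Reference count per distinct subscribed-topic set.
    private final Map<Set<String>, Integer> subscriptionCounts = new HashMap<>();
    private final Map<String, Set<String>> memberSubscriptions = new HashMap<>();

    void updateMember(String memberId, Set<String> topics) {
        Set<String> old = memberSubscriptions.put(memberId, topics);
        if (old != null) subscriptionCounts.merge(old, -1, Integer::sum);
        subscriptionCounts.values().removeIf(c -> c == 0);
        subscriptionCounts.merge(topics, 1, Integer::sum);
    }

    // O(1): homogeneous iff at most one distinct subscription set exists.
    boolean isHomogeneous() {
        return subscriptionCounts.size() <= 1;
    }
}
```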





[jira] [Resolved] (KAFKA-16663) CoordinatorRuntime write timer tasks should be cancelled once HWM advances

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16663.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> CoordinatorRuntime write timer tasks should be cancelled once HWM advances
> --
>
> Key: KAFKA-16663
> URL: https://issues.apache.org/jira/browse/KAFKA-16663
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> Otherwise, we pile up timer tasks which are no-ops if replication was 
> successful. They stay in memory for 15 seconds and, as the rate of writes 
> increases, this may heavily impact memory usage.
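The idea can be modelled with a small sketch; the TimerTask interface and offset bookkeeping are illustrative, not the CoordinatorRuntime API:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model: keep each write's timeout task keyed by its log offset and
// cancel every task at or below the advancing high watermark (HWM), instead
// of letting no-op tasks sit in the timer for the full timeout.
class WriteTimeoutTracker {
    interface TimerTask { void cancel(); }

    private final NavigableMap<Long, TimerTask> pending = new TreeMap<>();

    void onWrite(long offset, TimerTask timeoutTask) {
        pending.put(offset, timeoutTask);
    }

    // HWM advanced past these offsets: the writes are replicated, so their
    // timeout tasks can only ever be no-ops. Cancel and release them now.
    int onHighWatermarkAdvance(long hwm) {
        NavigableMap<Long, TimerTask> done = pending.headMap(hwm, true);
        done.values().forEach(TimerTask::cancel);
        int cancelled = done.size();
        done.clear(); // view-backed: also removes the entries from `pending`
        return cancelled;
    }
}
```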





[VOTE] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-08 Thread David Jacot
Hi folks,

I'd like to start a voting thread for KIP-1041: Drop
`offsets.commit.required.acks` config in 4.0 (deprecate in 3.8).

KIP: https://cwiki.apache.org/confluence/x/9YobEg

Best,
David


[jira] [Resolved] (KAFKA-16307) fix EventAccumulator thread idle ratio metric

2024-05-07 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16307.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> fix EventAccumulator thread idle ratio metric
> -
>
> Key: KAFKA-16307
> URL: https://issues.apache.org/jira/browse/KAFKA-16307
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> The metric does not seem to be accurate, nor does it report metrics at every 
> interval. Requires investigation.





[jira] [Resolved] (KAFKA-16615) JoinGroup API for upgrading ConsumerGroup

2024-05-07 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16615.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
 Assignee: Dongnuo Lyu
   Resolution: Fixed

> JoinGroup API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16615
> URL: https://issues.apache.org/jira/browse/KAFKA-16615
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>






[DISCUSS] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-02 Thread David Jacot
Hi folks,

I have put together a very small KIP to
deprecate offsets.commit.required.acks in 3.8 and remove it in 4.0. See the
motivation for the reason.

KIP: https://cwiki.apache.org/confluence/x/9YobEg

Please let me know what you think.

Best,
David


[jira] [Created] (KAFKA-16658) Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-02 Thread David Jacot (Jira)
David Jacot created KAFKA-16658:
---

 Summary: Drop `offsets.commit.required.acks` config in 4.0 
(deprecate in 3.8)
 Key: KAFKA-16658
 URL: https://issues.apache.org/jira/browse/KAFKA-16658
 Project: Kafka
  Issue Type: New Feature
Reporter: David Jacot
Assignee: David Jacot








[jira] [Resolved] (KAFKA-16568) Add JMH Benchmarks for assignor performance testing

2024-04-25 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16568.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add JMH Benchmarks for assignor performance testing 
> 
>
> Key: KAFKA-16568
> URL: https://issues.apache.org/jira/browse/KAFKA-16568
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>
> Three benchmarks are being used to test the performance and efficiency 
> of the consumer group rebalance process:
>  * Client Assignors (assign method)
>  * Server Assignors (assign method)
>  * Target Assignment Builder (build method)





Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-25 Thread David Jacot
Thanks for the reply.
> >>>>
> >>>> 123. Every time the GC fails over, it needs to recompute the
> assignment
> >>>> for every member. However, the impact of re-assignment is not that
> >> onerous.
> >>>> If the recomputed assignments are the same, which they may well be,
> >> there
> >>>> is no impact on the members at all.
> >>>>
> >>>> On receiving the new assignment, the member adjusts the
> topic-partitions
> >>>> in its share sessions, removing those which were revoked and adding
> >> those
> >>>> which were assigned. It is able to acknowledge the records it fetched
> >> from
> >>>> the partitions which have just been revoked, and it doesn’t need to
> >> confirm
> >>>> the assignment back to the GC.
> >>>>
> >>>> 125. I don’t think the GC needs to write ShareGroupPartitionMetadata
> >>>> when processing AlterShareGroupOffsets. This is because the operation
> >>>> happens as a result of an explicit administrative action and it is
> >> possible
> >>>> to return a specific error code for each topic-partition. The cases
> >> where
> >>>> ShareGroupPartitionMetadata is used are when a topic is added or
> removed
> >>>> from the subscribed topics, or the number of partitions changes.
> >>>>
> >>>> 130. I suppose that limits the minimum lock timeout for a cluster to
> >>>> prevent
> >>>> a group from having an excessively low value. Config added.
> >>>>
> >>>> 131. I have changed it to group.share.partition.max.record.locks.
> >>>>
> >>>> 136.  When GC failover occurs, the GC gaining ownership of a partition
> >> of
> >>>> the __consumer_offsets topic replays the records to build its state.
> >>>> In the case of a share group, it learns:
> >>>>
> >>>> * The share group and its group epoch (ShareGroupMetadata)
> >>>> * The list of members (ShareGroupMemberMetadata)
> >>>> * The list of share-partitions (ShareGroupPartitionMetadata)
> >>>>
> >>>> It will recompute the assignments in order to respond to
> >>>> ShareGroupHeartbeat requests. As a result, it bumps the group epoch.
> >>>>
> >>>> I will update the KIP accordingly to confirm the behaviour.
> >>>>
> >>>> 137.1: The GC and the SPL report the metrics in the
> >>>> group-coordinator-metrics
> >>>> group. Unlike consumer groups in which the GC performs offset commit,
> >>>> the share group equivalent is performed by the SPL. So, I’ve grouped
> the
> >>>> concepts which relate to the group in group-coordinator-metrics.
> >>>>
> >>>> The SC reports the metrics in the share-coordinator-metrics group.
> >>>>
> >>>> 137.2: There is one metric in both groups - partition-load-time. In
> the
> >> SC
> >>>> group,
> >>>> it refers to the time loading data from the share-group state topic so
> >> that
> >>>> a ReadShareGroupState request can be answered. In the GC group,
> >>>> it refers to the time to read the state from the persister. Apart from
> >> the
> >>>> interbroker RPC latency of the read, they’re likely to be very close.
> >>>>
> >>>> Later, for a cluster which is using a custom persister, the
> >>>> share-coordinator
> >>>> metrics would likely not be reported, and the persister would have its
> >> own
> >>>> metrics.
> >>>>
> >>>> 137.3: Correct. Fixed.
> >>>>
> >>>> 137.4: Yes, it does include the time to write to the internal topic.
> >>>> I’ve tweaked the description.
> >>>>
> >>>> Thanks,
> >>>> Andrew
> >>>>
> >>>>> On 22 Apr 2024, at 20:04, Jun Rao  wrote:
> >>>>>
> >>>>> Hi, Andrew,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> 123. "The share group does not persist the target assignment."
> >>>>> What's the impact of this? Everytime that GC fails over, it needs to
> >>>>> recompute the assignment for every member. Do we expect the member
> >>>>> assignment to c
