[jira] [Resolved] (KAFKA-16793) Heartbeat API for upgrading ConsumerGroup

2024-05-23 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16793.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Heartbeat API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16793
> URL: https://issues.apache.org/jira/browse/KAFKA-16793
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16762) SyncGroup API for upgrading ConsumerGroup

2024-05-17 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16762.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> SyncGroup API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16762
> URL: https://issues.apache.org/jira/browse/KAFKA-16762
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-932: Queues for Kafka

2024-05-16 Thread David Jacot
Hi Andrew,

Thanks for the KIP! This is really exciting! +1 (binding) from me.

One note regarding the partition assignor interface changes that you
proposed: it would be great to get those changes into 3.8 so that we do not
break the KIP-848 API after the preview.

Best,
David

On Wed, May 15, 2024 at 10:37 PM Jun Rao  wrote:

> Hi, Andrew,
>
> Thanks for the update. Should we mark whether those metrics are
> standard/required for KIP-714?
>
> Jun
>
> On Tue, May 14, 2024 at 7:31 AM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
> > Hi,
> > I have made a small update to the KIP as a result of testing the new
> > share consumer with client telemetry (KIP-714).
> >
> > I’ve added telemetry metric names to the table of client metrics and
> > also updated the metric group names so that the resulting client metrics
> > sent to the broker have consistent names.
> >
> > Thanks,
> > Andrew
> >
> > > On 8 May 2024, at 12:51, Manikumar  wrote:
> > >
> > > Hi Andrew,
> > >
> > > Thanks for the KIP.  Great write-up!
> > >
> > > +1 (binding)
> > >
> > > Thanks,
> > >
> > > On Wed, May 8, 2024 at 12:17 PM Satish Duggana <
> satish.dugg...@gmail.com>
> > wrote:
> > >>
> > >> Hi Andrew,
> > >> Thanks for the nice KIP, it will allow other messaging use cases to be
> > >> onboarded to Kafka.
> > >>
> > >> +1 from me.
> > >>
> > >> Satish.
> > >>
> > >> On Tue, 7 May 2024 at 03:41, Jun Rao 
> wrote:
> > >>>
> > >>> Hi, Andrew,
> > >>>
> > >>> Thanks for the KIP. +1
> > >>>
> > >>> Jun
> > >>>
> > >>> On Mon, Mar 18, 2024 at 11:00 AM Edoardo Comar <
> edoardli...@gmail.com>
> > >>> wrote:
> > >>>
> >  Thanks Andrew,
> > 
> >  +1 (binding)
> > 
> >  Edo
> > 
> >  On Mon, 18 Mar 2024 at 16:32, Kenneth Eversole
> >   wrote:
> > >
> > > Hi Andrew
> > >
> > > + 1 (Non-Binding)
> > >
> > > This will be great addition to Kafka
> > >
> > > On Mon, Mar 18, 2024 at 8:27 AM Apoorv Mittal <
> > apoorvmitta...@gmail.com>
> > > wrote:
> > >
> > >> Hi Andrew,
> > >> Thanks for writing the KIP. This is indeed going to be a valuable
> >  addition
> > >> to the Kafka, excited to see the KIP.
> > >>
> > >> + 1 (Non-Binding)
> > >>
> > >> Regards,
> > >> Apoorv Mittal
> > >> +44 7721681581
> > >>
> > >>
> > >> On Sun, Mar 17, 2024 at 11:16 PM Andrew Schofield <
> > >> andrew_schofield_j...@outlook.com> wrote:
> > >>
> > >>> Hi,
> > >>> I’ve been working to complete KIP-932 over the past few months
> and
> > >>> discussions have quietened down.
> > >>>
> > >>> I’d like to open the voting for KIP-932:
> > >>>
> > >>>
> > >>>
> > >>
> > 
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka
> > >>>
> > >>> Thanks,
> > >>> Andrew
> > >>
> > 
> >
> >
>


[jira] [Created] (KAFKA-16770) Coalesce records into bigger batches

2024-05-15 Thread David Jacot (Jira)
David Jacot created KAFKA-16770:
---

 Summary: Coalesce records into bigger batches
 Key: KAFKA-16770
 URL: https://issues.apache.org/jira/browse/KAFKA-16770
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.8.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16694) Remove rack aware code in assignors temporarily due to performance

2024-05-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16694.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Remove rack aware code in assignors temporarily due to performance
> --
>
> Key: KAFKA-16694
> URL: https://issues.apache.org/jira/browse/KAFKA-16694
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Minor
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15578) Run System Tests for Old protocol in the New Coordinator

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15578.
-
Resolution: Fixed

> Run System Tests for Old protocol in the New Coordinator
> 
>
> Key: KAFKA-15578
> URL: https://issues.apache.org/jira/browse/KAFKA-15578
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
>  Labels: kip-848-preview
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Change existing system tests related to the consumer group protocol and group 
> coordinator to test the old protocol running with the new coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16117) Add Integration test for checking if the correct assignor is chosen

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16117.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add Integration test for checking if the correct assignor is chosen
> ---
>
> Key: KAFKA-16117
> URL: https://issues.apache.org/jira/browse/KAFKA-16117
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Priority: Minor
> Fix For: 3.8.0
>
>
> h4. We are trying to test this section of KIP-848:
> h4. Assignor Selection
> The group coordinator has to determine which assignment strategy must be used 
> for the group. The group's members may not have exactly the same assignors at 
> any given point in time - e.g. they may migrate from one assignor to another. 
> The group coordinator will choose the assignor as follows:
>  * A client side assignor is used if possible. This means that a client side 
> assignor must be supported by all the members. If multiple are supported, it will 
> respect the precedence defined by the members when they advertise their 
> supported client side assignors.
>  * A server side assignor is used otherwise. If multiple server side 
> assignors are specified in the group, the group coordinator uses the most 
> common one. If a member does not provide an assignor, the group coordinator 
> will default to the first one in {{group.consumer.assignors}}.
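For illustration, a minimal sketch of the selection rule described above (the member and group types are hypothetical stand-ins, not the actual Kafka classes):

```
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical member view: client-side assignors in the member's precedence
// order, plus the server-side assignor it asked for, if any.
record Member(List<String> clientAssignors, Optional<String> serverAssignor) { }

final class AssignorSelection {
    static String select(List<Member> members, List<String> groupConsumerAssignors) {
        // 1. Prefer a client-side assignor supported by every member, respecting
        //    the precedence advertised by the members (approximated here by the
        //    order of the first member's list).
        Set<String> common = new HashSet<>(members.get(0).clientAssignors());
        members.forEach(m -> common.retainAll(m.clientAssignors()));
        for (String assignor : members.get(0).clientAssignors()) {
            if (common.contains(assignor)) return assignor;
        }
        // 2. Otherwise, use the most common server-side assignor specified by members.
        Map<String, Long> counts = members.stream()
            .map(Member::serverAssignor)
            .flatMap(Optional::stream)
            .collect(Collectors.groupingBy(a -> a, Collectors.counting()));
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            // 3. No member specified one: default to the first configured assignor.
            .orElse(groupConsumerAssignors.get(0));
    }
}
```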



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16735) Deprecate offsets.commit.required.acks in 3.8

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16735.
-
Resolution: Fixed

> Deprecate offsets.commit.required.acks in 3.8
> -
>
> Key: KAFKA-16735
> URL: https://issues.apache.org/jira/browse/KAFKA-16735
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16736) Remove offsets.commit.required.acks in 4.0

2024-05-13 Thread David Jacot (Jira)
David Jacot created KAFKA-16736:
---

 Summary: Remove offsets.commit.required.acks in 4.0
 Key: KAFKA-16736
 URL: https://issues.apache.org/jira/browse/KAFKA-16736
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16735) Deprecate offsets.commit.required.acks in 3.8

2024-05-13 Thread David Jacot (Jira)
David Jacot created KAFKA-16735:
---

 Summary: Deprecate offsets.commit.required.acks in 3.8
 Key: KAFKA-16735
 URL: https://issues.apache.org/jira/browse/KAFKA-16735
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-13 Thread David Jacot
+1 (binding) from me too.

The KIP passes with binding votes from Justine, Manikumar and me; and
non-binding votes from Andrew and Federico.

Thanks,
David

On Mon, May 13, 2024 at 1:52 PM Manikumar  wrote:

> +1 (binding).
>
> Thanks for the KIP.
>
> Manikumar
>
> On Wed, May 8, 2024 at 9:55 PM Justine Olshan
>  wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Justine
> >
> > On Wed, May 8, 2024 at 8:36 AM Federico Valeri 
> wrote:
> >
> > > +1 non binding
> > >
> > > Thanks
> > >
> > > On Wed, May 8, 2024 at 5:27 PM Andrew Schofield
> > >  wrote:
> > > >
> > > > Hi,
> > > > Thanks for the KIP.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > > On 8 May 2024, at 15:48, David Jacot 
> > > wrote:
> > > > >
> > > > > Hi folks,
> > > > >
> > > > > I'd like to start a voting thread for KIP-1041: Drop
> > > > > `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8).
> > > > >
> > > > > KIP: https://cwiki.apache.org/confluence/x/9YobEg
> > > > >
> > > > > Best,
> > > > > David
> > > >
> > >
>


[jira] [Resolved] (KAFKA-16587) Store subscription model for consumer group in group state

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16587.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> Store subscription model for consumer group in group state
> --
>
> Key: KAFKA-16587
> URL: https://issues.apache.org/jira/browse/KAFKA-16587
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>
> Currently we iterate through all the subscribed topics for each member in the 
> consumer group to determine whether all the members are subscribed to the 
> same set of topics, i.e. whether the group has a homogeneous subscription model.
> Instead of iterating over and comparing the topic IDs on every rebalance, we want 
> to maintain this information in the group state.
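For illustration, a minimal sketch (with hypothetical names) of maintaining that information incrementally instead of recomputing it on every rebalance:

```
import java.util.*;

// Hypothetical tracker: counts how many members use each distinct subscription
// set so the subscription model can be answered in O(1) during a rebalance.
final class SubscriptionModelTracker {
    private final Map<Set<String>, Integer> countsBySubscription = new HashMap<>();

    // Called when a member joins, leaves, or changes its subscribed topics.
    void update(Set<String> oldSubscription, Set<String> newSubscription) {
        if (oldSubscription != null) {
            countsBySubscription.merge(oldSubscription, -1, Integer::sum);
            countsBySubscription.remove(oldSubscription, 0);
        }
        if (newSubscription != null) {
            countsBySubscription.merge(Set.copyOf(newSubscription), 1, Integer::sum);
        }
    }

    // Homogeneous if every member is subscribed to exactly the same topic set.
    boolean isHomogeneous() {
        return countsBySubscription.size() <= 1;
    }
}
```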



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16663) CoordinatorRuntime write timer tasks should be cancelled once HWM advances

2024-05-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16663.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> CoordinatorRuntime write timer tasks should be cancelled once HWM advances
> --
>
> Key: KAFKA-16663
> URL: https://issues.apache.org/jira/browse/KAFKA-16663
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> Otherwise, we pile up timer tasks which are no-ops if 
> replication was successful. They stay in memory for 15 seconds, and as the 
> write rate increases, this may heavily impact memory usage.
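For illustration, a minimal sketch of the idea (hypothetical types; the real CoordinatorRuntime is more involved): pending write timeouts are indexed by the offset they wait on and cancelled as soon as the high watermark reaches it:

```
import java.util.*;
import java.util.concurrent.*;

// Hypothetical bookkeeping of write timeout tasks keyed by the log offset that
// must be replicated before the write is considered committed.
final class PendingWriteTimeouts {
    private final NavigableMap<Long, List<ScheduledFuture<?>>> timeoutsByOffset = new TreeMap<>();

    // Register the timeout that fires if the write at `offset` is not replicated in time.
    synchronized void add(long offset, ScheduledFuture<?> timeoutTask) {
        timeoutsByOffset.computeIfAbsent(offset, k -> new ArrayList<>()).add(timeoutTask);
    }

    // Called when the high watermark advances: every timeout waiting on an offset
    // below the new HWM is now a no-op and can be cancelled and dropped.
    synchronized void onHighWatermarkUpdated(long highWatermark) {
        Map<Long, List<ScheduledFuture<?>>> done = timeoutsByOffset.headMap(highWatermark);
        done.values().forEach(tasks -> tasks.forEach(t -> t.cancel(false)));
        done.clear();
    }
}
```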



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-08 Thread David Jacot
Hi folks,

I'd like to start a voting thread for KIP-1041: Drop
`offsets.commit.required.acks` config in 4.0 (deprecate in 3.8).

KIP: https://cwiki.apache.org/confluence/x/9YobEg

Best,
David


[jira] [Resolved] (KAFKA-16307) fix EventAccumulator thread idle ratio metric

2024-05-07 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16307.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
   Resolution: Fixed

> fix EventAccumulator thread idle ratio metric
> -
>
> Key: KAFKA-16307
> URL: https://issues.apache.org/jira/browse/KAFKA-16307
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> The metric does not seem to be accurate, nor is it reported at every 
> interval. Requires investigation.
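For context, an idle ratio of this kind is typically derived from the time a thread spends blocked waiting for work versus elapsed wall-clock time; a minimal sketch (not the actual EventAccumulator code):

```
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical measurement: accumulate the nanoseconds a poller thread spends
// blocked on the queue; the idle ratio over a window is idleNanos / windowNanos.
final class IdleRatioRecorder {
    private final AtomicLong idleNanos = new AtomicLong();

    <T> T pollRecordingIdleTime(BlockingQueue<T> queue) throws InterruptedException {
        long start = System.nanoTime();
        try {
            return queue.take(); // time spent blocked here counts as idle
        } finally {
            idleNanos.addAndGet(System.nanoTime() - start);
        }
    }

    // Ratio of idle time to elapsed time for a measurement window; the counter
    // must be reset consistently with the window or the ratio will drift.
    double idleRatioAndReset(long windowNanos) {
        return (double) idleNanos.getAndSet(0) / windowNanos;
    }
}
```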



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16615) JoinGroup API for upgrading ConsumerGroup

2024-05-07 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16615.
-
Fix Version/s: 3.8.0
 Reviewer: David Jacot
 Assignee: Dongnuo Lyu
   Resolution: Fixed

> JoinGroup API for upgrading ConsumerGroup
> -
>
> Key: KAFKA-16615
> URL: https://issues.apache.org/jira/browse/KAFKA-16615
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] KIP-1041: Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-02 Thread David Jacot
Hi folks,

I have put together a very small KIP to
deprecate offsets.commit.required.acks in 3.8 and remove it in 4.0. See the
motivation for the reason.

KIP: https://cwiki.apache.org/confluence/x/9YobEg

Please let me know what you think.

Best,
David


[jira] [Created] (KAFKA-16658) Drop `offsets.commit.required.acks` config in 4.0 (deprecate in 3.8)

2024-05-02 Thread David Jacot (Jira)
David Jacot created KAFKA-16658:
---

 Summary: Drop `offsets.commit.required.acks` config in 4.0 
(deprecate in 3.8)
 Key: KAFKA-16658
 URL: https://issues.apache.org/jira/browse/KAFKA-16658
 Project: Kafka
  Issue Type: New Feature
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16568) Add JMH Benchmarks for assignor performance testing

2024-04-25 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16568.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add JMH Benchmarks for assignor performance testing 
> 
>
> Key: KAFKA-16568
> URL: https://issues.apache.org/jira/browse/KAFKA-16568
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>
> The three benchmarks used to test the performance and efficiency 
> of the consumer group rebalance process are:
>  * Client Assignors (assign method)
>  * Server Assignors (assign method)
>  * Target Assignment Builder (build method)
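For illustration, a minimal sketch of what such a JMH benchmark can look like (the assignor input and logic here are stand-ins, not the actual Kafka assignor API used by the real benchmarks):

```
import java.util.*;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
public class ServerSideAssignorBenchmark {

    // Stand-in for the assignor's input: members and their subscribed topics.
    public Map<String, Set<String>> subscriptionsByMember;

    @Param({"1000", "10000"})
    public int memberCount;

    @Setup(Level.Trial)
    public void setup() {
        subscriptionsByMember = new HashMap<>();
        for (int i = 0; i < memberCount; i++) {
            subscriptionsByMember.put("member-" + i, Set.of("topic-a", "topic-b"));
        }
    }

    @Benchmark
    public Map<String, List<Integer>> assign() {
        // Stand-in for assignor.assign(...): spread 64 partitions round-robin over members.
        Map<String, List<Integer>> assignment = new HashMap<>();
        List<String> members = new ArrayList<>(subscriptionsByMember.keySet());
        for (int p = 0; p < 64; p++) {
            assignment.computeIfAbsent(members.get(p % members.size()), m -> new ArrayList<>()).add(p);
        }
        return assignment;
    }
}
```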



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-25 Thread David Jacot
; >>>> 123. Every time the GC fails over, it needs to recompute the
> assignment
> >>>> for every member. However, the impact of re-assignment is not that
> >> onerous.
> >>>> If the recomputed assignments are the same, which they may well be,
> >> there
> >>>> is no impact on the members at all.
> >>>>
> >>>> On receiving the new assignment, the member adjusts the
> topic-partitions
> >>>> in its share sessions, removing those which were revoked and adding
> >> those
> >>>> which were assigned. It is able to acknowledge the records it fetched
> >> from
> >>>> the partitions which have just been revoked, and it doesn’t need to
> >> confirm
> >>>> the assignment back to the GC.
> >>>>
> >>>> 125. I don’t think the GC needs to write ShareGroupPartitionMetadata
> >>>> when processing AlterShareGroupOffsets. This is because the operation
> >>>> happens as a result of an explicit administrative action and it is
> >> possible
> >>>> to return a specific error code for each topic-partition. The cases
> >> where
> >>>> ShareGroupPartitionMetadata is used are when a topic is added or
> removed
> >>>> from the subscribed topics, or the number of partitions changes.
> >>>>
> >>>> 130. I suppose that limits the minimum lock timeout for a cluster to
> >>>> prevent
> >>>> a group from having an excessively low value. Config added.
> >>>>
> >>>> 131. I have changed it to group.share.partition.max.record.locks.
> >>>>
> >>>> 136.  When GC failover occurs, the GC gaining ownership of a partition
> >> of
> >>>> the __consumer_offsets topic replays the records to build its state.
> >>>> In the case of a share group, it learns:
> >>>>
> >>>> * The share group and its group epoch (ShareGroupMetadata)
> >>>> * The list of members (ShareGroupMemberMetadata)
> >>>> * The list of share-partitions (ShareGroupPartitionMetadata)
> >>>>
> >>>> It will recompute the assignments in order to respond to
> >>>> ShareGroupHeartbeat requests. As a result, it bumps the group epoch.
> >>>>
> >>>> I will update the KIP accordingly to confirm the behaviour.
> >>>>
> >>>> 137.1: The GC and the SPL report the metrics in the
> >>>> group-coordinator-metrics
> >>>> group. Unlike consumer groups in which the GC performs offset commit,
> >>>> the share group equivalent is performed by the SPL. So, I’ve grouped
> the
> >>>> concepts which relate to the group in group-coordinator-metrics.
> >>>>
> >>>> The SC reports the metrics in the share-coordinator-metrics group.
> >>>>
> >>>> 137.2: There is one metric in both groups - partition-load-time. In
> the
> >> SC
> >>>> group,
> >>>> it refers to the time loading data from the share-group state topic so
> >> that
> >>>> a ReadShareGroupState request can be answered. In the GC group,
> >>>> it refers to the time to read the state from the persister. Apart from
> >> the
> >>>> interbroker RPC latency of the read, they’re likely to be very close.
> >>>>
> >>>> Later, for a cluster which is using a custom persister, the
> >>>> share-coordinator
> >>>> metrics would likely not be reported, and the persister would have its
> >> own
> >>>> metrics.
> >>>>
> >>>> 137.3: Correct. Fixed.
> >>>>
> >>>> 137.4: Yes, it does include the time to write to the internal topic.
> >>>> I’ve tweaked the description.
> >>>>
> >>>> Thanks,
> >>>> Andrew
> >>>>
> >>>>> On 22 Apr 2024, at 20:04, Jun Rao  wrote:
> >>>>>
> >>>>> Hi, Andrew,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> 123. "The share group does not persist the target assignment."
> >>>>> What's the impact of this? Everytime that GC fails over, it needs to
> >>>>> recompute the assignment for every member. Do we expect the member
> >>>>> assignment to change on every GC failover?
> >>>>>
&

Re: [DISCUSS] KIP-932: Queues for Kafka

2024-04-15 Thread David Jacot
Hi Andrew,

Thanks for the KIP. This work is really exciting.

I finally had a bit of time to go through the KIP. I need to read it a
second time in order to get into the details. I have noted a few
points/questions:

001: The dynamic config to force the group type is really weird. As you
said, groups are created on first use and so they are. If we want something
better, we should rather make the creation of the group explicit.

002: It is weird to write a ConsumerGroupMetadata to reserve the group id.
I think that we should rather have a ShareGroupMetadata for this purpose.
Similarly, I don't think that we should add a type to the
ConsumerGroupMetadataValue record. This record is meant to be used by
"consumer" groups.

003: I don't fully understand the motivation for having the
ShareGroupPartitionMetadata record and the InitializeShareGroupState API
called from the group coordinator. Could you elaborate a bit more? Isn't it
possible to lazily initialize the state in the share coordinator when the
share leader fetches the state for the first time?

004: Could you clarify how the group expiration will work? I did not see it
mentioned in the KIP but I may have missed it.

005: I would like to ensure that I understand the proposal for the share
coordinator. It looks like we want it to be an internal service. By this, I
mean that it won't be directly accessed by external users. Is my
understanding correct?

006: group.share.enable: We should rather use
`group.coordinator.rebalance.protocols` with `share`.

007: SimpleShareAssignor, do we have an interface for it?

008: For my understanding, will the SPSO and SPEO be kept in the
Partition and in the Log layers?

009: Is there a reason why we still need the ShareAcknowledge API if
acknowledging can also be done with the ShareFetch API?

010: Do we plan to limit the number of share sessions on the share leader?
The KIP mentions a limit calculated based on group.share.max.groups and
group.share.max.size but it is quite vague.

011: Do you have an idea of the size that ShareSnapshot will use in
practice? Could it get larger than the max batch size within a
partition (1MB by default)?

012: Regarding the share group coordinator, do you plan to implement it on
top of the CoordinatorRuntime introduced by KIP-848? I hope so in order to
reuse code.

013: Following my previous question, do we need a config similar to
`group.coordinator.threads` for the share coordinator?

014: I am not sure I understand why we need
`group.share.state.topic.min.isr`. Is the topic level configuration enough
for this?

015: ShareGroupHeartbeat API: Do we need RebalanceTimeoutMs? What's its
purpose if there is no revocation in the protocol?

016: ShareGroupPartitionMetadataValue: What are the StartPartitionIndex and
EndPartitionIndex?

017: The metric `num-partitions` with a tag called protocol does not make
sense in the group coordinator. The number of partitions is the number of
__consumer_offsets partitions here.

018: Do we need a tag for `share-acknowledgement` if the name is already
scoped to share groups?

019: Should we also scope the name of `record-acknowledgement` to follow
`share-acknowledgement`?

020: I suppose that the SPEO is always bounded by the HWM. It may be good
to call it out. Is it also bounded by the LSO?

021: WriteShareGroupState API: Is there a mechanism to prevent zombie share
leaders from committing wrong state?

Best,
David


On Fri, Apr 12, 2024 at 2:32 PM Andrew Schofield 
wrote:

> Hi,
> 77. I’ve updated the KIP to use log retention rather than log compaction.
> The basic ideas of what to persist are unchanged. It makes a few changes:
>
> * It changes the record names: ShareCheckpoint -> ShareSnapshot and
>   ShareDelta -> ShareUpdate. They’re equivalent, but renaming makes it
>   simple to check I did an atomic change to the new proposal.
> * It uses log retention and explicit pruning of elderly records using
>   ReplicaManager.deleteRecords
> * It gets rid of the nasty DeltaIndex scheme because we don’t need to worry
>   about the log compactor and key uniqueness.
>
> I have also changed the ambiguous “State” to “DeliveryState” in RPCs
> and records.
>
> And I added a clarification about how the “group.type” configuration should
> be used.
>
> Thanks,
> Andrew
>
> > On 10 Apr 2024, at 15:33, Andrew Schofield <
> andrew_schofield_j...@live.com> wrote:
> >
> > Hi Jun,
> > Thanks for your questions.
> >
> > 41.
> > 41.1. The partition leader obtains the state epoch in the response from
> > ReadShareGroupState. When it becomes a share-partition leader,
> > it reads the share-group state and one of the things it learns is the
> > current state epoch. Then it uses the state epoch in all subsequent
> > calls to WriteShareGroupState. The fencing is to prevent writes for
> > a previous state epoch, which are very unlikely but which would mean
> > that a leader was using an out-of-date epoch and was likely no longer
> > the current leader at all, perhaps due 

[jira] [Resolved] (KAFKA-16294) Add group protocol migration enabling config

2024-04-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16294.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add group protocol migration enabling config
> 
>
> Key: KAFKA-16294
> URL: https://issues.apache.org/jira/browse/KAFKA-16294
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>
> The online upgrade is triggered when a consumer group heartbeat request is 
> received in a classic group. The downgrade is triggered when any old protocol 
> request is received in a consumer group. We only accept upgrade/downgrade if 
> the corresponding group migration config policy is enabled.
> This is the first part of the implementation of online group protocol 
> migration: adding the group protocol migration config. The config has 
> four valid values – both (both upgrade and downgrade are allowed), 
> upgrade (only upgrade is allowed), downgrade (only downgrade is allowed) and 
> none (neither is allowed).
> At present the default value is NONE. When we start enabling the migration, 
> we expect to set BOTH as the default so that it's easier to roll back to the old 
> protocol as a quick fix for anything wrong in the new protocol; when using 
> consumer groups becomes the default and the migration is nearly finished, we will 
> set the default policy to UPGRADE to prevent unwanted downgrades causing too 
> frequent migration. DOWNGRADE could be useful for revert or debug purposes.
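For illustration, a minimal sketch of how such a policy could be modelled (the names are illustrative, not the actual Kafka config class):

```
// Hypothetical policy enum mirroring the four values described above.
enum GroupProtocolMigrationPolicy {
    BOTH(true, true),
    UPGRADE(true, false),
    DOWNGRADE(false, true),
    NONE(false, false);

    private final boolean upgradeAllowed;
    private final boolean downgradeAllowed;

    GroupProtocolMigrationPolicy(boolean upgradeAllowed, boolean downgradeAllowed) {
        this.upgradeAllowed = upgradeAllowed;
        this.downgradeAllowed = downgradeAllowed;
    }

    // A classic group receiving a ConsumerGroupHeartbeat may only be upgraded if allowed.
    boolean isUpgradeAllowed() { return upgradeAllowed; }

    // A consumer group receiving an old-protocol request may only be downgraded if allowed.
    boolean isDowngradeAllowed() { return downgradeAllowed; }
}
```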



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: the migration of command tools

2024-04-10 Thread David Jacot
Hey,

I think that we discussed this in this KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-906%3A+Tools+migration+guidelines.
I don't remember all the details though.

Best,
David

On Wed, Apr 10, 2024 at 2:54 PM Chia-Ping Tsai  wrote:

> Dear Kafka,
>
> Migrating command tools from core module to tools module is not news.
> However, I want to make sure I don't misunderstand the BC rules.
>
> The question is "Should we keep origin class?"
>
> FeatureCommand (
>
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/FeatureCommand.scala
> )
> is a good example. We kept the original class file for backward
> compatibility. However, we don't do that for other tools.
>
> It seems to me that we should align the BC rules for all tools. And here are
> my two cents: the expected way of using a command tool is via its script file, so
> we DON'T need to keep the original class file.
>
> WDYT?
>
> Best,
> Chia-Ping
>


[jira] [Created] (KAFKA-16503) getOrMaybeCreateClassicGroup should not thrown GroupIdNotFoundException

2024-04-10 Thread David Jacot (Jira)
David Jacot created KAFKA-16503:
---

 Summary: getOrMaybeCreateClassicGroup should not thrown 
GroupIdNotFoundException
 Key: KAFKA-16503
 URL: https://issues.apache.org/jira/browse/KAFKA-16503
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot


It looks like the `getOrMaybeCreateClassicGroup` method throws a 
`GroupIdNotFoundException` when the group exists but has the wrong type. 
As `getOrMaybeCreateClassicGroup` is mainly used by the join-group/sync-group 
APIs, this seems incorrect. We need to double check and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1022 Formatting and Updating Features

2024-04-10 Thread David Jacot
+1 (binding). Thanks for the KIP!

On Mon, Apr 8, 2024 at 7:23 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi Justine,
> Thanks for the KIP.
>
> +1 (non-binding)
>
> Thanks,
> Andrew
>
> > On 8 Apr 2024, at 18:07, Justine Olshan 
> wrote:
> >
> > Hello all,
> > I would like to start a vote for KIP-1022 Formatting and Updating
> Features
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1022%3A+Formatting+and+Updating+Features
> >
> >
> > Please take a look and cast your vote.
> >
> > Thanks,
> > Justine
>
>


Re: Gentle bump on KAFKA-16371 (Unstable committed offsets after triggering commits where metadata for some partitions are over the limit)

2024-04-05 Thread David Jacot
Thanks, Michal. Let me add it to my review queue.

BR,
David

On Fri, Apr 5, 2024 at 3:29 PM Michał Łowicki  wrote:

> Hi there!
>
> Created https://issues.apache.org/jira/browse/KAFKA-16371 a few weeks back
> but it hasn't received any attention. Any chance someone who knows that code
> could take a look at the issue and the proposed fix? Thanks in advance.
>
> --
> BR,
> Michał Łowicki
>


[jira] [Created] (KAFKA-16470) kafka-dump-log --offsets-decoder should support new records

2024-04-04 Thread David Jacot (Jira)
David Jacot created KAFKA-16470:
---

 Summary: kafka-dump-log --offsets-decoder should support new 
records
 Key: KAFKA-16470
 URL: https://issues.apache.org/jira/browse/KAFKA-16470
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-04-03 Thread David Jacot
; > On Mon, Apr 1, 2024 at 2:06 PM Justine Olshan
> >> > > >  >> > > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I have also updated the KIP to mention the feature tool's
> >> > --metadata
> >> > > > flag
> >> > > > > > will be deprecated.
> >> > > > > > It will still work for users as they learn the new flag, but a
> >> > > warning
> >> > > > > > indicating the alternatives will be shown.
> >> > > > > >
> >> > > > > > Justine
> >> > > > > >
> >> > > > > > On Thu, Mar 28, 2024 at 11:03 AM Justine Olshan <
> >> > > jols...@confluent.io>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Jun,
> >> > > > > > >
> >> > > > > > > For both transaction state and group coordinator state,
> there
> >> are
> >> > > > only
> >> > > > > > > version 0 records.
> >> > > > > > > KIP-915 introduced flexible versions, but it was never put
> to
> >> > use.
> >> > > MV
> >> > > > > has
> >> > > > > > > never gated these. This KIP will do that. I can include this
> >> > > context
> >> > > > in
> >> > > > > > the
> >> > > > > > > KIP.
> >> > > > > > >
> >> > > > > > > I'm happy to modify his 1 and 2 to 0 and 1.
> >> > > > > > >
> >> > > > > > > Justine
> >> > > > > > >
> >> > > > > > > On Thu, Mar 28, 2024 at 10:57 AM Jun Rao
> >> >  >> > > >
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > >> Hi, David,
> >> > > > > > >>
> >> > > > > > >> Thanks for the reply.
> >> > > > > > >>
> >> > > > > > >> Historically, the format of all records were controlled by
> >> MV.
> >> > > Now,
> >> > > > > > >> records
> >> > > > > > >> in _offset_commit will be controlled by
> >> > > `group.coordinator.version`,
> >> > > > > is
> >> > > > > > >> that right? It would be useful to document that.
> >> > > > > > >>
> >> > > > > > >> Also, we should align on the version numbering.
> >> "kafka-feature
> >> > > > > disable"
> >> > > > > > >> says "Disable one or more feature flags. This is the same
> as
> >> > > > > downgrading
> >> > > > > > >> the version to zero". So, in the
> `group.coordinator.version'
> >> > case,
> >> > > > we
> >> > > > > > >> probably should use version 0 for the old consumer
> protocol.
> >> > > > > > >>
> >> > > > > > >> Jun
> >> > > > > > >>
> >> > > > > > >> On Thu, Mar 28, 2024 at 2:13 AM Andrew Schofield <
> >> > > > > > >> andrew_schofield_j...@outlook.com> wrote:
> >> > > > > > >>
> >> > > > > > >> > Hi David,
> >> > > > > > >> > I agree that we should use the same mechanism to gate
> >> KIP-932
> >> > > once
> >> > > > > > that
> >> > > > > > >> > feature reaches production readiness. The precise details
> >> of
> >> > the
> >> > > > > > values
> >> > > > > > >> > will
> >> > > > > > >> > depend upon the current state of all these flags when
> that
> >> > > release
> >> > > > > > >> comes.
> >> > > > > > >> >
> >> > > > > > >> > Thanks,
> >> > > > > > >> > Andrew
> >> > > > > > >> 

[jira] [Resolved] (KAFKA-16148) Implement GroupMetadataManager#onUnloaded

2024-04-02 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16148.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Implement GroupMetadataManager#onUnloaded
> -
>
> Key: KAFKA-16148
> URL: https://issues.apache.org/jira/browse/KAFKA-16148
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> Complete all awaiting futures with NOT_COORDINATOR (for classic groups),
> transition all groups to DEAD, and
> cancel all timers related to the unloaded group metadata manager.
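For illustration, a minimal sketch of the shape of such an unload hook (the hypothetical nested types stand in for the real Kafka classes):

```
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical unload hook: invoked when the coordinator loses leadership of a
// __consumer_offsets partition and must drop all in-memory state for it.
final class GroupMetadataManagerUnload {
    void onUnloaded(List<CompletableFuture<Void>> awaitingClassicGroupFutures,
                    List<Group> groups,
                    List<Timer> timers) {
        // 1. Fail pending classic-group operations so clients retry on the new coordinator.
        awaitingClassicGroupFutures.forEach(f ->
            f.completeExceptionally(new NotCoordinatorException("coordinator unloaded")));
        // 2. Transition every group to DEAD so no further state changes are applied.
        groups.forEach(Group::transitionToDead);
        // 3. Cancel timers owned by this (now unloaded) manager.
        timers.forEach(Timer::cancel);
    }

    interface Group { void transitionToDead(); }
    interface Timer { void cancel(); }
    static final class NotCoordinatorException extends RuntimeException {
        NotCoordinatorException(String message) { super(message); }
    }
}
```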



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-04-02 Thread David Jacot
Hi Justine,

Thanks for the KIP. This will be very helpful!

I do have one question regarding the naming of the new flags which is not
totally clear in the KIP. It would be great if we could call them out in
the Public Interfaces section.

My understanding is that the KIP proposes to use
`transaction.protocol.version` and `group.coordinator.version`. I was
wondering whether we should just use `transaction.version` and
`group.version`. The rationale for the first one is that a new version may
not always be for a protocol change. The rationale for the second one is
that it gates more than the group coordinator as we may use it for queues
too. It would also be aligned with `metadata.version`. I apologize if this
was already discussed.

Best,
David


On Tue, Apr 2, 2024 at 11:18 AM David Jacot  wrote:

> Hi Jun,
>
> > Historically, the format of all records were controlled by MV. Now,
> records
> in _offset_commit will be controlled by `group.coordinator.version`, is
> that right? It would be useful to document that.
>
> Yes. This is correct. The idea is to replace the MV with this new flag. It
> will have the same semantics but with the benefit of being independent.
>
> > Also, we should align on the version numbering. "kafka-feature disable"
> says "Disable one or more feature flags. This is the same as downgrading
> the version to zero". So, in the `group.coordinator.version' case, we
> probably should use version 0 for the old consumer protocol.
>
> This makes sense. We can definitely use version 0.
>
> Best,
> David
>
> On Tue, Apr 2, 2024 at 1:43 AM Justine Olshan 
> wrote:
>
>> Hi Jun,
>>
>> 20. I can update the KIP.
>>
>> 21. This is used to complete some of the work with KIP-360. (We use
>> previous producer ID there, but never persisted it which was in the KIP
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820
>> )
>> The KIP also mentions including previous epoch but we explained in this
>> KIP
>> how we can figure this out.
>>
>> Justine
>>
>>
>>
>> On Mon, Apr 1, 2024 at 3:56 PM Jun Rao  wrote:
>>
>> > Hi, Justine,
>> >
>> > Thanks for the updated KIP. A couple of more comments.
>> >
>> > 20. Could we show the output of version-mapping?
>> >
>> > 21. "Transaction version 1 will include the flexible fields in the
>> > transaction state log, and transaction version 2 will include the
>> changes
>> > to the transactional protocol as described by KIP-890 (epoch bumps and
>> > implicit add partitions.)"
>> >   So TV 1 enables the writing of new tagged fields like PrevProducerId?
>> But
>> > those fields are only usable after the epoch bump, right? What
>> > functionality does TV 1 achieve?
>> >
>> > Jun
>> >
>> >
>> > On Mon, Apr 1, 2024 at 2:06 PM Justine Olshan
>> > > >
>> > wrote:
>> >
>> > > I have also updated the KIP to mention the feature tool's --metadata
>> flag
>> > > will be deprecated.
>> > > It will still work for users as they learn the new flag, but a warning
>> > > indicating the alternatives will be shown.
>> > >
>> > > Justine
>> > >
>> > > On Thu, Mar 28, 2024 at 11:03 AM Justine Olshan > >
>> > > wrote:
>> > >
>> > > > Hi Jun,
>> > > >
>> > > > For both transaction state and group coordinator state, there are
>> only
>> > > > version 0 records.
>> > > > KIP-915 introduced flexible versions, but it was never put to use.
>> MV
>> > has
>> > > > never gated these. This KIP will do that. I can include this
>> context in
>> > > the
>> > > > KIP.
>> > > >
>> > > > I'm happy to modify his 1 and 2 to 0 and 1.
>> > > >
>> > > > Justine
>> > > >
>> > > > On Thu, Mar 28, 2024 at 10:57 AM Jun Rao 
>> > > wrote:
>> > > >
>> > > >> Hi, David,
>> > > >>
>> > > >> Thanks for the reply.
>> > > >>
>> > > >> Historically, the format of all records were controlled by MV. Now,
>> > > >> records
>> > > >> in _offset_commit will be controlled by
>> `group.coordinator.version`,
>> > is
>> > > >> that right? It would be useful to document that.
>> > > >>
>> > > >> Also, we should a

Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-04-02 Thread David Jacot
Hi Jun,

> Historically, the format of all records were controlled by MV. Now,
records
in _offset_commit will be controlled by `group.coordinator.version`, is
that right? It would be useful to document that.

Yes. This is correct. The idea is to replace the MV with this new flag. It
will have the same semantics but with the benefit of being independent.

> Also, we should align on the version numbering. "kafka-feature disable"
says "Disable one or more feature flags. This is the same as downgrading
the version to zero". So, in the `group.coordinator.version' case, we
probably should use version 0 for the old consumer protocol.

This makes sense. We can definitely use version 0.

Best,
David

On Tue, Apr 2, 2024 at 1:43 AM Justine Olshan 
wrote:

> Hi Jun,
>
> 20. I can update the KIP.
>
> 21. This is used to complete some of the work with KIP-360. (We use
> previous producer ID there, but never persisted it which was in the KIP
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820)
> The KIP also mentions including previous epoch but we explained in this KIP
> how we can figure this out.
>
> Justine
>
>
>
> On Mon, Apr 1, 2024 at 3:56 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the updated KIP. A couple of more comments.
> >
> > 20. Could we show the output of version-mapping?
> >
> > 21. "Transaction version 1 will include the flexible fields in the
> > transaction state log, and transaction version 2 will include the changes
> > to the transactional protocol as described by KIP-890 (epoch bumps and
> > implicit add partitions.)"
> >   So TV 1 enables the writing of new tagged fields like PrevProducerId?
> But
> > those fields are only usable after the epoch bump, right? What
> > functionality does TV 1 achieve?
> >
> > Jun
> >
> >
> > On Mon, Apr 1, 2024 at 2:06 PM Justine Olshan
>  > >
> > wrote:
> >
> > > I have also updated the KIP to mention the feature tool's --metadata
> flag
> > > will be deprecated.
> > > It will still work for users as they learn the new flag, but a warning
> > > indicating the alternatives will be shown.
> > >
> > > Justine
> > >
> > > On Thu, Mar 28, 2024 at 11:03 AM Justine Olshan 
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > For both transaction state and group coordinator state, there are
> only
> > > > version 0 records.
> > > > KIP-915 introduced flexible versions, but it was never put to use. MV
> > has
> > > > never gated these. This KIP will do that. I can include this context
> in
> > > the
> > > > KIP.
> > > >
> > > > I'm happy to modify his 1 and 2 to 0 and 1.
> > > >
> > > > Justine
> > > >
> > > > On Thu, Mar 28, 2024 at 10:57 AM Jun Rao 
> > > wrote:
> > > >
> > > >> Hi, David,
> > > >>
> > > >> Thanks for the reply.
> > > >>
> > > >> Historically, the format of all records were controlled by MV. Now,
> > > >> records
> > > >> in _offset_commit will be controlled by `group.coordinator.version`,
> > is
> > > >> that right? It would be useful to document that.
> > > >>
> > > >> Also, we should align on the version numbering. "kafka-feature
> > disable"
> > > >> says "Disable one or more feature flags. This is the same as
> > downgrading
> > > >> the version to zero". So, in the `group.coordinator.version' case,
> we
> > > >> probably should use version 0 for the old consumer protocol.
> > > >>
> > > >> Jun
> > > >>
> > > >> On Thu, Mar 28, 2024 at 2:13 AM Andrew Schofield <
> > > >> andrew_schofield_j...@outlook.com> wrote:
> > > >>
> > > >> > Hi David,
> > > >> > I agree that we should use the same mechanism to gate KIP-932 once
> > > that
> > > >> > feature reaches production readiness. The precise details of the
> > > values
> > > >> > will
> > > >> > depend upon the current state of all these flags when that release
> > > >> comes.
> > > >> >
> > > >> > Thanks,
> > > >> > Andrew
> > > >> >
> > > >> > > On 28 Mar 2024, at 07:11, David Jacot
>  > >
> > > >> >

Re: [DISCUSS] KIP-1022 Formatting and Updating Features

2024-03-28 Thread David Jacot
Hi, Jun, Justine,

Regarding `group.coordinator.version`, the idea is to use it to gate
records and APIs of the group coordinator. The first use case will be
KIP-848. We will use version 2 of the flag to gate all the new records and
the new ConsumerGroupHeartbeat/Describe APIs present in AK 3.8. So version
1 will be only the old protocol and version 2 will be the currently
implemented new protocol. I don't think that we have any dependency on the
metadata version at the moment. The changes are orthogonal. I think that we
could mention KIP-848 as the first usage of this flag in the KIP. I will
also update KIP-848 to include it when this KIP is accepted. Another use
case is the Queues KIP. I think that we should also use this new flag to
gate it.
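For illustration, a minimal sketch of what gating on such a finalized feature level could look like (a hypothetical helper, not the actual coordinator code; the level numbers follow the proposal above):

```
// Hypothetical gate: the new group coordinator records and the
// ConsumerGroupHeartbeat/Describe APIs are only enabled once the finalized
// level of `group.coordinator.version` reaches the new-protocol level.
final class GroupCoordinatorFeatureGate {
    private static final short NEW_PROTOCOL_MIN_LEVEL = 2;

    private final short finalizedGroupCoordinatorVersion;

    GroupCoordinatorFeatureGate(short finalizedGroupCoordinatorVersion) {
        this.finalizedGroupCoordinatorVersion = finalizedGroupCoordinatorVersion;
    }

    boolean isNewConsumerProtocolEnabled() {
        return finalizedGroupCoordinatorVersion >= NEW_PROTOCOL_MIN_LEVEL;
    }

    void ensureNewRecordsWritable() {
        if (!isNewConsumerProtocolEnabled()) {
            throw new IllegalStateException(
                "group.coordinator.version " + finalizedGroupCoordinatorVersion +
                " does not support the new consumer group records");
        }
    }
}
```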

Best,
David

On Thu, Mar 28, 2024 at 1:14 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> So, "dependencies" and "version-mapping" will be added to both
> kafka-feature and kafka-storage? Could we document that in the tool format
> section?
>
> Jun
>
> On Wed, Mar 27, 2024 at 4:01 PM Justine Olshan
> 
> wrote:
>
> > Ok. I can remove the info from the describe output.
> >
> > Dependencies is needed for the storage tool because we want to make sure
> > the desired versions we are setting will be valid. Version mapping should
> > be for both tools since we have --release-version for both tools.
> >
> > I was considering changing the IV strings, but I wasn't sure if there
> would
> > be some disagreement with the decision. Not sure if that breaks
> > compatibility etc. Happy to hear everyone's thoughts.
> >
> > Justine
> >
> > On Wed, Mar 27, 2024 at 3:36 PM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply.
> > >
> > > Having "kafka-feature dependencies" seems enough to me. We don't need
> to
> > > include the dependencies in the output of "kafka-feature describe".
> > >
> > > We only support "dependencies" in kafka-feature, not kafka-storage. We
> > > probably should do the same for "version-mapping".
> > >
> > > bin/kafka-features.sh downgrade --feature metadata.version=16
> > > --transaction.protocol.version=2
> > > We need to add the --feature flag for the second feature, right?
> > >
> > > In "kafka-features.sh describe", we only show the IV string for
> > > metadata.version. Should we also show the level number?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Mar 27, 2024 at 1:52 PM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > I had already included this example
> > > > bin/kafka-features.sh downgrade --feature metadata.version=16
> > > > --transaction.protocol.version=2 // Throws error if metadata version
> > is <
> > > > 16, and this would be an upgrade
> > > > But I have updated the KIP to explicitly say the text you mentioned.
> > > >
> > > > Justine
> > > >
> > > > On Wed, Mar 27, 2024 at 1:41 PM José Armando García Sancio
> > > >  wrote:
> > > >
> > > > > Hi Justine,
> > > > >
> > > > > See my comment below.
> > > > >
> > > > > On Wed, Mar 27, 2024 at 1:31 PM Justine Olshan
> > > > >  wrote:
> > > > > > The feature command includes the upgrade or downgrade command
> along
> > > > with
> > > > > > the --release-version flag. If some features are not moving in
> the
> > > > > > direction mentioned (upgrade or downgrade) the command will fail
> --
> > > > > perhaps
> > > > > > with an error of which features were going in the wrong
> direction.
> > > > >
> > > > > How about updating the KIP to show and document this behavior?
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > -José
> > > > >
> > > >
> > >
> >
>


[jira] [Resolved] (KAFKA-16353) Offline protocol migration integration tests

2024-03-27 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16353.
-
Resolution: Fixed

> Offline protocol migration integration tests
> 
>
> Key: KAFKA-16353
> URL: https://issues.apache.org/jira/browse/KAFKA-16353
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16374) High watermark updates should have a higher priority

2024-03-25 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16374.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> High watermark updates should have a higher priority
> 
>
> Key: KAFKA-16374
> URL: https://issues.apache.org/jira/browse/KAFKA-16374
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15989) Upgrade existing generic group to consumer group

2024-03-20 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15989.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Upgrade existing generic group to consumer group
> 
>
> Key: KAFKA-15989
> URL: https://issues.apache.org/jira/browse/KAFKA-15989
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Emanuele Sabellico
>    Assignee: David Jacot
>Priority: Minor
> Fix For: 3.8.0
>
>
> It should be possible to upgrade an existing generic group to a new consumer 
> group, in case it was using either the previous generic protocol or manual 
> partition assignment and commit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15763) Group Coordinator should not deliver new assignment before previous one is acknowledged

2024-03-20 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15763.
-
Resolution: Won't Fix

We went with another approach.

> Group Coordinator should not deliver new assignment before previous one is 
> acknowledged
> ---
>
> Key: KAFKA-15763
> URL: https://issues.apache.org/jira/browse/KAFKA-15763
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
>
> In the initial implementation of the new consumer group protocol, the group 
> coordinator waits for an acknowledgement from the consumer only when 
> there are partitions to be revoked. In the case of newly assigned partitions, 
> a new assignment can be delivered any time (e.g. in two subsequent 
> heartbeats).
> While implementing the state machine on the client side, we found out that 
> this caused confusion because the protocol does not treat revocation and 
> assignment in the same way. We also found out that changing the assignment 
> before the previous one is fully processed by the member makes the client 
> side logic more complicated than it should be because the consumer can't 
> process any new assignment until it has completed the previous one.
> In the end, it is better to change the server side to not deliver a new 
> assignment before the current one is acknowledged by the consumer.
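For illustration, a minimal sketch of the rule proposed in this ticket (which, per the resolution above, is not the approach that was ultimately taken); the types are hypothetical:

```
import java.util.Set;

// Hypothetical per-member reconciliation state on the coordinator side.
final class MemberReconciliationState {
    private Set<Integer> currentTarget;   // assignment last sent to the member
    private boolean acknowledged = true;  // has the member confirmed it?

    // Called when the assignor computes a new target for this member. The new
    // target is only delivered once the previous one has been acknowledged.
    Set<Integer> maybeDeliver(Set<Integer> newTarget) {
        if (!acknowledged) {
            return currentTarget; // keep sending the pending assignment
        }
        currentTarget = newTarget;
        acknowledged = false;
        return currentTarget;
    }

    // Called when the member's heartbeat reports the assignment as applied.
    void onMemberAcknowledged(Set<Integer> reportedAssignment) {
        if (reportedAssignment.equals(currentTarget)) {
            acknowledged = true;
        }
    }
}
```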



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16313) Offline group protocol migration

2024-03-20 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16313.
-
Fix Version/s: 3.8.0
 Assignee: Dongnuo Lyu
   Resolution: Fixed

> Offline group protocol migration
> 
>
> Key: KAFKA-16313
> URL: https://issues.apache.org/jira/browse/KAFKA-16313
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16367) Full ConsumerGroupHeartbeat response must be sent when full request is received

2024-03-19 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16367.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Full ConsumerGroupHeartbeat response must be sent when full request is 
> received
> ---
>
> Key: KAFKA-16367
> URL: https://issues.apache.org/jira/browse/KAFKA-16367
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16374) High watermark updates should have a higher priority

2024-03-14 Thread David Jacot (Jira)
David Jacot created KAFKA-16374:
---

 Summary: High watermark updates should have a higher priority
 Key: KAFKA-16374
 URL: https://issues.apache.org/jira/browse/KAFKA-16374
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15997) Ensure fairness in the uniform assignor

2024-03-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15997.
-
Resolution: Fixed

This issue got resolved by https://issues.apache.org/jira/browse/KAFKA-16249.

> Ensure fairness in the uniform assignor
> ---
>
> Key: KAFKA-15997
> URL: https://issues.apache.org/jira/browse/KAFKA-15997
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Emanuele Sabellico
>    Assignee: David Jacot
>Priority: Minor
>
>  
>  
> Fairness has to be ensured in the uniform assignor as it was in the
> cooperative-sticky one.
> There's a test in librdkafka, 0113 subtest u_multiple_subscription_changes,
> where 8 consumers subscribe to the same topic and it verifies that
> all of them get 2 partitions assigned. But with the new protocol it seems
> two consumers get assigned 3 partitions and one has zero partitions. The test
> doesn't configure any client.rack.
> {code:java}
> [0113_cooperative_rebalance  /478.183s] Consumer assignments 
> (subscription_variation 0) (stabilized) (no rebalance cb):
> [0113_cooperative_rebalance  /478.183s] Consumer C_0#consumer-3 assignment 
> (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [5] (2000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [8] (4000msgs)
> [0113_cooperative_rebalance  /478.183s] Consumer C_1#consumer-4 assignment 
> (3): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [0] (1000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [3] (2000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [13] (1000msgs)
> [0113_cooperative_rebalance  /478.184s] Consumer C_2#consumer-5 assignment 
> (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [6] (1000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [10] (2000msgs)
> [0113_cooperative_rebalance  /478.184s] Consumer C_3#consumer-6 assignment 
> (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [7] (1000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [9] (2000msgs)
> [0113_cooperative_rebalance  /478.184s] Consumer C_4#consumer-7 assignment 
> (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [11] (1000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [14] (3000msgs)
> [0113_cooperative_rebalance  /478.184s] Consumer C_5#consumer-8 assignment 
> (3): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [1] (2000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [2] (2000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [4] (1000msgs)
> [0113_cooperative_rebalance  /478.184s] Consumer C_6#consumer-9 assignment 
> (0): 
> [0113_cooperative_rebalance  /478.184s] Consumer C_7#consumer-10 assignment 
> (2): rdkafkatest_rnd24419cc75e59d8de_0113u_1 [12] (1000msgs), 
> rdkafkatest_rnd24419cc75e59d8de_0113u_1 [15] (1000msgs)
> [0113_cooperative_rebalance  /478.184s] 16/32 partitions assigned
> [0113_cooperative_rebalance  /478.184s] Consumer C_0#consumer-3 has 2 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_1#consumer-4 has 3 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_2#consumer-5 has 2 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_3#consumer-6 has 2 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_4#consumer-7 has 2 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_5#consumer-8 has 3 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_6#consumer-9 has 0 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [0113_cooperative_rebalance  /478.184s] Consumer C_7#consumer-10 has 2 
> assigned partitions (1 subscribed topic(s)), expecting 2 assigned partitions
> [                      /479.057s] 1 test(s) running: 
> 0113_cooperative_rebalance
> [                      /480.057s] 1 test(s) running: 
> 0113_cooperative_rebalance
> [                      /481.057s] 1 test(s) running: 
> 0113_cooperative_rebalance
> [0113_cooperative_rebalance  /482.498s] TEST FAILURE
> ### Test "0113_cooperative_rebalance (u_multiple_subscription_changes:2390: 
> use_rebalance_cb: 0, subscription_variation: 0)" failed at 
> test.c:1243:check_test_timeouts() at Thu Dec  7 15:52:15 2023: ###
> Test 0113_cooperative_rebalance (u_multiple_subscription_changes:2390: 
> use

[jira] [Resolved] (KAFKA-16249) Improve reconciliation state machine

2024-03-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16249.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Improve reconciliation state machine
> 
>
> Key: KAFKA-16249
> URL: https://issues.apache.org/jira/browse/KAFKA-16249
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16367) Full ConsumerGroupHeartbeat response must be sent when full request is received

2024-03-12 Thread David Jacot (Jira)
David Jacot created KAFKA-16367:
---

 Summary: Full ConsumerGroupHeartbeat response must be sent when 
full request is received
 Key: KAFKA-16367
 URL: https://issues.apache.org/jira/browse/KAFKA-16367
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15462) Add group type filter to the admin client

2024-02-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15462.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add group type filter to the admin client
> -
>
> Key: KAFKA-15462
> URL: https://issues.apache.org/jira/browse/KAFKA-15462
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16306) GroupCoordinatorService logger is not configured

2024-02-27 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16306.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> GroupCoordinatorService logger is not configured
> 
>
> Key: KAFKA-16306
> URL: https://issues.apache.org/jira/browse/KAFKA-16306
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Minor
> Fix For: 3.8.0
>
>
> The GroupCoordinatorService constructor initializes with the wrong logger 
> class:
> ```
> GroupCoordinatorService(
> LogContext logContext,
> GroupCoordinatorConfig config,
> CoordinatorRuntime runtime,
> GroupCoordinatorMetrics groupCoordinatorMetrics
> ) {
>     this.log = logContext.logger(CoordinatorLoader.class);
>     this.config = config;
>     this.runtime = runtime;
>     this.groupCoordinatorMetrics = groupCoordinatorMetrics;
> }
> ```
> change this to GroupCoordinatorService.class
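
For clarity, with the fix applied the constructor would look roughly like this (a
sketch derived from the snippet above; generic parameters are omitted, as in the
original snippet):
```
GroupCoordinatorService(
    LogContext logContext,
    GroupCoordinatorConfig config,
    CoordinatorRuntime runtime,
    GroupCoordinatorMetrics groupCoordinatorMetrics
) {
    // Create the logger for GroupCoordinatorService rather than CoordinatorLoader
    this.log = logContext.logger(GroupCoordinatorService.class);
    this.config = config;
    this.runtime = runtime;
    this.groupCoordinatorMetrics = groupCoordinatorMetrics;
}
```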



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16249) Improve reconciliation state machine

2024-02-13 Thread David Jacot (Jira)
David Jacot created KAFKA-16249:
---

 Summary: Improve reconciliation state machine
 Key: KAFKA-16249
 URL: https://issues.apache.org/jira/browse/KAFKA-16249
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread David Jacot
Hi Bruno,

Yes, you're right. Sorry for the typo.

Hi Ismael,

You're right. Jenkins does not support the flakyFailure element and
hence the information is not at all in the Jenkins report. I am still
experimenting with printing the flaky tests somewhere. I will update this
thread if I get something working. In the meantime, I wanted to gauge
whether there is support for it.

Cheers,
David

On Mon, Feb 12, 2024 at 3:59 PM Ismael Juma  wrote:

> Hi David,
>
> Your message didn't make this clear, but you are saying that Jenkins does
> _not_ support the flakyFailure element and hence this information will be
> completely missing from the Jenkins report. Have we considered including
> the flakyFailure information ourselves? I have seen that being done and it
> seems strictly better than totally ignoring it.
>
> Ismael
>
> On Mon, Feb 12, 2024 at 12:11 AM David Jacot 
> wrote:
>
> > Hi folks,
> >
> > I have been playing with `reports.junitXml.mergeReruns` setting in gradle
> > [1]. From the gradle doc:
> >
> > > When mergeReruns is enabled, if a test fails but is then retried and
> > succeeds, its failures will be recorded as  instead of
> > , within one . This is effectively the reporting
> > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > your CI server understands this format, it will indicate that the test
> was
> > flaky. If it does not, it will indicate that the test succeeded as it
> will
> > ignore the  information. If the test does not succeed (i.e.
> > it fails for every retry), it will be indicated as having failed whether
> > your tool understands this format or not.
> >
> > With this, we get really close to having green builds [2] all the time.
> > There are only a few tests which are too flaky. We should address or
> > disable those.
> >
> > I think that this would help us a lot because it would reduce the noise
> > that we get in pull requests. At the moment, there are just too many
> failed
> > tests reported so it is really hard to know whether a pull request is
> > actually fine or not.
> >
> > [1] applies it to both unit and integration tests. Following the
> discussion
> > in the `github build queue` thread, it may be better to only apply it to
> > the integration tests. Being stricter with unit tests would make sense.
> >
> > This does not mean that we should stop our effort to reduce the number
> > of flaky tests. For this, I propose to keep using Gradle Enterprise. It
> > provides a nice report for them that we can leverage.
> >
> > Thoughts?
> >
> > Best,
> > David
> >
> > [1] https://github.com/apache/kafka/pull/14862
> > [2]
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests
> >
>


[jira] [Created] (KAFKA-16244) Move code style exceptions from suppressions.xml to the code

2024-02-12 Thread David Jacot (Jira)
David Jacot created KAFKA-16244:
---

 Summary: Move code style exceptions from suppressions.xml to the 
code
 Key: KAFKA-16244
 URL: https://issues.apache.org/jira/browse/KAFKA-16244
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Improve flaky test reporting (KAFKA-12216)

2024-02-12 Thread David Jacot
Hi folks,

I have been playing with `reports.junitXml.mergeReruns` setting in gradle
[1]. From the gradle doc:

> When mergeReruns is enabled, if a test fails but is then retried and
succeeds, its failures will be recorded as <flakyFailure> instead of
<failure>, within one <testcase>. This is effectively the reporting
produced by the surefire plugin of Apache Maven™ when enabling reruns. If
your CI server understands this format, it will indicate that the test was
flaky. If it does not, it will indicate that the test succeeded as it will
ignore the <flakyFailure> information. If the test does not succeed (i.e.
it fails for every retry), it will be indicated as having failed whether
your tool understands this format or not.

With this, we get really close to having green builds [2] all the time.
There are only a few tests which are too flaky. We should address or
disable those.

I think that this would help us a lot because it would reduce the noise
that we get in pull requests. At the moment, there are just too many failed
tests reported so it is really hard to know whether a pull request is
actually fine or not.

[1] applies it to both unit and integration tests. Following the discussion
in the `github build queue` thread, it may be better to only apply it to
the integration tests. Being stricter with unit tests would make sense.
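
For illustration, scoping the setting to the integration tests only would look
roughly like this in build.gradle (the task name below is illustrative; the actual
wiring is in [1]):

    tasks.named("integrationTest", Test).configure {
        reports.junitXml.mergeReruns = true
    }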

This does not mean that we should stop our effort to reduce the number
of flaky tests. For this, I propose to keep using Gradle Enterprise. It
provides a nice report for them that we can leverage.

Thoughts?

Best,
David

[1] https://github.com/apache/kafka/pull/14862
[2]
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14862/19/tests


[jira] [Resolved] (KAFKA-16178) AsyncKafkaConsumer doesn't retry joining the group after rediscovering group coordinator

2024-02-11 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16178.
-
Resolution: Fixed

> AsyncKafkaConsumer doesn't retry joining the group after rediscovering group 
> coordinator
> 
>
> Key: KAFKA-16178
> URL: https://issues.apache.org/jira/browse/KAFKA-16178
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Dongnuo Lyu
>Assignee: Lianet Magrans
>Priority: Blocker
>  Labels: client-transitions-issues, consumer-threading-refactor
> Fix For: 3.8.0
>
> Attachments: pkc-devc63jwnj_jan19_0_debug
>
>
> {code:java}
> [2024-01-17 21:34:59,500] INFO [Consumer 
> clientId=consumer.7e26597f-0285-4e13-88d6-31500a500275-0, 
> groupId=consumer-groups-test-0] Discovered group coordinator 
> Coordinator(key='consumer-groups-test-0', nodeId=3, 
> host='b3-pkc-devc63jwnj.us-west-2.aws.devel.cpdev.cloud', port=9092, 
> errorCode=0, errorMessage='') 
> (org.apache.kafka.clients.consumer.internals.CoordinatorRequestManager:162)
> [2024-01-17 21:34:59,681] INFO [Consumer 
> clientId=consumer.7e26597f-0285-4e13-88d6-31500a500275-0, 
> groupId=consumer-groups-test-0] GroupHeartbeatRequest failed because the 
> group coordinator 
> Optional[b3-pkc-devc63jwnj.us-west-2.aws.devel.cpdev.cloud:9092 (id: 
> 2147483644 rack: null)] is incorrect. Will attempt to find the coordinator 
> again and retry in 0ms: This is not the correct coordinator. 
> (org.apache.kafka.clients.consumer.internals.HeartbeatRequestManager:407)
> [2024-01-17 21:34:59,681] INFO [Consumer 
> clientId=consumer.7e26597f-0285-4e13-88d6-31500a500275-0, 
> groupId=consumer-groups-test-0] Group coordinator 
> b3-pkc-devc63jwnj.us-west-2.aws.devel.cpdev.cloud:9092 (id: 2147483644 rack: 
> null) is unavailable or invalid due to cause: This is not the correct 
> coordinator.. Rediscovery will be attempted. 
> (org.apache.kafka.clients.consumer.internals.CoordinatorRequestManager:136)
> [2024-01-17 21:34:59,882] INFO [Consumer 
> clientId=consumer.7e26597f-0285-4e13-88d6-31500a500275-0, 
> groupId=consumer-groups-test-0] Discovered group coordinator 
> Coordinator(key='consumer-groups-test-0', nodeId=3, 
> host='b3-pkc-devc63jwnj.us-west-2.aws.devel.cpdev.cloud', port=9092, 
> errorCode=0, errorMessage='') 
> (org.apache.kafka.clients.consumer.internals.CoordinatorRequestManager:162){code}
> Some of the consumers don't consume any messages. The logs show that after the 
> consumer starts up and successfully logs in,
>  # The consumer discovers the group coordinator.
>  # The heartbeat to join the group fails because "This is not the correct 
> coordinator".
>  # The consumer rediscovers the group coordinator.
> Another heartbeat should follow the rediscovery of the group coordinator, but 
> there are no logs showing any sign of a heartbeat request. 
> On the server side, there are no logs at all about the group id. A 
> suspicion is that the consumer doesn't send a heartbeat request after 
> rediscovering the group coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-951: Leader discovery optimisations for the client

2024-02-06 Thread David Jacot
Hi,

Thanks for bringing this up. It may be worth bringing it in the 3.7 release
thread too as it may qualify as a blocker for the release.

Best,
David

On Tue, Feb 6, 2024 at 5:09 PM Mayank Shekhar Narula <
mayanks.nar...@gmail.com> wrote:

> Hi Folks
>
> KIP-951 was delivered fully in AK 3.7. Its first optimisation was delivered
> in 3.6.1, to skip the backoff period for a produce batch being retried to a
> new leader, i.e. KAFKA-15415.
>
> KAFKA-15415's current implementation introduced a performance regression by
> increasing synchronization on the produce path, especially for high
> partition counts. The description section of
> https://issues.apache.org/jira/browse/KAFKA-16226 goes into more detail
> about the regression.
>
> I have put up a fix https://github.com/apache/kafka/pull/15323, which
> removes this synchronization. The fix adds a new public method to
> Cluster.java, and a public constructor to PartitionInfo.java. Let me know
> your thoughts.
>
> On Wed, Oct 4, 2023 at 10:09 AM Mayank Shekhar Narula <
> mayanks.nar...@gmail.com> wrote:
>
> > Summarising, there are 5 binding votes(Luke, Jose, Jun, David, Jason),
> and
> > 1 non-binding vote(Kirk).
> >
> > With the current status of voting, KIP is accepted.
> >
> > Thanks again to all reviewers and voters.
> >
> >
> >
> > On Wed, Oct 4, 2023 at 9:37 AM Mayank Shekhar Narula <
> > mayanks.nar...@gmail.com> wrote:
> >
> >> Thank you all for your votes, Jun, David, and Jason!
> >>
> >> On Tue, Oct 3, 2023 at 11:44 PM Jason Gustafson
> >>  wrote:
> >>
> >>> +1 Thanks for the KIP
> >>>
> >>> On Tue, Oct 3, 2023 at 12:30 PM David Jacot 
> >>> wrote:
> >>>
> >>> > Thanks for the KIP. +1 from me as well.
> >>> >
> >>> > Best,
> >>> > David
> >>> >
> >>> > > On Tue, Oct 3, 2023 at 8:54 PM Jun Rao  wrote:
> >>> >
> >>> > > Hi, Mayank,
> >>> > >
> >>> > > Thanks for the detailed explanation in the KIP. +1 from me.
> >>> > >
> >>> > > Jun
> >>> > >
> >>> > > On Wed, Sep 27, 2023 at 4:39 AM Mayank Shekhar Narula <
> >>> > > mayanks.nar...@gmail.com> wrote:
> >>> > >
> >>> > > > Reviving this thread, as the discussion thread has been updated.
> >>> > > >
> >>> > > > On Fri, Jul 28, 2023 at 11:29 AM Mayank Shekhar Narula <
> >>> > > > mayanks.nar...@gmail.com> wrote:
> >>> > > >
> >>> > > > > Thanks Jose.
> >>> > > > >
> >>> > > > > On Thu, Jul 27, 2023 at 5:46 PM José Armando García Sancio
> >>> > > > >  wrote:
> >>> > > > >
> >>> > > > >> The KIP LGTM. Thanks for the design. I am looking forward to
> the
> >>> > > > >> implementation.
> >>> > > > >>
> >>> > > > >> +1 (binding).
> >>> > > > >>
> >>> > > > >> Thanks!
> >>> > > > >> --
> >>> > > > >> -José
> >>> > > > >>
> >>> > > > >
> >>> > > > >
> >>> > > > > --
> >>> > > > > Regards,
> >>> > > > > Mayank Shekhar Narula
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > Regards,
> >>> > > > Mayank Shekhar Narula
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Mayank Shekhar Narula
> >>
> >
> >
> > --
> > Regards,
> > Mayank Shekhar Narula
> >
>
>
> --
> Regards,
> Mayank Shekhar Narula
>


[jira] [Created] (KAFKA-16227) Console consumer fails with `IllegalStateException`

2024-02-06 Thread David Jacot (Jira)
David Jacot created KAFKA-16227:
---

 Summary: Console consumer fails with `IllegalStateException`
 Key: KAFKA-16227
 URL: https://issues.apache.org/jira/browse/KAFKA-16227
 Project: Kafka
  Issue Type: Sub-task
  Components: clients
Reporter: David Jacot
Assignee: Kirk True


I have seen a few occurrences like the following one. There is a race between 
the background thread and the foreground thread. I imagine the following steps:
 * quickstart-events-2 is assigned by the background thread;
 * the foreground thread starts the initialization of the partition (e.g. reset 
offset);
 * quickstart-events-2 is removed by the background thread;
 * the initialization completes and quickstart-events-2 does not exist anymore.

 
{code:java}
[2024-02-06 16:21:57,375] ERROR Error processing message, terminating consumer 
process:  (kafka.tools.ConsoleConsumer$)
java.lang.IllegalStateException: No current assignment for partition 
quickstart-events-2
at 
org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:367)
at 
org.apache.kafka.clients.consumer.internals.SubscriptionState.updateHighWatermark(SubscriptionState.java:579)
at 
org.apache.kafka.clients.consumer.internals.FetchCollector.handleInitializeSuccess(FetchCollector.java:283)
at 
org.apache.kafka.clients.consumer.internals.FetchCollector.initialize(FetchCollector.java:226)
at 
org.apache.kafka.clients.consumer.internals.FetchCollector.collectFetch(FetchCollector.java:110)
at 
org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.collectFetch(AsyncKafkaConsumer.java:1540)
at 
org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.pollForFetches(AsyncKafkaConsumer.java:1525)
at 
org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer.poll(AsyncKafkaConsumer.java:711)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:874)
at 
kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:473)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) {code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15460) Add group type filter to ListGroups API

2024-02-05 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15460.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add group type filter to ListGroups API
> ---
>
> Key: KAFKA-15460
> URL: https://issues.apache.org/jira/browse/KAFKA-15460
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16189) Extend admin to support ConsumerGroupDescribe API

2024-02-01 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16189.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Extend admin to support ConsumerGroupDescribe API
> -
>
> Key: KAFKA-16189
> URL: https://issues.apache.org/jira/browse/KAFKA-16189
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16168) Implement GroupCoordinator.onPartitionsDeleted

2024-02-01 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16168.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Implement GroupCoordinator.onPartitionsDeleted
> --
>
> Key: KAFKA-16168
> URL: https://issues.apache.org/jira/browse/KAFKA-16168
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16095) Update list group state type filter to include the states for the new consumer group type

2024-01-29 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16095.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Update list group state type filter to include the states for the new 
> consumer group type
> -
>
> Key: KAFKA-16095
> URL: https://issues.apache.org/jira/browse/KAFKA-16095
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Lan Ding
>Priority: Minor
> Fix For: 3.8.0
>
>
> While using *--list --state*, the current accepted values correspond to the 
> classic group type states. We need to include support for the new group type 
> states.
> Consumer Group: should list the state of the group. Accepted values:
>   UNKNOWN ("unknown"), EMPTY ("empty"), *ASSIGNING* ("assigning"),
>   *RECONCILING* ("reconciling"), STABLE ("stable"), DEAD ("dead")
> Classic Group: should list the state of the group. Accepted values:
>   UNKNOWN ("Unknown"), EMPTY ("Empty"), *PREPARING_REBALANCE* ("PreparingRebalance"),
>   *COMPLETING_REBALANCE* ("CompletingRebalance"), STABLE ("Stable"), DEAD ("Dead")
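
As an illustration, the filter in question is the one used as below (the exact
values are examples; the highlighted new states are what this issue adds support
for):

    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --list --state stable,reconciling,assigning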



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14505) Implement TnxOffsetCommit API

2024-01-26 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14505.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Implement TnxOffsetCommit API
> -
>
> Key: KAFKA-14505
> URL: https://issues.apache.org/jira/browse/KAFKA-14505
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.8.0
>
>
> Implement TnxOffsetCommit API in the new Group Coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16194) KafkaConsumer.groupMetadata() should be correct when first records are returned

2024-01-25 Thread David Jacot (Jira)
David Jacot created KAFKA-16194:
---

 Summary: KafkaConsumer.groupMetadata() should be correct when 
first records are returned
 Key: KAFKA-16194
 URL: https://issues.apache.org/jira/browse/KAFKA-16194
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot


The following code returns records before the group metadata is updated. This 
fails the first transactions ever run by the Producer/Consumer.

 
{code:java}
Producer<String, String> txnProducer = new KafkaProducer<>(txnProducerProps);
Consumer<String, String> consumer = new KafkaConsumer<>(consumerProps);

txnProducer.initTransactions();
System.out.println("Init transactions called");

try {
txnProducer.beginTransaction();
System.out.println("Begin transactions called");

consumer.subscribe(Collections.singletonList("input"));
System.out.println("Consumer subscribed to topic -> KIP848-topic-2 ");

ConsumerRecords<String, String> records = 
consumer.poll(Duration.ofSeconds(10));
System.out.println("Returned " + records.count() + " records.");

// Process and send txn messages.
for (ConsumerRecord<String, String> processedRecord : records) {
txnProducer.send(new ProducerRecord<>("output", processedRecord.key(), 
"Processed: " + processedRecord.value()));
}

ConsumerGroupMetadata groupMetadata = consumer.groupMetadata();
System.out.println("Group metadata inside test" + groupMetadata);

Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();
for (ConsumerRecord<String, String> record : records) {
offsetsToCommit.put(new TopicPartition(record.topic(), 
record.partition()),
new OffsetAndMetadata(record.offset() + 1));
}
System.out.println("Offsets to commit" + offsetsToCommit);
// Send offsets to transaction with ConsumerGroupMetadata.
txnProducer.sendOffsetsToTransaction(offsetsToCommit, groupMetadata);
System.out.println("Send offsets to transaction done");

// Commit the transaction.
txnProducer.commitTransaction();
System.out.println("Commit transaction done");
} catch (ProducerFencedException | OutOfOrderSequenceException | 
AuthorizationException e) {
e.printStackTrace();
txnProducer.close();
} catch (KafkaException e) {
e.printStackTrace();
txnProducer.abortTransaction();
} finally {
txnProducer.close();
consumer.close();
} {code}
The issue seems to be that while it waits in `poll`, the event to update the 
group metadata is not processed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16107) Ensure consumer does not start fetching from added partitions until onPartitionsAssigned completes

2024-01-24 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16107.
-
  Reviewer: David Jacot
Resolution: Fixed

> Ensure consumer does not start fetching from added partitions until 
> onPartitionsAssigned completes
> --
>
> Key: KAFKA-16107
> URL: https://issues.apache.org/jira/browse/KAFKA-16107
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: kip-848-client-support
> Fix For: 3.8.0
>
>
> In the new consumer implementation, when new partitions are assigned, the 
> subscription state is updated and then #onPartitionsAssigned is triggered. 
> This sequence seems sensible, but we need to ensure that no data is fetched 
> until onPartitionsAssigned completes (where the user could be setting the 
> committed offsets it wants to start fetching from).
> We should pause the newly added partitions until 
> onPartitionsAssigned completes, similar to how it's done on revocation, to 
> avoid positions getting ahead of the committed offsets.
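
To make the scenario concrete, the user-side pattern this protects is roughly the
following (an illustrative sketch, not taken from the ticket; the topic name and
types are made up):

    consumer.subscribe(List.of("input"), new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // Seek to the committed offsets before any records are fetched.
            Map<TopicPartition, OffsetAndMetadata> committed =
                consumer.committed(new HashSet<>(partitions));
            committed.forEach((tp, offsetAndMetadata) -> {
                if (offsetAndMetadata != null)
                    consumer.seek(tp, offsetAndMetadata.offset());
            });
        }
    });

If fetching could start before onPartitionsAssigned returns, positions could move
past the offsets the listener is about to seek to.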



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16189) Extend admin to support ConsumerGroupDescribe API

2024-01-24 Thread David Jacot (Jira)
David Jacot created KAFKA-16189:
---

 Summary: Extend admin to support ConsumerGroupDescribe API
 Key: KAFKA-16189
 URL: https://issues.apache.org/jira/browse/KAFKA-16189
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread David Jacot
Hi Chris, Ziming,

Thanks for the clarification. I am glad that it does not impact the tool.
It may be worth adding a note about it in the KIP to avoid the same
question in the future.
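
(For readers following along: the tool currently only issues set/delete operations,
along the lines of the sketch below; the append/subtract operations affected by
KAFKA-10140 are not used. The config names and values here are purely
illustrative.)

    ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
    admin.incrementalAlterConfigs(Map.of(broker, List.of(
        new AlterConfigOp(new ConfigEntry("log.cleaner.threads", "2"),
                          AlterConfigOp.OpType.SET),
        new AlterConfigOp(new ConfigEntry("log.retention.ms", ""),
                          AlterConfigOp.OpType.DELETE)
    ))).all().get();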

Otherwise, I am +1 (binding). Thanks for driving this!

Best,
David

On Tue, Jan 23, 2024 at 6:07 AM ziming deng 
wrote:

> Hello David,
>
> Thanks for the reminder. As Chris explained, the tools I'm trying to
> update only support set/delete configs, and I'm just making a way for
> append/subtract configs in the future, so this would not be affected by
> KAFKA-10140. It would be a little overkill to support append/subtract
> configs or solve KAFKA-10140 here, so let's leave it for now; I'm happy
> to pick it up after finishing this KIP.
>
> --,
> Ziming
>
> > On Jan 22, 2024, at 18:23, David Jacot 
> wrote:
> >
> > Hi Ziming,
> >
> > Thanks for driving this. I wanted to bring KAFKA-10140
> > <https://issues.apache.org/jira/browse/KAFKA-10140> to your attention.
> It
> > looks like the incremental API does not work for configuring plugins. I
> > think that we need to cover this in the KIP.
> >
> > Best,
> > David
> >
> > On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 22 Jan 2024, at 07:29, Federico Valeri 
> wrote:
> >>>
> >>> +1 (non binding)
> >>>
> >>> Thanks.
> >>>
> >>> On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
> >>>>
> >>>> Hi Ziming,
> >>>>
> >>>> +1(binding) from me.
> >>>>
> >>>> Thanks.
> >>>> Luke
> >>>>
> >>>> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
> >>>> kamal.chandraprak...@gmail.com> wrote:
> >>>>
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng <
> dengziming1...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello everyone,
> >>>>>> I'd like to initiate a vote for KIP-1011.
> >>>>>> This KIP is about replacing alterConfigs with
> incrementalAlterConfigs
> >>>>>> when updating broker configs using kafka-configs.sh, this is similar
> >> to
> >>>>>> what we have done in KIP-894.
> >>>>>>
> >>>>>> KIP link:
> >>>>>> KIP-1011: Use incrementalAlterConfigs when updating broker configs
> by
> >>>>>> kafka-configs.sh - Apache Kafka - Apache Software Foundation
> >>>>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >>>
> >>>>>> cwiki.apache.org
> >>>>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >>>
> >>>>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >>>
> >>>>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >>>
> >>>>>>
> >>>>>> Discussion thread:
> >>>>>>
> >>>>>>
> >>>>>> lists.apache.org
> >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy>
> >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy>
> >>>>>> <https://lists.apache.org/thread/xd28mgqy75stgsvp6qybzpljzflkqcsy>
> >>>>>>
> >>>>>>
> >>>>>> --,
> >>>>>> Best,
> >>>>>> Ziming
> >>
> >>
> >>
>
>


Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread David Jacot
Hi Ziming,

Thanks for driving this. I wanted to bring KAFKA-10140
<https://issues.apache.org/jira/browse/KAFKA-10140> to your attention. It
looks like the incremental API does not work for configuring plugins. I
think that we need to cover this in the KIP.

Best,
David

On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> +1 (non-binding)
>
> Thanks,
> Andrew
>
> > On 22 Jan 2024, at 07:29, Federico Valeri  wrote:
> >
> > +1 (non binding)
> >
> > Thanks.
> >
> > On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
> >>
> >> Hi Ziming,
> >>
> >> +1(binding) from me.
> >>
> >> Thanks.
> >> Luke
> >>
> >> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
> >> kamal.chandraprak...@gmail.com> wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng 
> >>> wrote:
> >>>
>  Hello everyone,
>  I'd like to initiate a vote for KIP-1011.
>  This KIP is about replacing alterConfigs with incrementalAlterConfigs
>  when updating broker configs using kafka-configs.sh, this is similar
> to
>  what we have done in KIP-894.
> 
>  KIP link:
>  KIP-1011: Use incrementalAlterConfigs when updating broker configs by
>  kafka-configs.sh - Apache Kafka - Apache Software Foundation
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >
>  cwiki.apache.org
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >
> 
>  Discussion thread:
> 
> 
>  lists.apache.org
>  
>  
>  
> 
> 
>  --,
>  Best,
>  Ziming
>
>
>


[jira] [Resolved] (KAFKA-16147) Partition is assigned to two members at the same time

2024-01-22 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16147.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Partition is assigned to two members at the same time
> -
>
> Key: KAFKA-16147
> URL: https://issues.apache.org/jira/browse/KAFKA-16147
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Emanuele Sabellico
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
> Attachments: broker1.log, broker2.log, broker3.log, librdkafka.log, 
> server.properties, server1.properties, server2.properties
>
>
> While running [test 0113 of 
> librdkafka|https://github.com/confluentinc/librdkafka/blob/8b6357f872efe2a5a3a2fd2828e4133f85e6b023/tests/0113-cooperative_rebalance.cpp#L2384],
>  subtest _u_multiple_subscription_changes_ received this error saying 
> that a partition is assigned to two members at the same time.
> {code:java}
> Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] 
> which is already assigned to consumer C_5#consumer-8 {code}
> I've reconstructed this sequence:
> C_5 SUBSCRIBES TO T1
> {noformat}
> %7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id 
> "(null)", current assignment "", subscribe topics 
> "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"{noformat}
> C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12
> {noformat}
> [2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7] [GroupId 
> rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw 
> transitioned from CurrentAssignment(memberEpoch=6, previousMemberEpoch=0, 
> targetMemberEpoch=6, state=assigning, assignedPartitions={}, 
> partitionsPendingRevocation={}, 
> partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to 
> CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
> targetMemberEpoch=14, state=stable, 
> assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
> partitionsPendingRevocation={}, partitionsPendingAssignment={}). 
> (org.apache.kafka.coordinator.group.GroupMetadataManager){noformat}
>  
> C_5 RECEIVES TARGET ASSIGNMENT
> {noformat}
> %7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat response received target assignment 
> "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat}
>  
> C_5 ACKS TARGET ASSIGNMENT
> {noformat}
> %7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
> "NULL", current assignment 
> "rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", 
> subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
> %7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat response received target assignment 
> "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat}
>  
> C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are 
> pending 
> {noformat}
> %7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
> "NULL", current assignment "NULL", subscribe topics 
> "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
> [2024-01-16 12:10:52,615] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7] [GroupId 
> rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated 
> its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1]. 
> (org.apache.kafka.coordinator.group.GroupMetadataManager)
> [2024-01-16 12:10:52,616] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7]

[jira] [Created] (KAFKA-16168) Implement GroupCoordinator.onPartitionsDeleted

2024-01-19 Thread David Jacot (Jira)
David Jacot created KAFKA-16168:
---

 Summary: Implement GroupCoordinator.onPartitionsDeleted
 Key: KAFKA-16168
 URL: https://issues.apache.org/jira/browse/KAFKA-16168
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16118) Coordinator unloading fails when replica is deleted

2024-01-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16118.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Coordinator unloading fails when replica is deleted
> ---
>
> Key: KAFKA-16118
> URL: https://issues.apache.org/jira/browse/KAFKA-16118
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>
> The new group coordinator always expects the leader epoch to be received when 
> it must unload the metadata for a partition. However, in KRaft, the leader 
> epoch is not passed when the replica is deleted (e.g. after reassignment).
> {noformat}
> java.lang.IllegalArgumentException: The leader epoch should always be 
> provided in KRaft.
>     at 
> org.apache.kafka.coordinator.group.GroupCoordinatorService.onResignation(GroupCoordinatorService.java:931)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9$adapted(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$updateCoordinator$4(BrokerMetadataPublisher.scala:397)
>     at java.base/java.lang.Iterable.forEach(Iterable.java:75)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.updateCoordinator(BrokerMetadataPublisher.scala:396)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$7(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.onMetadataUpdate(BrokerMetadataPublisher.scala:186)
>     at 
> org.apache.kafka.image.loader.MetadataLoader.maybePublishMetadata(MetadataLoader.java:382)
>     at 
> org.apache.kafka.image.loader.MetadataBatchLoader.applyDeltaAndUpdate(MetadataBatchLoader.java:286)
>     at 
> org.apache.kafka.image.loader.MetadataBatchLoader.maybeFlushBatches(MetadataBatchLoader.java:222)
>     at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$handleCommit$1(MetadataLoader.java:406)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:1583)
>     at 
> org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66){noformat}
> The side effect of this bug is that group coordinator loading/unloading fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16118) Coordinator unloading fails when replica is deleted

2024-01-12 Thread David Jacot (Jira)
David Jacot created KAFKA-16118:
---

 Summary: Coordinator unloading fails when replica is deleted
 Key: KAFKA-16118
 URL: https://issues.apache.org/jira/browse/KAFKA-16118
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot


The new group coordinator always expects the leader epoch to be received when 
it must unload the metadata for a partition. However, in KRaft, the leader 
epoch is not passed when the replica is deleted (e.g. after reassignment).
{noformat}
java.lang.IllegalArgumentException: The leader epoch should always be provided 
in KRaft.
    at 
org.apache.kafka.coordinator.group.GroupCoordinatorService.onResignation(GroupCoordinatorService.java:931)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9(BrokerMetadataPublisher.scala:200)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9$adapted(BrokerMetadataPublisher.scala:200)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$updateCoordinator$4(BrokerMetadataPublisher.scala:397)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at 
kafka.server.metadata.BrokerMetadataPublisher.updateCoordinator(BrokerMetadataPublisher.scala:396)
    at 
kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$7(BrokerMetadataPublisher.scala:200)
    at 
kafka.server.metadata.BrokerMetadataPublisher.onMetadataUpdate(BrokerMetadataPublisher.scala:186)
    at 
org.apache.kafka.image.loader.MetadataLoader.maybePublishMetadata(MetadataLoader.java:382)
    at 
org.apache.kafka.image.loader.MetadataBatchLoader.applyDeltaAndUpdate(MetadataBatchLoader.java:286)
    at 
org.apache.kafka.image.loader.MetadataBatchLoader.maybeFlushBatches(MetadataBatchLoader.java:222)
    at 
org.apache.kafka.image.loader.MetadataLoader.lambda$handleCommit$1(MetadataLoader.java:406)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    at 
org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66){noformat}
The side effect of this bug is that group coordinator loading/unloading fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15982) Move GenericGroup state metrics to `GroupCoordinatorMetricsShard`

2024-01-09 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15982.
-
Resolution: Duplicate

Done in KAFKA-15870.

> Move GenericGroup state metrics to `GroupCoordinatorMetricsShard`
> -
>
> Key: KAFKA-15982
> URL: https://issues.apache.org/jira/browse/KAFKA-15982
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
>
> Currently, the generic group state metrics exist inside 
> `GroupCoordinatorMetrics` as global metrics. This causes issues as during 
> unload, we need to traverse through all groups and decrement the group size 
> counters. 
> Move the generic group state metrics to the shard level so that when a 
> partition is unloaded we automatically remove the counter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15870) Move new group coordinator metrics from Yammer to Metrics

2024-01-09 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15870.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Move new group coordinator metrics from Yammer to Metrics
> -
>
> Key: KAFKA-15870
> URL: https://issues.apache.org/jira/browse/KAFKA-15870
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14519) Add metrics to the new coordinator

2024-01-09 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14519.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Add metrics to the new coordinator
> --
>
> Key: KAFKA-14519
> URL: https://issues.apache.org/jira/browse/KAFKA-14519
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Jeff Kim
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Apache Kafka 3.7.0 Release

2024-01-08 Thread David Jacot
Hi all,

Are you talking about publishing the artefacts to maven central? Looking at
the history [1], it seems that the metadata module has been published since
we have it. I also see other internal modules there too.

[1]
https://central.sonatype.com/artifact/org.apache.kafka/kafka-metadata/versions

Best,
David

On Mon, Jan 8, 2024 at 9:51 PM Ismael Juma  wrote:

> Hi Colin,
>
> I think you may have misunderstood what they mean by gradle metadata - it's
> not the Kafka metadata module.
>
> Ismael
>
> On Mon, Jan 8, 2024 at 9:45 PM Colin McCabe  wrote:
>
> > Oops, hit send too soon. I see that #15127 was already merged. So we
> > should no longer be publishing :metadata as part of the clients
> artifacts,
> > right?
> >
> > thanks,
> > Colin
> >
> >
> > On Mon, Jan 8, 2024, at 11:42, Colin McCabe wrote:
> > > Hi Apporv,
> > >
> > > Please remove the metadata module from any artifacts published for
> > > clients. It is only used by the server.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Sun, Jan 7, 2024, at 03:04, Apoorv Mittal wrote:
> > >> Hi Colin,
> > >> Thanks for the response. The only reason for asking the question of
> > >> publishing the metadata is because that's present in previous client
> > >> releases. For more context, the description of PR
> > >>  holds the details and
> > waiting
> > >> for the confirmation there prior to the merge.
> > >>
> > >> Regards,
> > >> Apoorv Mittal
> > >> +44 7721681581
> > >>
> > >>
> > >> On Fri, Jan 5, 2024 at 10:22 PM Colin McCabe 
> > wrote:
> > >>
> > >>> metadata is an internal gradle module. It is not used by clients. So
> I
> > >>> don't see why you would want to publish it (unless I'm
> misunderstanding
> > >>> something).
> > >>>
> > >>> best,
> > >>> Colin
> > >>>
> > >>>
> > >>> On Fri, Jan 5, 2024, at 10:05, Stanislav Kozlovski wrote:
> > >>> > Thanks for reporting the blockers, folks. Good job finding.
> > >>> >
> > >>> > I have one ask - can anybody with Gradle expertise help review this
> > small
> > >>> > PR? https://github.com/apache/kafka/pull/15127 (+1, -1)
> > >>> > In particular, we are wondering whether we need to publish module
> > >>> metadata
> > >>> > as part of the gradle publishing process.
> > >>> >
> > >>> >
> > >>> > On Fri, Jan 5, 2024 at 3:56 PM Proven Provenzano
> > >>> >  wrote:
> > >>> >
> > >>> >> We have potentially one more blocker
> > >>> >> https://issues.apache.org/jira/browse/KAFKA-16082 which might
> > cause a
> > >>> data
> > >>> >> loss scenario with JBOD in KRaft.
> > >>> >> Initial analysis thought this is a problem and further review
> looks
> > >>> like it
> > >>> >> isn't but we are continuing to dig into the issue to ensure that
> it
> > >>> isn't.
> > >>> >> We would request feedback on the bug from anyone who is familiar
> > with
> > >>> this
> > >>> >> code.
> > >>> >>
> > >>> >> --Proven
> > >>> >>
> > >>> >
> > >>> >
> > >>> > --
> > >>> > Best,
> > >>> > Stanislav
> > >>>
> >
>


Re: [ANNOUNCE] New Kafka PMC Member: Divij Vaidya

2023-12-27 Thread David Jacot
Congrats!

On Thu, Dec 28, 2023 at 5:13 AM Ismael Juma  wrote:

> Congratulations Divij!
>
> Ismael
>
> On Wed, Dec 27, 2023 at 3:46 AM Luke Chen  wrote:
>
> > Hi, Everyone,
> >
> > Divij has been a Kafka committer since June, 2023. He has remained very
> > active and instructive in the community since becoming a committer. It's
> my
> > pleasure to announce that Divij is now a member of Kafka PMC.
> >
> > Congratulations Divij!
> >
> > Luke
> > on behalf of Apache Kafka PMC
> >
>


Re: Kafka trunk test & build stability

2023-12-22 Thread David Jacot
I just merged both PRs.

Cheers,
David

On Fri, Dec 22, 2023 at 2:38 PM David Jacot  wrote:

> Hey folks,
>
> I believe that my two PRs will fix most of the issues. I have also tweaked
> the configuration of Jenkins to fix the issues relating to cloning the
> repo. There may be other issues but the overall situation should be much
> better when I merge those two.
>
> I will update this thread when I merge them.
>
> Cheers,
> David
>
> On Fri, Dec 22, 2023 at 2:22 PM Divij Vaidya  wrote:
>
>> Hey folks
>>
>> I think David (dajac) has some fixes lined-up to improve CI such as
>> https://github.com/apache/kafka/pull/15063 and
>> https://github.com/apache/kafka/pull/15062.
>>
>> I have some bandwidth for the next two days to work on fixing the CI. Let
>> me start by taking a look at the list that Sophie shared here.
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Fri, Dec 22, 2023 at 2:05 PM Luke Chen  wrote:
>>
>> > Hi Sophie and Philip and all,
>> >
>> > I share the same pain as you.
>> > I've been waiting for a CI build result in a PR for days.
>> Unfortunately, I
>> > can only get 1 result each day because it takes 8 hours for each run,
>> and
>> > with failed results. :(
>> >
>> > I've looked into the 8 hour timeout build issue and would like to
>> propose
>> > to set a global test timeout as 10 mins using the junit5 feature
>> > <
>> >
>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-declarative-timeouts-default-timeouts
>> > >
>> > .
>> > This way, we can fail those long running tests quickly without impacting
>> > other tests.
>> > PR: https://github.com/apache/kafka/pull/15065
>> > I've tested in my local environment and it works as expected.
>> >
>> > Any feedback is welcome.
>> >
>> > Thanks.
>> > Luke
>> >
>> > On Fri, Dec 22, 2023 at 8:08 AM Philip Nee  wrote:
>> >
>> > > Hey Sophie - I've gotten 2 inflight PRs each with more than 15
>> retries...
>> > > Namely: https://github.com/apache/kafka/pull/15023 and
>> > > https://github.com/apache/kafka/pull/15035
>> > >
>> > > justin filed a flaky test report here though:
>> > > https://issues.apache.org/jira/browse/KAFKA-16045
>> > >
>> > > P
>> > >
>> > > On Thu, Dec 21, 2023 at 3:18 PM Sophie Blee-Goldman <
>> > sop...@responsive.dev
>> > > >
>> > > wrote:
>> > >
>> > > > On a related note, has anyone else had trouble getting even a single
>> > run
>> > > > with no build failures lately? I've had multiple pure-docs PRs
>> blocked
>> > > for
>> > > > days or even weeks because of miscellaneous infra, test, and timeout
>> > > > failures. I know we just had a discussion about whether it's
>> acceptable
>> > > to
>> > > > ever merge with a failing build, and the consensus (which I agree
>> with)
>> > > was
>> > > > NO -- but seriously, this is getting ridiculous. The build might be
>> the
>> > > > worst I've ever seen it, and it just makes it really difficult to
>> > > maintain
>> > > > good will with external contributors.
>> > > >
>> > > > Take for example this small docs PR:
>> > > > https://github.com/apache/kafka/pull/14949
>> > > >
>> > > > It's on its 7th replay, with the first 6 runs all having (at least)
>> one
>> > > > build that failed completely. The issues I saw on this one PR are a
>> > good
>> > > > summary of what I've been seeing elsewhere, so here's the briefing:
>> > > >
>> > > > 1. gradle issue:
>> > > >
>> > > > > * What went wrong:
>> > > > >
>> > > > > Gradle could not start your build.
>> > > > >
>> > > > > > Cannot create service of type BuildSessionActionExecutor using
>> > method
>> > > > >
>> > >
>> LauncherServices$ToolingBuildSessionScopeServices.createActionExecutor()
>> > > > as
>> > > > > there is a problem with parameter #21 of type
>> > > > FileSystemWatchingInformation.
>> > > > >
>> > > > >> Cannot create service of type
>> > BuildLifecycleAwareVirt

Re: Kafka trunk test & build stability

2023-12-22 Thread David Jacot
> > >
> > > >
> > >
> >
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
> > > > >   at
> > > > >
> > > >
> > >
> >
> org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
> > > > >   at
> > > > >
> > > >
> > >
> >
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103)
> > > > >   at
> > > > >
> > > >
> > >
> >
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
> > > > >   at
> > > > >
> > > >
> > >
> >
> worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
> > > > >   at
> > > > >
> > > >
> > >
> >
> worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
> > > > > Caused by: java.net.ConnectException: Connection refused
> > > > >   at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> > > > >   at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
> > > > >   at
> > > > > java.base/sun.nio.ch
> > > > .SocketChannelImpl.finishTimedConnect(SocketChannelImpl.java:1191)
> > > > >   at
> > > > > java.base/sun.nio.ch
> > > > .SocketChannelImpl.blockingConnect(SocketChannelImpl.java:1233)
> > > > >   at java.base/sun.nio.ch
> > > .SocketAdaptor.connect(SocketAdaptor.java:102)
> > > > >   at
> > > > >
> > > >
> > >
> >
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
> > > > >   at
> > > > >
> > > >
> > >
> >
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
> > > > > ... 5 more
> > > > >
> > > >
> > > >
> > > >
> > > > >  * What went wrong:
> > > >
> > > > Execution failed for task ':core:test'.
> > > >
> > > > > Process 'Gradle Test Executor 104' finished with non-zero exit
> value
> > 1
> > > >
> > > >   This problem might be caused by incorrect test process
> configuration.
> > > >
> > > >
> > > > I've seen almost all of the above issues multiple times, so it might
> > be a
> > > > good list to start with to focus any efforts on improving the build.
> > That
> > > > said, I'm not sure what we can really do about most of these, and not
> > > sure
> > > > how to narrow down the root cause in the more mysterious cases of
> > aborted
> > > > builds and the builds that end with "finished with non-zero exit
> value
> > 1
> > > "
> > > > with no additional context (that I could find)
> > > >
> > > > If nothing else, there seems to be something happening in one (or
> more)
> > > of
> > > > the storage tests, because by far the most common failure I've seen
> is
> > > that
> > > > in 3 & 5. Unfortunately it's not really clear to me how to tell which
> > is
> > > > the offending test, so I'm not even sure what to file a ticket for
> > > >
> > > > On Tue, Dec 19, 2023 at 11:55 PM David Jacot
> >  > > >
> > > > wrote:
> > > >
> > > > > The slowness of the CI is definitely causing us a lot of pain. I
> > wonder
> > > > if
> > > > > we should move to a dedicated CI infrastructure for Kafka. Our
> > > > integration
> > > > > tests are quite heavy and ASF's CI is not really tuned for them. We
> > > could
> > > > > tune it for our needs and this would also allow external companies
> to
> > > > > sponsor more workers. I heard that we have a few cloud providers in
> > > > > the community ;). I think that we should consider this. What do you
> > > > think?
> > > > > I already discussed this with the INFRA team. I could continue if
> we
> > > > > believe that it is a way forward.
> > > > >
> > > > > Best,
> > > 

[jira] [Resolved] (KAFKA-16040) Rename `Generic` to `Classic`

2023-12-21 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16040.
-
Resolution: Fixed

> Rename `Generic` to `Classic`
> -
>
> Key: KAFKA-16040
> URL: https://issues.apache.org/jira/browse/KAFKA-16040
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.7.0
>
>
> People have raised concerns about using {{Generic}} as a name to designate 
> the old rebalance protocol. We considered using {{Legacy}} but discarded it 
> because there are still applications, such as Connect, using the old 
> protocol. We settled on using {{Classic}} for the {{Classic Rebalance 
> Protocol}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Road to Kafka 4.0

2023-12-21 Thread David Jacot
Thanks, Ismael. The proposal makes sense. +1

David

On Thu, Dec 21, 2023 at 5:59 PM Ismael Juma  wrote:

> Hi all,
>
> After understanding the use case Josep and Anton described in more detail,
> I think it's fair to say that quorum reconfiguration is necessary for
> migration of Apache Kafka users who follow this pattern. Given that, I
> think we should have a 3.8 release before the 4.0 release.
>
> The next question is whether we should do something special when it comes
> to timeline, parallel releases, etc. After careful consideration, I think
> we should simply follow our usual approach: regular 3.8 release around
> early May 2024 and regular 4.0 release around early September 2024. The
> community will be able to start working on items specific to 4.0 after 3.8
> is branched in late March/early April - I don't think we need to deal with
> the overhead of maintaining multiple long-lived branches for
> feature development.
>
> If the proposal above sounds reasonable, I suggest we write a KIP and vote
> on it. Any volunteers?
>
> Ismael
>
> On Tue, Nov 21, 2023 at 8:18 PM Ismael Juma  wrote:
>
> > Hi Luke,
> >
> > I think we're conflating different things here. There are 3 separate
> > points in your email, but only 1 of them requires 3.8:
> >
> > 1. JBOD may have some bugs in 3.7.0. Whatever bugs exist can be fixed in
> > 3.7.x. We have already said that we will backport critical fixes to 3.7.x
> > for some time.
> > 2. Quorum reconfiguration is important to include in 4.0, the release
> > where ZK won't be supported. This doesn't need a 3.8 release either.
> > 3. Quorum reconfiguration is necessary for migration use cases and hence
> > needs to be in a 3.x release. This one would require a 3.8 release if
> true.
> > But we should have a debate on whether it is indeed true. It's not clear
> to
> > me yet.
> >
> > Ismael
> >
> > On Tue, Nov 21, 2023 at 7:30 PM Luke Chen  wrote:
> >
> >> Hi Colin and Jose,
> >>
> >> I revisited the discussion of KIP-833 here
> >> , and
> >> you
> >> can see I'm the first one to reply to the discussion thread to express
> my
> >> excitement at that time. Till now, I personally still think having KRaft
> >> in
> >> Kafka is a good direction we have to move forward. But to move to this
> >> destination, we need to make our users comfortable with this decision.
> The
> >> worst scenario is, we said 4.0 is ready, and ZK is removed. Then, some
> >> users move to 4.0 and say, wait a minute, why does it not support xxx
> >> feature? And then start to search for other alternatives to replace
> Apache
> >> Kafka. We all don't want to see this, right? So, that's why some
> community
> >> users start to express their concern to move to 4.0 too quickly,
> including
> >> me.
> >>
> >>
> >> Quoting Colin:
> >> > While dynamic quorum reconfiguration is a nice feature, it doesn't
> block
> >> anything: not migration, not deployment.
> >>
> >> Clearly Confluent team might deploy ZooKeeper in a particular way and
> >> didn’t depend on its ability to support reconfiguration. So KRaft is
> ready
> >> from your point of view. But users of Apache Kafka might have come to
> >> depend on some ZooKeeper functionality, such as the ability to
> reconfigure
> >> ZooKeeper quorums, that is not available in KRaft, yet. I don’t think
> the
> >> Apache Kafka documentation has ever said “do not depend on this ability
> of
> >> Apache Kafka or Zookeeper”, so it doesn’t seem unreasonable for users to
> >> have deployed ZooKeeper in this way. In KIP-833
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-MissingFeatures
> >> >,
> >> we said: “Modifying certain dynamic configurations on the standalone
> KRaft
> >> controller” was an important missing feature. Unfortunately it wasn’t as
> >> explicit as it could have been. While no one expects KRaft to support
> all
> >> the features of ZooKeeper, it looks to me that users might depend on
> this
> >> particular feature and it’s only recently that it’s become apparent that
> >> you don’t consider it a blocker.
> >>
> >> Quoting José:
> >> > If we do a 3.8 release before 4.0 and we implement KIP-853 in 3.8, the
> >> user will be able to migrate to a KRaft cluster that supports
> dynamically
> >> changing the set of voters and has better support for disk failures.
> >>
> >> Yes, KIP-853 and disk failure support are both very important missing
> >> features. For the disk failure support, I don't think this is a
> >> "good-to-have-feature", it should be a "must-have" IMO. We can't
> announce
> >> the 4.0 release without a good solution for disk failure in KRaft.
> >>
> >> It’s also worth thinking about how Apache Kafka users who depend on JBOD
> >> might look at the risks of not having a 3.8 release. JBOD support on
> KRaft
> >> is planned to be added in 3.7, and is still in progress so far. So it’s
> >> hard 

Re: [DISCUSS] Road to Kafka 4.0

2023-12-21 Thread David Jacot
Hi Divij,

> Release 4.0 as an "experimental" release

I don't think that this is something that we should do. If we need more
time, we should just do a 3.8 release and then release 4.0 when we are
ready. An experimental major release will be more confusing than anything
else. We should also keep in mind that major releases are also adopted with
more scrutiny in general. I don't think that many users will jump to 4.0
anyway. They will likely wait for 4.0.1 or even 4.1.

Best,
David

On Thu, Dec 21, 2023 at 3:59 PM Divij Vaidya 
wrote:

> Hi folks
>
> I am late to the conversation but I would like to add my point of view
> here.
>
> I have three main concerns:
>
> 1\ Durability/availability bugs in kraft - Even though kraft has been
> around for a while, we keep finding bugs that impact availability and data
> durability in it almost with every release [1] [2]. It's a complex feature
> and such bugs are expected during the stabilization phase. But we can't
> remove the alternative until we see stabilization in kraft i.e. no new
> stability/durability bugs for at least 2 releases.
> 2\ Parity with Zk - There are also pending bugs [3] which are in the
> category of Zk parity. Removing Zk from Kafka without having full feature
> parity with Zk will leave some Kafka users with no upgrade path.
> 3\ Test coverage - We also don't have sufficient test coverage for kraft
> since quite a few tests are Zk only at this stage.
>
> Given these concerns, I believe we need to reach 100% Zk parity and allow
> new feature stabilisation (such as scram, JBOD) for at least 1 version
> (maybe more if we find bugs in that feature) before we remove Zk. I also
> agree with the point of view that we can't delay 4.0 indefinitely and we
> need a clear cut line.
>
> Hence, I propose the following:
> 1\ Keep trunk with 3.x. Release 3.8 and potentially 3.9 if we find major
> (durability/availability related) bugs in 3.8. This will help users
> continue to use their tried and tested Kafka setup until we have a proven
> alternative from feature parity & stability point of view.
> 2\ Release 4.0 as an "experimental" release along with 3.8 "stable"
> release. This will help get user feedback on the feasibility of removing Zk
> completely right now.
> 3\ Create a criteria for moving 4.1 as "stable" release instead of
> "experimental". This list should include 100% Zk parity and 100% Kafka
> tests operating with kraft. It will also include other community feedback
> from this & other threads.
> 4\ When the 4.x version is "stable", move the trunk to 4.x and stop all
> development on the 3.x branch.
>
> I acknowledge that earlier in the community, we have decided to make 3.7 as
> the last release in the 3.x series. But, IMO we have learnt a lot since
> then based on the continuous improvements in kraft. I believe we should be
> flexible with our earlier stance here and allow for greater stability
> before forcing users to a completely new functionality.
>
> [1] https://issues.apache.org/jira/browse/KAFKA-15495
> [2] https://issues.apache.org/jira/browse/KAFKA-15489
> [3] https://issues.apache.org/jira/browse/KAFKA-14874
>
> --
> Divij Vaidya
>
>
>
> On Wed, Dec 20, 2023 at 4:59 PM Josep Prat 
> wrote:
>
> > Hi Justine, Luke, and others,
> >
> > I believe a 3.8 version would make sense, and I would say KIP-853 should
> be
> > part of it as well.
> >
> > Best,
> >
> > On Wed, Dec 20, 2023 at 4:11 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hey Luke,
> > >
> > > I think your point is valid. This is another good reason to have a 3.8
> > > release.
> > > Would you say that implementing KIP-966 in 3.8 would be an acceptable
> way
> > > to move forward?
> > >
> > > Thanks,
> > > Justine
> > >
> > >
> > > On Tue, Dec 19, 2023 at 4:35 AM Luke Chen  wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > > I think that for folks that want to prioritize availability over
> > > > durability, the aggressive recovery strategy from KIP-966 should be
> > > > preferable to the old unclean leader election configuration.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas#KIP966:EligibleLeaderReplicas-Uncleanrecovery
> > > >
> > > > Yes, I'm aware that we're going to implement the new way of leader
> > > election
> > > > in KIP-966.
> > > > But obviously, KIP-966 is not included in v3.7.0.
> > > > What I'm worried about is the users who prioritize availability over
> > > > durability and enable the unclean leader election in ZK mode.
> > > > Once they migrate to KRaft, there will be availability impact when
> > > unclean
> > > > leader election is needed.
> > > > And like you said, they can run unclean leader election via CLI, but
> > > again,
> > > > the availability is already impacted, which might be unacceptable in
> > some
> > > > cases.
> > > >
> > > > IMO, we should prioritize this missing feature and include it in 3.x
> > > > release.
> > > > Including in 3.x 

[jira] [Resolved] (KAFKA-15456) Add support for OffsetFetch version 9 in consumer

2023-12-21 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15456.
-
Resolution: Fixed

> Add support for OffsetFetch version 9 in consumer
> -
>
> Key: KAFKA-15456
> URL: https://issues.apache.org/jira/browse/KAFKA-15456
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>    Reporter: David Jacot
>Assignee: Lianet Magrans
>Priority: Major
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16030) new group coordinator should check if partition goes offline during load

2023-12-21 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16030.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> new group coordinator should check if partition goes offline during load
> 
>
> Key: KAFKA-16030
> URL: https://issues.apache.org/jira/browse/KAFKA-16030
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.7.0
>
>
> The new coordinator stops loading if the partition goes offline during load. 
> However, the partition is still considered active. Instead, we should return 
> NOT_LEADER_OR_FOLLOWER exception during load.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16040) Rename `Generic` to `Classic`

2023-12-21 Thread David Jacot (Jira)
David Jacot created KAFKA-16040:
---

 Summary: Rename `Generic` to `Classic`
 Key: KAFKA-16040
 URL: https://issues.apache.org/jira/browse/KAFKA-16040
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.7.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16036) Add `group.coordinator.rebalance.protocols` and publish all new configs

2023-12-21 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16036.
-
Resolution: Fixed

> Add `group.coordinator.rebalance.protocols` and publish all new configs
> ---
>
> Key: KAFKA-16036
> URL: https://issues.apache.org/jira/browse/KAFKA-16036
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Blocker
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16036) Add `group.coordinator.rebalance.protocols` and publish all new configs

2023-12-20 Thread David Jacot (Jira)
David Jacot created KAFKA-16036:
---

 Summary: Add `group.coordinator.rebalance.protocols` and publish 
all new configs
 Key: KAFKA-16036
 URL: https://issues.apache.org/jira/browse/KAFKA-16036
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot
Assignee: David Jacot
 Fix For: 3.7.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka trunk test & build stability

2023-12-19 Thread David Jacot
The slowness of the CI is definitely causing us a lot of pain. I wonder if
we should move to a dedicated CI infrastructure for Kafka. Our integration
tests are quite heavy and ASF's CI is not really tuned for them. We could
tune it for our needs and this would also allow external companies to
sponsor more workers. I heard that we have a few cloud providers in
the community ;). I think that we should consider this. What do you think?
I already discussed this with the INFRA team. I could continue if we
believe that it is a way forward.

Best,
David

On Wed, Dec 20, 2023 at 12:17 AM Stanislav Kozlovski
 wrote:

> Hey Николай,
>
> Apologies about this - I wasn't aware of this behavior. I have made all the
> gists public.
>
>
>
> On Wed, Dec 20, 2023 at 12:09 AM Greg Harris  >
> wrote:
>
> > Hey Stan,
> >
> > Thanks for opening the discussion. I haven't been looking at overall
> > build duration recently, so it's good that you are calling it out.
> >
> > I worry about us over-indexing on this one build, which itself appears
> > to be an outlier. I only see one other build [1] above 6h overall in
> > the last 90 days in this view: [2]
> > And I don't see any overlap of failed tests in these two builds, which
> > makes it less likely that these particular failed tests are the causes
> > of long build times.
> >
> > Separately, I've been investigating build environment slowness, and
> > trying to connect it with test failures [3]. I observed that the CI
> > build environment is 2-20 times slower than my developer machine (M1
> > mac).
> > When I simulate a similar slowdown locally, there are tests which
> > become significantly more flakey, often due to hard-coded timeouts.
> > I think that these particularly nasty builds could be explained by
> > long-tail slowdowns causing arbitrary tests to take an excessive time
> > to execute.
> >
> > Rather than trying to find signals in these rare test failures, I
> > think we should find tests that have these sorts of failures more
> > regularly.
> > There are lots of builds in the 5-6h duration bracket, which is
> > certainly unacceptably long. We should look into these builds to find
> > improvements and optimizations.
> >
> > [1] https://ge.apache.org/s/ygh4gbz4uma6i/
> > [2]
> >
> https://ge.apache.org/scans?list.sortColumn=buildDuration=P90D=kafka=trunk=America%2FNew_York
> > [3] https://github.com/apache/kafka/pull/15008
> >
> > Thanks for looking into this!
> > Greg
> >
> > On Tue, Dec 19, 2023 at 3:45 PM Николай Ижиков 
> > wrote:
> > >
> > > Hello, Stanislav.
> > >
> > > Can you, please, make the gist public.
> > > Private gists not available for some GitHub users even if link are
> known.
> > >
> > > > On 19 Dec 2023, at 17:33, Stanislav Kozlovski <
> stanis...@confluent.io.INVALID>
> > wrote:
> > > >
> > > > Hey everybody,
> > > > I've heard various complaints that build times in trunk are taking
> too
> > > > long, some taking as much as 8 hours (the timeout) - and this is
> > slowing us
> > > > down from being able to meet the code freeze deadline for 3.7.
> > > >
> > > > I took it upon myself to gather up some data in Gradle Enterprise to
> > see if
> > > > there are any outlier tests that are causing this slowness. Turns out
> > there
> > > > are a few, in this particular build -
> > https://ge.apache.org/s/un2hv7n6j374k/
> > > > - which took 10 hours and 29 minutes in total.
> > > >
> > > > I have compiled the tests that took a disproportionately large amount
> > of
> > > > time (20m+), alongside their time, error message and a link to their
> > full
> > > > log output here -
> > > >
> >
> https://gist.github.com/stanislavkozlovski/8959f7ee59434f774841f4ae2f5228c2
> > > >
> > > > It includes failures from core, streams, storage and clients.
> > > > Interestingly, some other tests that don't fail also take a long time
> > in
> > > > what is apparently the test harness framework. See the gist for more
> > > > information.
> > > >
> > > > I am starting this thread with the intention of getting the
> discussion
> > > > started and brainstorming what we can do to get the build times back
> > under
> > > > control.
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>


[jira] [Resolved] (KAFKA-15971) Re-enable consumer integration tests for new consumer

2023-12-15 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15971.
-
Resolution: Fixed

> Re-enable consumer integration tests for new consumer
> -
>
> Key: KAFKA-15971
> URL: https://issues.apache.org/jira/browse/KAFKA-15971
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 3.7.0
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Major
>  Labels: consumer-threading-refactor, kip-848-preview
> Fix For: 3.7.0
>
>
> Re-enable the consumer integration tests for the new consumer making sure 
> that build stability is not impacted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15237) Implement write operation timeout

2023-12-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15237.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Implement write operation timeout
> -
>
> Key: KAFKA-15237
> URL: https://issues.apache.org/jira/browse/KAFKA-15237
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Sagar Rao
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.7.0
>
>
> In the Scala code, we rely on `offsets.commit.timeout.ms` to bound all the 
> writes. We should do the same in the new code. This is important to ensure 
> that the number of pending responses in the purgatory is bounded. The name of 
> the config is not ideal but we should keep it for backward compatibility 
> reasons.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15981) update Group size only when groups size changes

2023-12-13 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15981.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> update Group size only when groups size changes
> ---
>
> Key: KAFKA-15981
> URL: https://issues.apache.org/jira/browse/KAFKA-15981
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.7.0
>
>
> Currently, we increment generic group metrics whenever we create a new Group 
> object when we load a partition. This is incorrect as the partition may 
> contain several records for the same group if they are in the active segment 
> or if the segment has not yet been compacted. 
> The same applies to removing groups; we can possibly have multiple group 
> tombstone records. Instead, only increment the metric if we actually created 
> a new group and only decrement the metric if the group exists.
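
As an illustration of the pattern described above, a minimal Groovy sketch
(hypothetical names, not Kafka's actual coordinator classes) could look like
this:

    def groups = [:]
    int groupCount = 0

    // called for every group record observed while loading a partition
    def onGroupRecord = { String groupId ->
        if (groups.putIfAbsent(groupId, Boolean.TRUE) == null) {
            groupCount++   // count the group only the first time it is seen
        }
    }

    // called for every group tombstone record
    def onGroupTombstone = { String groupId ->
        if (groups.remove(groupId) != null) {
            groupCount--   // decrement only if the group actually existed
        }
    }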



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15574) Update states and transitions for membership manager state machine

2023-12-11 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15574.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Update states and transitions for membership manager state machine
> --
>
> Key: KAFKA-15574
> URL: https://issues.apache.org/jira/browse/KAFKA-15574
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Kirk True
>Assignee: Lianet Magrans
>Priority: Blocker
>  Labels: kip-848, kip-848-client-support, kip-848-e2e, 
> kip-848-preview
> Fix For: 3.7.0
>
>
> This task is to update the state machine so that it correctly acts as the 
> glue between the heartbeat request manager and the assignment reconciler.
> The state machine will transition from one state to another as a response to 
> heartbeats, callback completion, errors, unsubscribing, and other external 
> events. A given transition may kick off one or more actions that are 
> implemented outside of the state machine.
> Steps:
>  # Update the set of states in the code as [defined in the diagrams on the 
> wiki|https://cwiki.apache.org/confluence/display/KAFKA/Consumer+rebalance#Consumerrebalance-RebalanceStateMachine]
>  # Ensure the correct state transitions are performed as responses to 
> external input
>  # _Define_ any actions that should be taken as a result of the above 
> transitions (commit before revoking partitions, stop fetching from partitions 
> being revoked, allow members that do not join a group)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15978) New consumer sends OffsetCommit with empty member ID

2023-12-10 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15978.
-
Resolution: Fixed

> New consumer sends OffsetCommit with empty member ID
> 
>
> Key: KAFKA-15978
> URL: https://issues.apache.org/jira/browse/KAFKA-15978
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.7.0
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Major
>  Labels: CTR
> Fix For: 3.7.0
>
>
> Running the trogdor tests with the new consumer, it seemed that offsets were 
> not being committed correctly, although the records were being fetched 
> successfully. Upon investigation, it seems that the commit request manager 
> uses a cached member ID which means that its OffsetCommit requests are 
> rejected by the group coordinator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14516) Implement static membership

2023-12-08 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-14516.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Implement static membership
> --
>
> Key: KAFKA-14516
> URL: https://issues.apache.org/jira/browse/KAFKA-14516
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>Assignee: Sagar Rao
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15910) New group coordinator needs to generate snapshots while loading

2023-12-06 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15910.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> New group coordinator needs to generate snapshots while loading
> ---
>
> Key: KAFKA-15910
> URL: https://issues.apache.org/jira/browse/KAFKA-15910
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.7.0
>
>
> After the new coordinator loads a __consumer_offsets partition, it logs the 
> following exception when making a read operation (fetch/list groups, etc):
>  
> java.lang.RuntimeException: No in-memory snapshot for epoch 740745. Snapshot epochs are:
>   at org.apache.kafka.timeline.SnapshotRegistry.getSnapshot(SnapshotRegistry.java:178)
>   at org.apache.kafka.timeline.SnapshottableHashTable.snapshottableIterator(SnapshottableHashTable.java:407)
>   at org.apache.kafka.timeline.TimelineHashMap$ValueIterator.<init>(TimelineHashMap.java:283)
>   at org.apache.kafka.timeline.TimelineHashMap$Values.iterator(TimelineHashMap.java:271)
>   ...
>  
> This happens because we don't have a snapshot at the last updated high 
> watermark after loading. We cannot generate a snapshot at the high watermark 
> after loading all batches because it may contain records that have not yet 
> been committed. We also don't know where the high watermark will advance up 
> to so we need to generate a snapshot for each offset the loader observes to 
> be greater than the current high watermark. Then once we add the high 
> watermark listener and update the high watermark we can delete all of the 
> snapshots prior. 
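
As an illustration only, a minimal Groovy sketch of the snapshot bookkeeping
described above (hypothetical names, not the actual coordinator loader code)
could look like this:

    def snapshots = new TreeMap<Long, Object>()
    long highWatermark = -1L

    // while loading, keep a snapshot for every offset above the current high
    // watermark, since we do not yet know where it will advance to
    def onBatchLoaded = { long offset, Object coordinatorState ->
        if (offset > highWatermark) {
            snapshots.put(offset, coordinatorState)
        }
    }

    // once the high watermark listener fires, reads are served at the high
    // watermark, so all snapshots strictly older than it can be dropped
    def onHighWatermarkUpdated = { long newHighWatermark ->
        highWatermark = newHighWatermark
        snapshots.headMap(newHighWatermark, false).clear()
    }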



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15705) Add integration tests for Heartbeat API and GroupLeave API

2023-12-05 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15705.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Add integration tests for Heartbeat API and GroupLeave API
> --
>
> Key: KAFKA-15705
> URL: https://issues.apache.org/jira/browse/KAFKA-15705
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15061) CoordinatorPartitionWriter should reuse buffer

2023-12-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15061.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> CoordinatorPartitionWriter should reuse buffer
> --
>
> Key: KAFKA-15061
> URL: https://issues.apache.org/jira/browse/KAFKA-15061
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: David Jacot
>    Assignee: David Jacot
>Priority: Major
>  Labels: kip-848-preview
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-27 Thread David Jacot
Hi all,

I am still experimenting with reducing the noise of flaky tests in build
results. I should have results to share early next week.

Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
it seems risky because the build may be green but with failed tests, no?
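
For reference, this is roughly what that property looks like in a Gradle build
script (a minimal Groovy DSL sketch, not something taken from our build):

    test {
        // failing tests no longer fail the Test task, so the build can go
        // green even though individual tests failed, which is the risk above
        ignoreFailures = true
    }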

I would also like to make it clear that the current rule applies until we
agree on a way forward here. At minimum, I think that a build should be
yellow for all the combinations and the failed tests should have been
triaged to ensure that they are not related to the changes. We should not
merge when a build is red or has not completed.

Best,
David

On Sat, Nov 25, 2023 at 5:25 AM Chris Egerton 
wrote:

> Hi all,
>
> There's a lot to catch up on here but I wanted to clarify something.
> Regarding this comment from Sophie:
>
>
> > Yet multiple people in this thread so
> far have voiced support for "gating merges on the successful completion of
> all parts of the build except tests". Just to be totally clear, I really
> don't think that was ever in question -- though it certainly doesn't hurt
> to remind everyone.
>
> > So, this thread is not about whether or not to merge with failing
> *builds, *but it's
> whether it should be acceptable to merge with failing *tests.*
>
>
> I think there's a misunderstanding here. I was suggesting
> programmatic gating, not manual. If we could disable these types of changes
> from being merged, instead of relying on committers to check and interpret
> Jenkins results, that'd be a quick win IMO. And, because of the
> already-discussed issues with flaky tests, it seemed difficult to disable
> PRs from being merged with failing tests--just for other parts of the
> build.
>
> However, I think the retry logic brought up by David could be sufficient to
> skip that kind of intermediate step and allow us to just start
> programmatically disabling PR merges if the build (including) tests fails.
> But if anyone's interested, we can still prevent failing tests from failing
> the build with the ignoreFailures property [1].
>
> [1] -
>
> https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:ignoreFailures
>
> Cheers,
>
> Chris
>
> On Wed, Nov 22, 2023 at 3:00 AM Ismael Juma  wrote:
>
> > I think it breaks the Jenkins output otherwise. Feel free to test it via
> a
> > PR.
> >
> > Ismael
> >
> > On Wed, Nov 22, 2023, 12:42 AM David Jacot 
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > No, I was not aware of KAFKA-12216. My understanding is that we could
> > still
> > > do it without the JUnitFlakyTestDataPublisher plugin and we could use
> > > gradle enterprise for this. Or do you think that reporting the flaky
> > tests
> > > in the build results is required?
> > >
> > > David
> > >
> > > On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:
> > >
> > > > Hi David,
> > > >
> > > > Did you take a look at
> > https://issues.apache.org/jira/browse/KAFKA-12216
> > > ?
> > > > I
> > > > looked into this option already (yes, there isn't much that we
> haven't
> > > > considered in this space).
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Nov 22, 2023 at 12:24 AM David Jacot
> >  > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for the good discussion and all the comments. Overall, it
> > seems
> > > > that
> > > > > we all agree on the bad state of our CI. That's a good first step!
> > > > >
> > > > > I have talked to a few folks this week about it and it seems that
> > many
> > > > > folks (including me) are not comfortable with merging PRs at the
> > moment
> > > > > because the results of our builds are so bad. I had 40+ failed
> tests
> > in
> > > > one
> > > > > of my PRs, all unrelated to my changes. It is really hard to be
> > > > productive
> > > > > with this.
> > > > >
> > > > > Personally, I really want to move towards requiring a green build
> to
> > > > merge
> > > > > to trunk because this is a clear and binary signal. I agree that we
> > > need
> > > > to
> > > > > stabilize the builds before we could even require this so here is
> my
> > > > > proposal.
> > > > >
> > > > > 1) We could leverage 

[jira] [Resolved] (KAFKA-15856) Add integration tests for JoinGroup API and SyncGroup API

2023-11-23 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15856.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Add integration tests for JoinGroup API and SyncGroup API
> -
>
> Key: KAFKA-15856
> URL: https://issues.apache.org/jira/browse/KAFKA-15856
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15484) Implement general uniform broker side assignor

2023-11-23 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15484.
-
Fix Version/s: 3.7.0
   Resolution: Fixed

> Implement general uniform broker side assignor
> --
>
> Key: KAFKA-15484
> URL: https://issues.apache.org/jira/browse/KAFKA-15484
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ritika Reddy
>Assignee: Ritika Reddy
>Priority: Major
>  Labels: kip-848, kip-848-preview
> Fix For: 3.7.0
>
>
> Part 2 of the Uniform broker side assignor for consumer groups that have 
> different members subscribed to different topics



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15889) Make General Assignor's rebalance incremental

2023-11-23 Thread David Jacot (Jira)
David Jacot created KAFKA-15889:
---

 Summary: Make General Assignor's rebalance incremental
 Key: KAFKA-15889
 URL: https://issues.apache.org/jira/browse/KAFKA-15889
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Jacot


While reviewing [https://github.com/apache/kafka/pull/14481], we found out that 
it may be possible to make 
`GeneralUniformAssignmentBuilder.performReassignments` incremental. As it is a 
big change, we agreed on looking into this afterwards.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread David Jacot
Hi Ismael,

No, I was not aware of KAFKA-12216. My understanding is that we could still
do it without the JUnitFlakyTestDataPublisher plugin and we could use
gradle enterprise for this. Or do you think that reporting the flaky tests
in the build results is required?

David

On Wed, Nov 22, 2023 at 9:35 AM Ismael Juma  wrote:

> Hi David,
>
> Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216?
> I
> looked into this option already (yes, there isn't much that we haven't
> considered in this space).
>
> Ismael
>
> On Wed, Nov 22, 2023 at 12:24 AM David Jacot 
> wrote:
>
> > Hi all,
> >
> > Thanks for the good discussion and all the comments. Overall, it seems
> that
> > we all agree on the bad state of our CI. That's a good first step!
> >
> > I have talked to a few folks this week about it and it seems that many
> > folks (including me) are not comfortable with merging PRs at the moment
> > because the results of our builds are so bad. I had 40+ failed tests in
> one
> > of my PRs, all unrelated to my changes. It is really hard to be
> productive
> > with this.
> >
> > Personally, I really want to move towards requiring a green build to
> merge
> > to trunk because this is a clear and binary signal. I agree that we need
> to
> > stabilize the builds before we could even require this so here is my
> > proposal.
> >
> > 1) We could leverage the `reports.junitXml.mergeReruns` option in gradle.
> > From the doc [1]:
> >
> > > When mergeReruns is enabled, if a test fails but is then retried and
> > succeeds, its failures will be recorded as  instead of
> > , within one . This is effectively the reporting
> > produced by the surefire plugin of Apache Maven™ when enabling reruns. If
> > your CI server understands this format, it will indicate that the test
> was
> > flaky. If it > does not, it will indicate that the test succeeded as it
> > will ignore the  information. If the test does not succeed
> > (i.e. it fails for every retry), it will be indicated as having failed
> > whether your tool understands this format or not.
> > > When mergeReruns is disabled (the default), each execution of a test
> will
> > be listed as a separate test case.
> >
> > It would not resolve all the faky tests for sure but it would at least
> > reduce the noise. I see this as a means to get to green builds faster. I
> > played a bit with this setting and I discovered [2]. I was hoping that
> [3]
> > could help to resolve it but I need to confirm.
> >
> > 2) I suppose that we would still have flaky tests preventing us from
> > getting a green build even with the setting in place. For those, I think
> > that we need to review them one by one and decide whether we want to fix
> or
> > disable them. This is a short term effort to help us get to green builds.
> >
> > 3) When we get to a point where we can get green builds consistently, we
> > could enforce it.
> >
> > 4) Flaky tests won't disappear with this. They are just hidden.
> Therefore,
> > we also need a process to review the flaky tests and address them. Here,
> I
> > think that we could leverage the dashboard shared by Ismael. One
> > possibility would be to review it regularly and decide for each test
> > whether it should be fixed, disabled or even removed.
> >
> > Please let me know what you think.
> >
> > Another angle that we could consider is improving the CI infrastructure
> as
> > well. I think that many of those flaky tests are due to overloaded
> Jenkins
> > workers. We should perhaps discuss with the infra team to see whether we
> > could do something there.
> >
> > Best,
> > David
> >
> > [1]
> > https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
> > [2] https://github.com/gradle/gradle/issues/23324
> > [3] https://github.com/apache/kafka/pull/14687
> >
> >
> > On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:
> >
> > > Hi,
> > >
> > > We have a dashboard already:
> > >
> > >
> > >
> > >
> >
> https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=FLAKY
> > >
> > > On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков 
> > > wrote:
> > >
> > >> Hello guys.
> > >>
> > >> I want to tell you about one more approach to deal with flaky tests.
> > >> We adopt this approach in Apache Ignite community, so may be it can b

Re: [DISCUSS] Should we continue to merge without a green build? No!

2023-11-22 Thread David Jacot
Hi all,

Thanks for the good discussion and all the comments. Overall, it seems that
we all agree on the bad state of our CI. That's a good first step!

I have talked to a few folks this week about it and it seems that many
folks (including me) are not comfortable with merging PRs at the moment
because the results of our builds are so bad. I had 40+ failed tests in one
of my PRs, all unrelated to my changes. It is really hard to be productive
with this.

Personally, I really want to move towards requiring a green build to merge
to trunk because this is a clear and binary signal. I agree that we need to
stabilize the builds before we could even require this so here is my
proposal.

1) We could leverage the `reports.junitXml.mergeReruns` option in gradle.
From the doc [1]:

> When mergeReruns is enabled, if a test fails but is then retried and
succeeds, its failures will be recorded as <flakyFailure> instead of
<failure>, within one <testcase>. This is effectively the reporting
produced by the surefire plugin of Apache Maven™ when enabling reruns. If
your CI server understands this format, it will indicate that the test was
flaky. If it does not, it will indicate that the test succeeded as it
will ignore the <flakyFailure> information. If the test does not succeed
(i.e. it fails for every retry), it will be indicated as having failed
whether your tool understands this format or not.
> When mergeReruns is disabled (the default), each execution of a test will
be listed as a separate test case.

It would not resolve all the flaky tests for sure but it would at least
reduce the noise. I see this as a means to get to green builds faster. I
played a bit with this setting and I discovered [2]. I was hoping that [3]
could help to resolve it but I need to confirm (see the build-script sketch
after this list).

2) I suppose that we would still have flaky tests preventing us from
getting a green build even with the setting in place. For those, I think
that we need to review them one by one and decide whether we want to fix or
disable them. This is a short term effort to help us get to green builds.

3) When we get to a point where we can get green builds consistently, we
could enforce it.

4) Flaky tests won't disappear with this. They are just hidden. Therefore,
we also need a process to review the flaky tests and address them. Here, I
think that we could leverage the dashboard shared by Ismael. One
possibility would be to review it regularly and decide for each test
whether it should be fixed, disabled or even removed.
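
For reference, here is a minimal sketch of option 1) in a Gradle build script
(Groovy DSL, following the documentation in [1]; it assumes a Gradle version
recent enough to support this option and is not a drop-in change for our
build.gradle):

    test {
        reports {
            junitXml {
                // merge retried executions into a single <testcase> entry so
                // that a test which fails and then passes on retry is reported
                // as flaky rather than as a failure
                mergeReruns = true
            }
        }
    }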

Please let me know what you think.

Another angle that we could consider is improving the CI infrastructure as
well. I think that many of those flaky tests are due to overloaded Jenkins
workers. We should perhaps discuss with the infra team to see whether we
could do something there.

Best,
David

[1] https://docs.gradle.org/current/userguide/java_testing.html#mergereruns
[2] https://github.com/gradle/gradle/issues/23324
[3] https://github.com/apache/kafka/pull/14687


On Wed, Nov 22, 2023 at 4:10 AM Ismael Juma  wrote:

> Hi,
>
> We have a dashboard already:
>
>
>
> https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=FLAKY
>
> On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков 
> wrote:
>
>> Hello guys.
>>
>> I want to tell you about one more approach to deal with flaky tests.
>> We adopt this approach in Apache Ignite community, so may be it can be
>> helpful for Kafka, also.
>>
>> TL;DR: Apache Ignite community have a tool that provide a statistic of
>> tests and can tell if PR introduces new failures.
>>
>> Apache Ignite has a many tests.
>> Latest «Run All» contains around 75k.
>> Most tests are integration style, therefore the count of flaky ones is
>> significant.
>>
>> We build a tool - Team City Bot [1]
>> That provides a combined statistic of flaky tests [2]
>>
>> This tool can compare results of Run All for PR and master.
>> If all OK one can comment jira ticket with a visa from bot [3]
>>
>> Visa is a quality proof of PR for Ignite committers.
>> And we can sort out most flaky tests and prioritize fixes with the bot
>> statistic [2]
>>
>> TC bot integrated with the Team City only, for now.
>> But, if Kafka community interested we can try to integrate it with
>> Jenkins.
>>
>> [1] https://github.com/apache/ignite-teamcity-bot
>> [2] https://tcbot2.sbt-ignite-dev.ru/current.html?branch=master=10
>> [3]
>> https://issues.apache.org/jira/browse/IGNITE-19950?focusedCommentId=17767394=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17767394
>>
>>
>>
>> > On 15 Nov 2023, at 09:18, Ismael Juma  wrote:
>> >
>> > To use the pain analogy, people seem to have really good painkillers and
>> > hence they somehow don't feel the pain already. ;)
>> >
>> > The reality is that important and high quality tests will get fixed.
>> Poor
>> > quality tests (low signal to noise ratio) might not get fixed and
>> that's ok.
>> >
>> > I'm not opposed to marking the tests as release blockers as a starting
>> > 

Re: [VOTE] KIP-1001; CurrentControllerId Metric

2023-11-21 Thread David Jacot
+1 from me.

Thanks,
David

On Mon, Nov 20, 2023 at 10:48 PM Jason Gustafson 
wrote:

> The KIP makes sense. +1
>
> On Mon, Nov 20, 2023 at 12:37 PM David Arthur
>  wrote:
>
> > Thanks Colin,
> >
> > +1 from me
> >
> > -David
> >
> > On Tue, Nov 14, 2023 at 3:53 PM Colin McCabe  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to call a vote for KIP-1001: Add CurrentControllerId metric.
> > >
> > > Take a look here:
> > > https://cwiki.apache.org/confluence/x/egyZE
> > >
> > > best,
> > > Colin
> > >
> >
> >
> > --
> > -David
> >
>


[jira] [Resolved] (KAFKA-15849) Fix ListGroups API when runtime partition size is zero

2023-11-17 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15849.
-
Fix Version/s: 3.7.0
 Reviewer: David Jacot
   Resolution: Fixed

> Fix ListGroups API when runtime partition size is zero
> --
>
> Key: KAFKA-15849
> URL: https://issues.apache.org/jira/browse/KAFKA-15849
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Dongnuo Lyu
>Assignee: Dongnuo Lyu
>Priority: Major
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15755) LeaveGroupResponse v0-v2 should handle no members

2023-11-16 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15755.
-
Fix Version/s: 3.4.2
   3.5.2
   3.7.0
   3.6.1
   Resolution: Fixed

> LeaveGroupResponse v0-v2 should handle no members
> -
>
> Key: KAFKA-15755
> URL: https://issues.apache.org/jira/browse/KAFKA-15755
> Project: Kafka
>  Issue Type: Bug
>Reporter: Robert Wagner
>Assignee: Robert Wagner
>Priority: Major
> Fix For: 3.4.2, 3.5.2, 3.7.0, 3.6.1
>
>
> When Sarama and Librdkafka consumer clients issue LeaveGroup requests, they 
> use an older protocol version < 3 which did not include a `members` field.
> Since our upgrade the kafka broker 3.4.1 we have started seeing these broker 
> exceptions:
> {code}
> [2023-10-24 01:17:17,214] ERROR [KafkaApi-28598] Unexpected error handling 
> request RequestHeader(apiKey=LEAVE_GROUP, apiVersion=1, clientId=REDACTED, 
> correlationId=116775, headerVersion=1) -- 
> LeaveGroupRequestData(groupId=REDACTED, 
> memberId='REDACTED-73967453-93c4-4f3f-bcef-32c1f280350f', members=[]) with 
> context RequestContext(header=RequestHeader(apiKey=LEAVE_GROUP, apiVersion=1, 
> clientId=REDACTED, correlationId=116775, headerVersion=1), 
> connectionId='REDACTED', clientAddress=/REDACTED, principal=REDACTED, 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> clientInformation=ClientInformation(softwareName=confluent-kafka-python, 
> softwareVersion=1.7.0-rdkafka-1.7.0), fromPrivilegedListener=false, 
> principalSerde=Optional[REDACTED]) (kafka.server.KafkaApis)
> java.util.concurrent.CompletionException: 
> org.apache.kafka.common.errors.UnsupportedVersionException: LeaveGroup 
> response version 1 can only contain one member, got 0 members.
>   at 
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
>   at 
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:936)
>   at 
> java.base/java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:950)
>   at 
> java.base/java.util.concurrent.CompletableFuture.handle(CompletableFuture.java:2340)
>   at kafka.server.KafkaApis.handleLeaveGroupRequest(KafkaApis.scala:1796)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:196)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
>   at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: 
> LeaveGroup response version 1 can only contain one member, got 0 members. 
> {code}
>  
> KIP-848 introduced a check in LeaveGroupResponse that the members field must 
> have 1 element.  In some error cases, it seems like the members field has 0 
> elements - which would still be a valid response for v0-v2 messages, but this 
> exception was being thrown.
> Instead of throwing an exception in this case, continue with the 
> LeaveGroupResponse, since it is not a field included in v0 - v2 responses 
> anyway.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Should we continue to merge without a green build? No!

2023-11-11 Thread David Jacot
Hi all,

The state of our CI worries me a lot. Just this week, we merged two PRs
with compilation errors and one PR introducing persistent failures. This
really hurts the quality and the velocity of the project and it basically
defeats the purpose of having a CI because we tend to ignore it nowadays.

Should we continue to merge without a green build? No! We should not so I
propose to prevent merging a pull request without a green build. This is a
really simple and bold move that will prevent us from introducing
regressions and will improve the overall health of the project. At the same
time, I think that we should disable all the known flaky tests, raise jiras
for them, find an owner for each of them, and fix them.

What do you think?

Best,
David


  1   2   3   4   5   6   7   8   9   >