Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2596

2024-01-22 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 457952 lines...]
Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testEmptyWrite() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testEmptyWrite() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testReadMigrateAndWriteProducerId() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testReadMigrateAndWriteProducerId() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testExistingKRaftControllerClaim() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testExistingKRaftControllerClaim() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testMigrateTopicConfigs() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testMigrateTopicConfigs() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testNonIncreasingKRaftEpoch() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testNonIncreasingKRaftEpoch() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testMigrateEmptyZk() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testMigrateEmptyZk() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testTopicAndBrokerConfigsMigrationWithSnapshots() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testTopicAndBrokerConfigsMigrationWithSnapshots() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testClaimAndReleaseExistingController() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testClaimAndReleaseExistingController() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testClaimAbsentController() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testClaimAbsentController() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testIdempotentCreateTopics() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testIdempotentCreateTopics() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testCreateNewTopic() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testCreateNewTopic() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZkMigrationClientTest > testUpdateExistingTopicWithNewAndChangedPartitions() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZooKeeperSessionStateMetric() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZooKeeperSessionStateMetric() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testExceptionInBeforeInitializingSession() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testExceptionInBeforeInitializingSession() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testGetChildrenExistingZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testGetChildrenExistingZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testConnection() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testConnection() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZNodeChangeHandlerForCreation() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testZNodeChangeHandlerForCreation() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testGetAclExistingZNode() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testGetAclExistingZNode() PASSED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testSessionExpiryDuringClose() STARTED

Gradle Test Run :core:test > Gradle Test Executor 97 > ZooKeeperClientTest > testSessionExpiryDuringClose() PASSED

Gradle Test Run :core:test > 

Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread David Jacot
Hi Chris, Ziming,

Thanks for the clarification. I am glad that it does not impact the tool.
It may be worth adding a note about it in the KIP to avoid the same
question in the future.

Otherwise, I am +1 (binding). Thanks for driving this!

Best,
David

On Tue, Jan 23, 2024 at 6:07 AM ziming deng 
wrote:

> Hello David,
>
> Thanks for the reminder. As Chris explained, the tools I’m trying to
> update only support set/delete configs, and I’m just making a way for
> append/subtract configs in the future, so this would not be affected by
> KAFKA-10140. It would be a little overkill to support append/subtract
> configs or solve KAFKA-10140 here, so let’s leave it for now; I'm happy
> to pick it up after finishing this KIP.
>
> --,
> Ziming
>
> > On Jan 22, 2024, at 18:23, David Jacot 
> wrote:
> >
> > Hi Ziming,
> >
> > Thanks for driving this. I wanted to bring KAFKA-10140
> >  to your attention.
> It
> > looks like the incremental API does not work for configuring plugins. I
> > think that we need to cover this in the KIP.
> >
> > Best,
> > David
> >
> > On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
> > andrew_schofield_j...@outlook.com> wrote:
> >
> >> +1 (non-binding)
> >>
> >> Thanks,
> >> Andrew
> >>
> >>> On 22 Jan 2024, at 07:29, Federico Valeri 
> wrote:
> >>>
> >>> +1 (non binding)
> >>>
> >>> Thanks.
> >>>
> >>> On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
> 
>  Hi Ziming,
> 
>  +1(binding) from me.
> 
>  Thanks.
>  Luke
> 
>  On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
>  kamal.chandraprak...@gmail.com> wrote:
> 
> > +1 (non-binding)
> >
> > On Mon, Jan 22, 2024 at 8:34 AM ziming deng <
> dengziming1...@gmail.com>
> > wrote:
> >
> >> Hello everyone,
> >> I'd like to initiate a vote for KIP-1011.
> >> This KIP is about replacing alterConfigs with
> incrementalAlterConfigs
> >> when updating broker configs using kafka-configs.sh, this is similar
> >> to
> >> what we have done in KIP-894.
> >>
> >> KIP link:
> >> KIP-1011: Use incrementalAlterConfigs when updating broker configs
> by
> >> kafka-configs.sh - Apache Kafka - Apache Software Foundation
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >>
> >> Discussion thread:
> >>
> >>
> >> lists.apache.org
> >> 
> >> 
> >> 
> >>
> >>
> >> --,
> >> Best,
> >> Ziming
> >>
> >>
> >>
>
>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #72

2024-01-22 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread ziming deng
Hello David,

Thanks for the reminder. As Chris explained, the tools I’m trying to update
only support set/delete configs, and I’m just making a way for append/subtract
configs in the future, so this would not be affected by KAFKA-10140. It
would be a little overkill to support append/subtract configs or solve
KAFKA-10140 here, so let’s leave it for now; I'm happy to pick it up after
finishing this KIP.

--,
Ziming

> On Jan 22, 2024, at 18:23, David Jacot  wrote:
> 
> Hi Ziming,
> 
> Thanks for driving this. I wanted to bring KAFKA-10140
>  to your attention. It
> looks like the incremental API does not work for configuring plugins. I
> think that we need to cover this in the KIP.
> 
> Best,
> David
> 
> On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Andrew
>> 
>>> On 22 Jan 2024, at 07:29, Federico Valeri  wrote:
>>> 
>>> +1 (non binding)
>>> 
>>> Thanks.
>>> 
>>> On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
 
 Hi Ziming,
 
 +1(binding) from me.
 
 Thanks.
 Luke
 
 On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
 kamal.chandraprak...@gmail.com> wrote:
 
> +1 (non-binding)
> 
> On Mon, Jan 22, 2024 at 8:34 AM ziming deng 
> wrote:
> 
>> Hello everyone,
>> I'd like to initiate a vote for KIP-1011.
>> This KIP is about replacing alterConfigs with incrementalAlterConfigs
>> when updating broker configs using kafka-configs.sh, this is similar
>> to
>> what we have done in KIP-894.
>> 
>> KIP link:
>> KIP-1011: Use incrementalAlterConfigs when updating broker configs by
>> kafka-configs.sh - Apache Kafka - Apache Software Foundation
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
>> 
>> Discussion thread:
>> 
>> 
>> lists.apache.org
>> 
>> 
>> 
>> 
>> 
>> --,
>> Best,
>> Ziming
>> 
>> 
>> 



[jira] [Resolved] (KAFKA-16144) Controller leader checkQuorum timer should skip only 1 controller case

2024-01-22 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16144.
---
Resolution: Fixed

> Controller leader checkQuorum timer should skip only 1 controller case
> --
>
> Key: KAFKA-16144
> URL: https://issues.apache.org/jira/browse/KAFKA-16144
> Project: Kafka
>  Issue Type: Bug
>Reporter: Luke Chen
>Assignee: Luke Chen
>Priority: Blocker
>  Labels: newbie, newbie++
> Fix For: 3.7.0
>
>
> In KAFKA-15489, we fixed the potential "split brain" issue by adding the 
> check-quorum timer. This timer is updated when a follower fetch request 
> arrives, and it expires when no majority of the voter followers has fetched 
> from the leader, at which point the leader resigns. 
> But in KAFKA-15489, we forgot to consider the case where there is only 1 
> controller node. If there is only 1 controller node (and no broker node), 
> no fetch requests arrive, so the timer expires every time. However, with 
> only 1 node we don't have to care about the "check quorum" at all, so we 
> should skip the check in the single-controller case.
> Currently, this issue only happens when there is 1 controller node and no 
> broker node at all (i.e. no fetch request is sent to the controller). 
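For illustration only (this is not the actual KafkaRaftClient patch), a minimal standalone sketch of the intended skip, with the timer modeled as a plain timestamp:

import java.util.Set;

// Illustrative sketch: the check-quorum logic is skipped when the voter set has a
// single controller, because a lone voter can never observe follower fetches and
// must not resign on that basis.
public class CheckQuorumSketch {
    private long lastFetchFromMajorityMs;

    public boolean shouldResign(Set<Integer> voters, long nowMs, long checkQuorumTimeoutMs) {
        if (voters.size() == 1) {
            return false; // only one controller: nothing to check, never resign
        }
        return nowMs - lastFetchFromMajorityMs >= checkQuorumTimeoutMs;
    }

    public void recordMajorityFetch(long nowMs) {
        lastFetchFromMajorityMs = nowMs;
    }

    public static void main(String[] args) {
        CheckQuorumSketch sketch = new CheckQuorumSketch();
        sketch.recordMajorityFetch(0L);
        System.out.println(sketch.shouldResign(Set.of(1), 60_000L, 30_000L));       // false: single voter
        System.out.println(sketch.shouldResign(Set.of(1, 2, 3), 60_000L, 30_000L)); // true: timer expired
    }
}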



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes

2024-01-22 Thread José Armando García Sancio
Thanks Jason. Comments below.

On Wed, Jan 10, 2024 at 9:06 AM Jason Gustafson
 wrote:
> One additional thought. It would be helpful to have an example to justify
> the need for this:
>
> > Wait for the fetch offset of the replica (ID, UUID) to catch up to the
> log end offset of the leader.
>
> It is helpful also to explain how this affects the AddVoter RPC. Do we wait
> indefinitely? Or do we give up and return a timeout error if the new voter
> cannot catch up? Probably the latter makes the most sense.

Yes. I will have more details here. Jason and I discussed this offline,
but waiting for the new replica to catch up (to the LEO) is a heuristic
that minimizes the amount of time during which the leader cannot
increase the HWM because the new replica is needed to form the
majority. An example that shows this is:

Current Voter Set:
A: offset = 100
B: offset = 100
C: offset = 0

In this configuration the leader can continue to advance the HWM since
the majority A, B is at the HWM/LEO.

If the user now adds a voter to the voter set:
A: offset = 100
B: offset = 100
C: offset = 0
D: offset = 0

The leader cannot advance the HWM until either C or D catches up to
the HWM because any majority of the new voter set has to include at
least one of C and D.

Thanks,
-- 
-José


[jira] [Resolved] (KAFKA-16166) Generify RetryWithToleranceOperator and ErrorReporter

2024-01-22 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-16166.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Generify RetryWithToleranceOperator and ErrorReporter
> -
>
> Key: KAFKA-16166
> URL: https://issues.apache.org/jira/browse/KAFKA-16166
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.8.0
>
>
> The RetryWithToleranceOperator and ErrorReporter instances in Connect are 
> only ever used with a single type of ProcessingContext 
> (ProcessingContext<SourceRecord> for sources, 
> ProcessingContext<ConsumerRecord<byte[], byte[]>> for sinks) and currently 
> dynamically decide between these with instanceof checks.
> Instead, these classes should be generic, and have their implementations 
> accept consistently typed ProcessingContext objects.
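For illustration only (these are stand-in classes, not the actual Connect runtime API), a minimal sketch of what parameterizing the error-handling path by record type could look like:

// Illustrative sketch of the "generify" idea: the error path is typed by the record
// it processes instead of using instanceof checks.
public class GenerifySketch {
    // Stand-in for Connect's ProcessingContext, parameterized by the original record type.
    static class ProcessingContext<T> {
        private final T original;
        private Throwable error;
        ProcessingContext(T original) { this.original = original; }
        T original() { return original; }
        void error(Throwable t) { this.error = t; }
        Throwable error() { return error; }
    }

    // Stand-in for ErrorReporter: implementations accept a consistently typed context.
    interface ErrorReporter<T> {
        void report(ProcessingContext<T> context);
    }

    public static void main(String[] args) {
        ErrorReporter<String> logReporter =
            ctx -> System.out.println("Failed record: " + ctx.original() + ", cause: " + ctx.error());
        ProcessingContext<String> ctx = new ProcessingContext<>("record-1");
        ctx.error(new RuntimeException("boom"));
        logReporter.report(ctx);
    }
}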



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes

2024-01-22 Thread José Armando García Sancio
Thanks for your feedback Jason and excuse the delayed response.

See comments below.

On Tue, Jan 9, 2024 at 5:08 PM Jason Gustafson
 wrote:
>
> Hi Jose,
>
> Thanks for the KIP! A few initial questions below:
>
> 1. In the user experience section, the user is expected to supply
> the UUID for each voter. I'm assuming this is the directory.id coming from
> KIP-858. I thought it was generated by the format command automatically? It
> seems like we will need a way to specify it explicitly.

Yes. It is currently using a random uuid. I was planning to update
that code to instead use the uuid provided in the
--controller-quorum-voters flag. For example, if the node.id in the
--config file matches one of the replica-id in
--controller-quorum-voters, use the specified replica-uuid for the
directory.id of the metadata.log.dir.

> 2. Do we need the AddVoters and RemoveVoters control records? Perhaps the
> VotersRecord is sufficient since changes to the voter set will be rare.

Yes, we can remove them, and I will. For context, I
originally only had AddVoterRecord and RemoveVoterRecord. When
fleshing out the details of the bootstrapping process, I realized that
I needed a control record that would override the voter set and not do
an incremental change. I needed this functionality because there is no
guarantee that the content of the bootstrap checkpoint
(...-....checkpoint) matches in all of the voters.

After I added the VotersRecord, my thinking for keeping AddVoterRecord
and RemoveVoterRecord was to make it explicit that the protocol only
allows changing one voter at a time. I can instead write a comparison
function that KRaft can use whenever it attempts to write or read a
VotersRecord. The comparison function would fail unless every possible
majority of the old voter set intersects every possible majority of the
new voter set.

What do you think?
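
To make that concrete, here is a rough standalone sketch (not the actual KRaft code) of such a comparison over small voter sets:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: a voter-set change is "safe" only if every majority of the
// old voter set intersects every majority of the new voter set.
public class VoterSetOverlap {
    public static boolean isSafeChange(Set<Integer> oldVoters, Set<Integer> newVoters) {
        for (Set<Integer> oldMajority : majorities(oldVoters)) {
            for (Set<Integer> newMajority : majorities(newVoters)) {
                Set<Integer> intersection = new HashSet<>(oldMajority);
                intersection.retainAll(newMajority);
                if (intersection.isEmpty()) {
                    return false;
                }
            }
        }
        return true;
    }

    // Enumerate all smallest majorities (size n/2 + 1) of the voter set.
    private static List<Set<Integer>> majorities(Set<Integer> voters) {
        List<Integer> ids = new ArrayList<>(voters);
        List<Set<Integer>> result = new ArrayList<>();
        collect(ids, 0, voters.size() / 2 + 1, new HashSet<>(), result);
        return result;
    }

    private static void collect(List<Integer> ids, int start, int size,
                                Set<Integer> current, List<Set<Integer>> out) {
        if (current.size() == size) {
            out.add(new HashSet<>(current));
            return;
        }
        for (int i = start; i < ids.size(); i++) {
            current.add(ids.get(i));
            collect(ids, i + 1, size, current, out);
            current.remove(ids.get(i));
        }
    }

    public static void main(String[] args) {
        System.out.println(isSafeChange(Set.of(1, 2, 3), Set.of(1, 2, 3, 4))); // true: one voter added
        System.out.println(isSafeChange(Set.of(1, 2, 3), Set.of(4, 5, 6)));    // false: disjoint sets
    }
}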

> 3. Why does UpdateVoter need to commit after every leader change?

My thinking is that this algorithm is easier to implement.
Technically, the following (fetching) voter only needs to send an
UpdateVoter RPC when the endpoints known by the leader don't match the
latest endpoints for the voter, but this is not something that the
follower can know reliably. This is why I preferred to add an
idempotent RPC like UpdateVoter that the follower voter can send
aggressively to the leader whenever the voter discovers a leader.

> 4. Should ReplicaUuid in FetchRequest be a tagged field? It seems like a
> lot of overhead for all consumer fetches.

Yes. I'll make it a tagged field. For now it will only be used by
KRaft. In the future, I can see the broker wanting to use this field
to implement JBOD "reassignment". I don't think consumers will ever
use this field.

> 5. I was looking for the correctness conditions when changing the voter
> set. I recalled something about always taking the voter set from the log
> even if it is not yet known to be committed. Do you have those details
> documented?

Yes. That's correct.

The section Reference Explanation / Voter Changes / Adding Voters has:
"If the new leader supports the current kraft.version, it will write a
AddVoterRecord to the log and immediately update its in-memory quorum
state to include this voter as part of the quorum."

The section Public Interface / Log and Snapshot control records /
AddVoterRecord has:
"KRaft replicas will read all of the control records in the snapshot
and the log irrespective of the commit state and HWM. When a replica
encounters an AddVoterRecord it will add the replica ID and UUID to
its voter set."

The section Public Interface / RPCs / AddVoter / Handling has:
"6. Append the AddVoterRecord to the log.
7. The KRaft internal listener will read this record from the log and
add the voter to the voter set.
8. Wait for the AddVoterRecord to commit using the majority of new voter set."

Thanks,
-- 
-José


Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes

2024-01-22 Thread José Armando García Sancio
Thanks for the feedback Colin. Comments below.

On Tue, Jan 9, 2024 at 4:58 PM Colin McCabe  wrote:
> 1. restarting a controller with an empty storage directory
>
> The controller can contact the quorum to get the cluster ID and current MV. 
> If the MV doesn't support quorum reconfiguration, it can bail out. Otherwise, 
> it can re-format itself with a random directory ID. It can then remove (ID, 
> OLD_DIR_ID) from the quorum, and add (ID, NEW_DIR_ID) to the quorum.
>
> I think this can all be done automatically without user intervention. If the 
> remove / add steps fail (because the quorum is down, for example), then of 
> course we can just log an exception and bail out.
>

Yes. We should be able to implement this. I'll update the KIP and add
another configuration: controller.quorum.auto.join.enabled=true

The high-level algorithm is something like this:
1. The controller fetches the latest quorum state from the leader.
2. The controller removes any voter that matches its replica id
but doesn't match its directory id (replica uuid).
3. If the controller (replica id and replica uuid) is not in the voter
set, it sends an AddVoter RPC to the controller until it sees itself in
the voter set.
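
Roughly, as a sketch (the QuorumChannel methods below are hypothetical stand-ins for illustration, not real KRaft client classes or RPCs):

import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of the auto-join loop described above.
public class AutoJoinSketch {
    interface QuorumChannel {
        Set<Voter> fetchVoterSet();                  // step 1: latest quorum state from the leader
        void removeVoter(int replicaId, UUID uuid);  // stand-in for a RemoveVoter RPC
        void addVoter(int replicaId, UUID uuid);     // stand-in for an AddVoter RPC
    }

    record Voter(int replicaId, UUID directoryId) {}

    static void autoJoin(QuorumChannel channel, int myId, UUID myDirectoryId) throws InterruptedException {
        while (true) {
            Set<Voter> voters = channel.fetchVoterSet();
            // step 2: remove any stale entry with our replica id but a different directory id
            voters.stream()
                  .filter(v -> v.replicaId() == myId && !v.directoryId().equals(myDirectoryId))
                  .forEach(v -> channel.removeVoter(v.replicaId(), v.directoryId()));
            // step 3: add ourselves until we appear in the voter set
            if (voters.contains(new Voter(myId, myDirectoryId))) {
                return;
            }
            channel.addVoter(myId, myDirectoryId);
            Thread.sleep(1_000L); // retry until the change is observed
        }
    }
}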

> 2. restarting a broker with an empty storage directory
>
> The broker can contact the quorum to get the cluster ID and current MV. If 
> the MV doesn't support directory IDs, we can bail out. Otherwise, it can 
> reformat itself with a random directory ID and start up. Its old replicas 
> will be correctly treated as gone due to the JBOD logic.

This feature seems reasonable to me. I don't think we should make this
part of this KIP. It should be a separate KIP as it is not related to
controller dynamic membership changes.

> 4. Bringing up a totally new cluster
>
> I think we need at least one controller node to be formatted, so that we can 
> decide what metadata version to use. Perhaps we should even require a quorum 
> of controller nodes to be explicitly formatted (aka, in practice, people just 
> format them all).

Yes. When I document this feature my recommended process would be:
1. One of the controllers needs to be formatted in --standalone
(kafka-storage format --cluster-id  --release-version 3.8
--standalone --config controller.properties). This needs to be an
explicit operation as it violates one of the invariants enumerated in
the KIP.
"To make changes to the voter set safe it is required that the
majority of the competing voter sets commit the voter changes. In this
design the competing voter sets are the current voter set and new
voter set. Since this design only allows one voter change at a time
the majority of the new configuration always overlaps (intersects) the
majority of the old configuration. This is done by the leader
committing the current epoch when it becomes leader and committing
single voter changes with the new voter set before accepting another
voter change."

An easy example that shows the issue with auto formatting is:
1. The voter set is (1) by running --standalone.
2. The voter set grows to (1, 2, 3) after two AddVoter RPCs (either manually
or by auto joining).
3. Voter 1 loses its disk and reformats back to (1). Now it is possible to
have two quorums, one with just replica 1 and one with replicas
2 and 3.


Thanks. I'll update the KIP shortly to reflect my comments above,
--
-José


Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-22 Thread Justine Olshan
Thanks Jun,

I will update the KIP with the prev field for prepare as well.

PREPARE
producerId: x
previous/lastProducerId (tagged field): x
nextProducerId (tagged field): empty or z if y will overflow
producerEpoch: y + 1

COMPLETE
producerId: x or z if y overflowed
previous/lastProducerId (tagged field): x
nextProducerId (tagged field): empty
producerEpoch: y + 1 or 0 if we overflowed
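
To make the field layout above concrete, a small sketch (not the actual transaction state record schema; tagged fields are modeled as nullable values), showing the epoch-overflow case where producer id x rolls over to new producer id z:

public class TxnMarkerSketch {
    record PrepareMarker(long producerId, Long prevProducerId, Long nextProducerId, short producerEpoch) {}
    record CompleteMarker(long producerId, Long prevProducerId, Long nextProducerId, short producerEpoch) {}

    public static void main(String[] args) {
        long x = 100L;                       // current producer id
        long z = 200L;                       // next producer id, allocated because the epoch will overflow
        short bumpedEpoch = Short.MAX_VALUE; // y + 1, the last epoch usable with producer id x

        // PREPARE keeps producer id x, remembers x as previous, and records z as next.
        PrepareMarker prepare = new PrepareMarker(x, x, z, bumpedEpoch);
        // COMPLETE switches to z with epoch 0, still remembering x as the previous producer id.
        CompleteMarker complete = new CompleteMarker(z, x, null, (short) 0);

        System.out.println(prepare);
        System.out.println(complete);
    }
}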

Thanks again,
Justine

On Mon, Jan 22, 2024 at 3:15 PM Jun Rao  wrote:

> Hi, Justine,
>
> 101.3 Thanks for the explanation.
> (1) My point was that the coordinator could fail right after writing the
> prepare marker. When the new txn coordinator generates the complete marker
> after the failover, it needs some field from the prepare marker to
> determine whether it's written by the new client.
>
> (2) The changing of the behavior sounds good to me. We only want to return
> success if the prepare state is written by the new client. So, in the
> non-overflow case, it seems that we also need sth in the prepare marker to
> tell us whether it's written by the new client.
>
> 112. Thanks for the explanation. That sounds good to me.
>
> Jun
>
> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
>  wrote:
>
> > 101.3 I realized that I actually have two questions.
> > > (1) In the non-overflow case, we need to write the previous produce Id
> > tagged field in the end maker so that we know if the marker is from the
> new
> > client. Since the end maker is derived from the prepare marker, should we
> > write the previous produce Id in the prepare marker field too? Otherwise,
> > we will lose this information when deriving the end marker.
> >
> > The "previous" producer ID is in the normal producer ID field. So yes, we
> > need it in prepare and that was always the plan.
> >
> > Maybe it is a bit unclear so I will enumerate the fields and add them to
> > the KIP if that helps.
> > Say we have producer ID x and epoch y. When we overflow epoch y we get
> > producer ID Z.
> >
> > PREPARE
> > producerId: x
> > previous/lastProducerId (tagged field): empty
> > nextProducerId (tagged field): empty or z if y will overflow
> > producerEpoch: y + 1
> >
> > COMPLETE
> > producerId: x or z if y overflowed
> > previous/lastProducerId (tagged field): x
> > nextProducerId (tagged field): empty
> > producerEpoch: y + 1 or 0 if we overflowed
> >
> > (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> > fields and are issuing the same command (ie commit not abort), we return
> > success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> > in this case. Are we intentionally making this change?
> >
> > Hmm -- we would fence the producer if the epoch is bumped and we get a
> > lower epoch. Yes -- we are intentionally adding this to prevent fencing.
> >
> >
> > 112. We already merged the code that adds the VerifyOnly field in
> > AddPartitionsToTxnRequest, which is an inter broker request. It seems
> that
> > we didn't bump up the IBP for that. Do you know why?
> >
> > We no longer need IBP for all interbroker requests as ApiVersions should
> > correctly gate versioning.
> > We also handle unsupported version errors correctly if we receive them in
> > edge cases like upgrades/downgrades.
> >
> > Justine
> >
> > On Mon, Jan 22, 2024 at 11:00 AM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply.
> > >
> > > 101.3 I realized that I actually have two questions.
> > > (1) In the non-overflow case, we need to write the previous produce Id
> > > tagged field in the end maker so that we know if the marker is from the
> > new
> > > client. Since the end maker is derived from the prepare marker, should
> we
> > > write the previous produce Id in the prepare marker field too?
> Otherwise,
> > > we will lose this information when deriving the end marker.
> > > (2) In the prepare phase, if we retry and see epoch - 1 + ID in last
> seen
> > > fields and are issuing the same command (ie commit not abort), we
> return
> > > success. The logic before KIP-890 seems to return
> CONCURRENT_TRANSACTIONS
> > > in this case. Are we intentionally making this change?
> > >
> > > 112. We already merged the code that adds the VerifyOnly field in
> > > AddPartitionsToTxnRequest, which is an inter broker request. It seems
> > that
> > > we didn't bump up the IBP for that. Do you know why?
> > >
> > > Jun
> > >
> > > On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > 101.3 I can change "last seen" to "current producer id and epoch" if
> > that
> > > > was the part that was confusing
> > > > 110 I can mention this
> > > > 111 I can do that
> > > > 112 We still need it. But I am still finalizing the design. I will
> > update
> > > > the KIP once I get the information finalized. Sorry for the delays.
> > > >
> > > > Justine
> > > >
> > > > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Justine,
> > > > >
> > > > > Thanks for the reply.
> > > 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-22 Thread Jun Rao
Hi, Justine,

101.3 Thanks for the explanation.
(1) My point was that the coordinator could fail right after writing the
prepare marker. When the new txn coordinator generates the complete marker
after the failover, it needs some field from the prepare marker to
determine whether it's written by the new client.

(2) The change in behavior sounds good to me. We only want to return
success if the prepare state is written by the new client. So, in the
non-overflow case, it seems that we also need something in the prepare
marker to tell us whether it's written by the new client.

112. Thanks for the explanation. That sounds good to me.

Jun

On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
 wrote:

> 101.3 I realized that I actually have two questions.
> > (1) In the non-overflow case, we need to write the previous produce Id
> tagged field in the end maker so that we know if the marker is from the new
> client. Since the end maker is derived from the prepare marker, should we
> write the previous produce Id in the prepare marker field too? Otherwise,
> we will lose this information when deriving the end marker.
>
> The "previous" producer ID is in the normal producer ID field. So yes, we
> need it in prepare and that was always the plan.
>
> Maybe it is a bit unclear so I will enumerate the fields and add them to
> the KIP if that helps.
> Say we have producer ID x and epoch y. When we overflow epoch y we get
> producer ID Z.
>
> PREPARE
> producerId: x
> previous/lastProducerId (tagged field): empty
> nextProducerId (tagged field): empty or z if y will overflow
> producerEpoch: y + 1
>
> COMPLETE
> producerId: x or z if y overflowed
> previous/lastProducerId (tagged field): x
> nextProducerId (tagged field): empty
> producerEpoch: y + 1 or 0 if we overflowed
>
> (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> fields and are issuing the same command (ie commit not abort), we return
> success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> in this case. Are we intentionally making this change?
>
> Hmm -- we would fence the producer if the epoch is bumped and we get a
> lower epoch. Yes -- we are intentionally adding this to prevent fencing.
>
>
> 112. We already merged the code that adds the VerifyOnly field in
> AddPartitionsToTxnRequest, which is an inter broker request. It seems that
> we didn't bump up the IBP for that. Do you know why?
>
> We no longer need IBP for all interbroker requests as ApiVersions should
> correctly gate versioning.
> We also handle unsupported version errors correctly if we receive them in
> edge cases like upgrades/downgrades.
>
> Justine
>
> On Mon, Jan 22, 2024 at 11:00 AM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > 101.3 I realized that I actually have two questions.
> > (1) In the non-overflow case, we need to write the previous produce Id
> > tagged field in the end maker so that we know if the marker is from the
> new
> > client. Since the end maker is derived from the prepare marker, should we
> > write the previous produce Id in the prepare marker field too? Otherwise,
> > we will lose this information when deriving the end marker.
> > (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> > fields and are issuing the same command (ie commit not abort), we return
> > success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> > in this case. Are we intentionally making this change?
> >
> > 112. We already merged the code that adds the VerifyOnly field in
> > AddPartitionsToTxnRequest, which is an inter broker request. It seems
> that
> > we didn't bump up the IBP for that. Do you know why?
> >
> > Jun
> >
> > On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hi Jun,
> > >
> > > 101.3 I can change "last seen" to "current producer id and epoch" if
> that
> > > was the part that was confusing
> > > 110 I can mention this
> > > 111 I can do that
> > > 112 We still need it. But I am still finalizing the design. I will
> update
> > > the KIP once I get the information finalized. Sorry for the delays.
> > >
> > > Justine
> > >
> > > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 101.3 In the non-overflow case, the previous ID is the same as the
> > > produce
> > > > ID for the complete marker too, but we set the previous ID in the
> > > complete
> > > > marker. Earlier you mentioned that this is to know that the marker is
> > > > written by the new client so that we could return success on retried
> > > > endMarker requests. I was trying to understand why this is not needed
> > for
> > > > the prepare marker since retry can happen in the prepare state too.
> Is
> > > the
> > > > reason that in the prepare state, we return CONCURRENT_TRANSACTIONS
> > > instead
> > > > of success on retried endMaker requests? If so, should we change "If
> we
> > > > retry and 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-22 Thread Artem Livshits
>  Hmm -- we would fence the producer if the epoch is bumped and we get a
lower epoch. Yes -- we are intentionally adding this to prevent fencing.

I think Jun's point is that we can defer the fencing decision until the
transition into the complete state (which I believe is what the current logic
does) -- just return CONCURRENT_TRANSACTIONS without checking the epoch
while in the prepare state.

That said, we do need to remember the next producer id somewhere in the
prepare state, because in the complete state we would need to make a
fencing decision and let the old producer in if the request is a retried
commit / abort operation.

An alternative could be to not reply to the client until complete state is
written, then we don't have to generate a new producer id during prepare
state.  But that would affect pipelining opportunities and probably require
a separate KIP to discuss the pros and cons.

-Artem

On Mon, Jan 22, 2024 at 11:34 AM Justine Olshan
 wrote:

> 101.3 I realized that I actually have two questions.
> > (1) In the non-overflow case, we need to write the previous produce Id
> tagged field in the end maker so that we know if the marker is from the new
> client. Since the end maker is derived from the prepare marker, should we
> write the previous produce Id in the prepare marker field too? Otherwise,
> we will lose this information when deriving the end marker.
>
> The "previous" producer ID is in the normal producer ID field. So yes, we
> need it in prepare and that was always the plan.
>
> Maybe it is a bit unclear so I will enumerate the fields and add them to
> the KIP if that helps.
> Say we have producer ID x and epoch y. When we overflow epoch y we get
> producer ID Z.
>
> PREPARE
> producerId: x
> previous/lastProducerId (tagged field): empty
> nextProducerId (tagged field): empty or z if y will overflow
> producerEpoch: y + 1
>
> COMPLETE
> producerId: x or z if y overflowed
> previous/lastProducerId (tagged field): x
> nextProducerId (tagged field): empty
> producerEpoch: y + 1 or 0 if we overflowed
>
> (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> fields and are issuing the same command (ie commit not abort), we return
> success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> in this case. Are we intentionally making this change?
>
> Hmm -- we would fence the producer if the epoch is bumped and we get a
> lower epoch. Yes -- we are intentionally adding this to prevent fencing.
>
>
> 112. We already merged the code that adds the VerifyOnly field in
> AddPartitionsToTxnRequest, which is an inter broker request. It seems that
> we didn't bump up the IBP for that. Do you know why?
>
> We no longer need IBP for all interbroker requests as ApiVersions should
> correctly gate versioning.
> We also handle unsupported version errors correctly if we receive them in
> edge cases like upgrades/downgrades.
>
> Justine
>
> On Mon, Jan 22, 2024 at 11:00 AM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > 101.3 I realized that I actually have two questions.
> > (1) In the non-overflow case, we need to write the previous produce Id
> > tagged field in the end maker so that we know if the marker is from the
> new
> > client. Since the end maker is derived from the prepare marker, should we
> > write the previous produce Id in the prepare marker field too? Otherwise,
> > we will lose this information when deriving the end marker.
> > (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> > fields and are issuing the same command (ie commit not abort), we return
> > success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> > in this case. Are we intentionally making this change?
> >
> > 112. We already merged the code that adds the VerifyOnly field in
> > AddPartitionsToTxnRequest, which is an inter broker request. It seems
> that
> > we didn't bump up the IBP for that. Do you know why?
> >
> > Jun
> >
> > On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hi Jun,
> > >
> > > 101.3 I can change "last seen" to "current producer id and epoch" if
> that
> > > was the part that was confusing
> > > 110 I can mention this
> > > 111 I can do that
> > > 112 We still need it. But I am still finalizing the design. I will
> update
> > > the KIP once I get the information finalized. Sorry for the delays.
> > >
> > > Justine
> > >
> > > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 101.3 In the non-overflow case, the previous ID is the same as the
> > > produce
> > > > ID for the complete marker too, but we set the previous ID in the
> > > complete
> > > > marker. Earlier you mentioned that this is to know that the marker is
> > > > written by the new client so that we could return success on retried
> > > > endMarker requests. I was trying to understand why this is not needed
> > for
> > > > the 

Re: [DISCUSS] KIP-1016 Make MM2 heartbeats topic name configurable

2024-01-22 Thread Chris Egerton
Hi Berci,

Thanks for the KIP!

IMO we don't need the "default." prefix for the new property, and it
deviates a bit from the precedent set by properties like
"replication.policy.internal.topic.separator.enabled". I think we can just
call it "replication.policy.heartbeats.topic", or if we really want to be
precise, "replication.policy.heartbeats.topic.name".

Regarding multiple source->target pairs, won't we get support for this for
free if we add the new property to the DefaultReplicationPolicy class? IIRC
it's already possible to configure replication policies on a
per-replication-flow basis with that syntax, I don't see why this wouldn't
be the case for the new property.

I'm also a little hazy on the motivation for the change. Just out of
curiosity, what exactly is meant by "the "heartbeats" topics of other
systems" in the Jira ticket's description? Are we trying to better
accommodate cases where other harder-to-configure systems (like a picky
source connector, for example) create and use a "heartbeats" topic, or are
we trying to enable multiple MM2 heartbeat connectors to target the same
Kafka cluster? I can understand the former as a niche but possible scenario
and one that we can make room for, but the latter is a bit harder to
justify short of, e.g., fine-tuning the heartbeat emission interval based
on the eventual target of the replication flow that will be reading from
the heartbeats topic.

I don't raise the above to cast doubt on the KIP, really I'm just curious
about how people are using MM2.

Cheers,

Chris

On Thu, Jan 18, 2024 at 6:11 AM Kondrát Bertalan  wrote:

> Hi Viktor,
>
> Let me address your points one by one.
>
>1. The current implementation does not support the source->target pair
>based configuration, it is global.
>2. Yes, I introduced that property both in the client and in the
>connectors
>3. This is a great idea, I am going to do that, and also I tried to
>construct the property name in a way that makes this clear for the
> users: '
>default.replication.policy.heartbeats.topic.name'
>4. Yeah, that was my impression too.
>
> Thanks,
> Berci
>
> On Wed, Jan 17, 2024 at 4:51 PM Viktor Somogyi-Vass
>  wrote:
>
> > Hi Bertalan,
> >
> > Thanks for creating this KIP.
> > A couple of observations/questions:
> > 1. If I have multiple source->target pairs, can I set this property per
> > cluster by prefixing with "source->target" as many other configs or is it
> > global?
> > 2. The replication policy must be set in MirrorClient as well. Is your
> > change applicable to both MirrorClient and the connectors as well?
> > 3. It might be worth pointing out (both in the docs and the KIP) that if
> > the user overrides the replication policy to any other than
> > DefaultReplicationPolicy, then this config has no effect.
> > 4. With regards to integration tests, I tend to lean towards that we
> don't
> > need them if we can cover this well with unit tests and mocking.
> >
> > Thanks,
> > Viktor
> >
> > On Wed, Jan 17, 2024 at 12:23 AM Ryanne Dolan 
> > wrote:
> >
> > > Makes sense to me, +1.
> > >
> > > On Tue, Jan 16, 2024 at 5:04 PM Kondrát Bertalan 
> > > wrote:
> > >
> > >> Hey Team,
> > >>
> > >> I would like to start a discussion thread about the *KIP-1016 Make MM2
> > >> heartbeats topic name configurable
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1016+Make+MM2+heartbeats+topic+name+configurable
> > >> >*
> > >> .
> > >>
> > >> This KIP aims to make the default heartbeat topic name (`heartbeats`)
> in
> > >> the DefaultReplicationPolicy configurable via a property.
> > >> Since this is my first KIP and the change is small, I implemented it
> in
> > >> advance so, I can include the PR
> > >>  as well.
> > >>
> > >> I appreciate all your feedbacks and comments.
> > >>
> > >> Special thanks to Viktor Somogyi-Vass 
> and
> > >> Daniel
> > >> Urban  for the original idea and help.
> > >> Thank you,
> > >> Berci
> > >>
> > >> --
> > >> *Bertalan Kondrat* | Founder, SWE
> > >> servy.hu 
> > >>
> > >>
> > >>
> > >> 
> > >> --
> > >>
> > >
> >
>
>
> --
> *Bertalan Kondrat* | Founder
> t. +36(70) 413-4801
> servy.hu 
>
>
> --
>


Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-22 Thread Justine Olshan
101.3 I realized that I actually have two questions.
> (1) In the non-overflow case, we need to write the previous produce Id
tagged field in the end maker so that we know if the marker is from the new
client. Since the end maker is derived from the prepare marker, should we
write the previous produce Id in the prepare marker field too? Otherwise,
we will lose this information when deriving the end marker.

The "previous" producer ID is in the normal producer ID field. So yes, we
need it in prepare and that was always the plan.

Maybe it is a bit unclear so I will enumerate the fields and add them to
the KIP if that helps.
Say we have producer ID x and epoch y. When we overflow epoch y we get
producer ID Z.

PREPARE
producerId: x
previous/lastProducerId (tagged field): empty
nextProducerId (tagged field): empty or z if y will overflow
producerEpoch: y + 1

COMPLETE
producerId: x or z if y overflowed
previous/lastProducerId (tagged field): x
nextProducerId (tagged field): empty
producerEpoch: y + 1 or 0 if we overflowed

(2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
fields and are issuing the same command (ie commit not abort), we return
success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
in this case. Are we intentionally making this change?

Hmm -- we would fence the producer if the epoch is bumped and we get a
lower epoch. Yes -- we are intentionally adding this to prevent fencing.


112. We already merged the code that adds the VerifyOnly field in
AddPartitionsToTxnRequest, which is an inter broker request. It seems that
we didn't bump up the IBP for that. Do you know why?

We no longer need IBP for all interbroker requests as ApiVersions should
correctly gate versioning.
We also handle unsupported version errors correctly if we receive them in
edge cases like upgrades/downgrades.

Justine

On Mon, Jan 22, 2024 at 11:00 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> 101.3 I realized that I actually have two questions.
> (1) In the non-overflow case, we need to write the previous produce Id
> tagged field in the end maker so that we know if the marker is from the new
> client. Since the end maker is derived from the prepare marker, should we
> write the previous produce Id in the prepare marker field too? Otherwise,
> we will lose this information when deriving the end marker.
> (2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
> fields and are issuing the same command (ie commit not abort), we return
> success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
> in this case. Are we intentionally making this change?
>
> 112. We already merged the code that adds the VerifyOnly field in
> AddPartitionsToTxnRequest, which is an inter broker request. It seems that
> we didn't bump up the IBP for that. Do you know why?
>
> Jun
>
> On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan
> 
> wrote:
>
> > Hi Jun,
> >
> > 101.3 I can change "last seen" to "current producer id and epoch" if that
> > was the part that was confusing
> > 110 I can mention this
> > 111 I can do that
> > 112 We still need it. But I am still finalizing the design. I will update
> > the KIP once I get the information finalized. Sorry for the delays.
> >
> > Justine
> >
> > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply.
> > >
> > > 101.3 In the non-overflow case, the previous ID is the same as the
> > produce
> > > ID for the complete marker too, but we set the previous ID in the
> > complete
> > > marker. Earlier you mentioned that this is to know that the marker is
> > > written by the new client so that we could return success on retried
> > > endMarker requests. I was trying to understand why this is not needed
> for
> > > the prepare marker since retry can happen in the prepare state too. Is
> > the
> > > reason that in the prepare state, we return CONCURRENT_TRANSACTIONS
> > instead
> > > of success on retried endMaker requests? If so, should we change "If we
> > > retry and see epoch - 1 + ID in last seen fields and are issuing the
> same
> > > command (ie commit not abort) we can return (with the new epoch)"
> > > accordingly?
> > >
> > > 110. Yes, without this KIP, a delayed endMaker request carries the same
> > > epoch and won't be fenced. This can commit/abort a future transaction
> > > unexpectedly. I am not sure if we have seen this in practice though.
> > >
> > > 111. Sounds good. It would be useful to make it clear that we can now
> > > populate the lastSeen field from the log reliably.
> > >
> > > 112. Yes, I was referring to AddPartitionsToTxnRequest since it's
> called
> > > across brokers and we are changing its schema. Are you saying we don't
> > need
> > > it any more? I thought that we already implemented the server side
> > > verification logic based on AddPartitionsToTxnRequest across brokers.
> > >
> > > Jun
> > >
> > >
> > > On Thu, Jan 18, 2024 at 5:05 

[jira] [Resolved] (KAFKA-16137) ListClientMetricsResourcesResponse definition is missing field descriptions

2024-01-22 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-16137.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

merged the PR to trunk.

> ListClientMetricsResourcesResponse definition is missing field descriptions
> ---
>
> Key: KAFKA-16137
> URL: https://issues.apache.org/jira/browse/KAFKA-16137
> Project: Kafka
>  Issue Type: Task
>  Components: admin
>Affects Versions: 3.7.0
>Reporter: Andrew Schofield
>Assignee: Andrew Schofield
>Priority: Trivial
> Fix For: 3.8.0
>
>
> This is purely improving the readability of the Kafka protocol documentation 
> by adding missing description information for the fields of the 
> `ListClientMetricsResources` response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2024-01-22 Thread Jun Rao
Hi, Justine,

Thanks for the reply.

101.3 I realized that I actually have two questions.
(1) In the non-overflow case, we need to write the previous producer Id
tagged field in the end marker so that we know if the marker is from the new
client. Since the end marker is derived from the prepare marker, should we
write the previous producer Id in the prepare marker field too? Otherwise,
we will lose this information when deriving the end marker.
(2) In the prepare phase, if we retry and see epoch - 1 + ID in last seen
fields and are issuing the same command (ie commit not abort), we return
success. The logic before KIP-890 seems to return CONCURRENT_TRANSACTIONS
in this case. Are we intentionally making this change?

112. We already merged the code that adds the VerifyOnly field in
AddPartitionsToTxnRequest, which is an inter broker request. It seems that
we didn't bump up the IBP for that. Do you know why?

Jun

On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan 
wrote:

> Hi Jun,
>
> 101.3 I can change "last seen" to "current producer id and epoch" if that
> was the part that was confusing
> 110 I can mention this
> 111 I can do that
> 112 We still need it. But I am still finalizing the design. I will update
> the KIP once I get the information finalized. Sorry for the delays.
>
> Justine
>
> On Fri, Jan 19, 2024 at 10:50 AM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > 101.3 In the non-overflow case, the previous ID is the same as the
> produce
> > ID for the complete marker too, but we set the previous ID in the
> complete
> > marker. Earlier you mentioned that this is to know that the marker is
> > written by the new client so that we could return success on retried
> > endMarker requests. I was trying to understand why this is not needed for
> > the prepare marker since retry can happen in the prepare state too. Is
> the
> > reason that in the prepare state, we return CONCURRENT_TRANSACTIONS
> instead
> > of success on retried endMaker requests? If so, should we change "If we
> > retry and see epoch - 1 + ID in last seen fields and are issuing the same
> > command (ie commit not abort) we can return (with the new epoch)"
> > accordingly?
> >
> > 110. Yes, without this KIP, a delayed endMaker request carries the same
> > epoch and won't be fenced. This can commit/abort a future transaction
> > unexpectedly. I am not sure if we have seen this in practice though.
> >
> > 111. Sounds good. It would be useful to make it clear that we can now
> > populate the lastSeen field from the log reliably.
> >
> > 112. Yes, I was referring to AddPartitionsToTxnRequest since it's called
> > across brokers and we are changing its schema. Are you saying we don't
> need
> > it any more? I thought that we already implemented the server side
> > verification logic based on AddPartitionsToTxnRequest across brokers.
> >
> > Jun
> >
> >
> > On Thu, Jan 18, 2024 at 5:05 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hey Jun,
> > >
> > > 101.3 We don't set the previous ID in the Prepare field since we don't
> > need
> > > it. It is the same producer ID as the main producer ID field.
> > >
> > > 110 Hmm -- maybe I need to reread your message about delayed markers.
> If
> > we
> > > receive a delayed endTxn marker after the transaction is already
> > complete?
> > > So we will commit the next transaction early without the fixes in part
> 2?
> > >
> > > 111 Yes -- this terminology was used in a previous KIP and never
> > > implemented it in the log -- only in memory
> > >
> > > 112 Hmm -- which interbroker protocol are you referring to? I am
> working
> > on
> > > the design for the work to remove the extra add partitions call and I
> > right
> > > now the design bumps MV. I have yet to update that section as I
> finalize
> > > the design so please stay tuned. Was there anything else you thought
> > needed
> > > MV bump?
> > >
> > > Justine
> > >
> > > On Thu, Jan 18, 2024 at 3:07 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > I don't see this create any issue. It just makes it a bit hard to
> > explain
> > > > what this non-tagged produce id field means. We are essentially
> trying
> > to
> > > > combine two actions (completing a txn and init a new produce Id) in a
> > > > single record. But, this may be fine too.
> > > >
> > > > A few other follow up comments.
> > > >
> > > > 101.3 I guess the reason that we only set the previous produce id
> > tagged
> > > > field in the complete marker, but not in the prepare marker, is that
> in
> > > the
> > > > prepare state, we always return CONCURRENT_TRANSACTIONS on retried
> > > endMaker
> > > > requests?
> > > >
> > > > 110. "I believe your second point is mentioned in the KIP. I can add
> > more
> > > > text on
> > > > this if it is helpful.
> > > > > The delayed message case can also violate EOS if the delayed
> message
> > > > comes in after the next addPartitionsToTxn request comes in.
> > Effectively
> > > we
> > > > may see a message from a 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2594

2024-01-22 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16185) Fix client reconciliation of same assignment received in different epochs

2024-01-22 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16185:
--

 Summary: Fix client reconciliation of same assignment received in 
different epochs 
 Key: KAFKA-16185
 URL: https://issues.apache.org/jira/browse/KAFKA-16185
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Lianet Magrans
Assignee: Lianet Magrans


Currently, the intention in the client state machine is that the client always 
reconciles whatever it has pending that has not been removed by the coordinator.

There is still an edge case where this does not happen, and the client might 
get stuck JOINING/RECONCILING with a pending (delayed) reconciliation when it 
receives the same assignment again in a new epoch (e.g. after being FENCED). 
The first time it receives the assignment it takes no action, as it already has 
it pending to reconcile, but when the reconciliation completes it discards the 
result because the epoch changed. And this is wrong. Note that after sending 
the assignment with the new epoch once, the broker continues to send null 
assignments. 

Here is a sample sequence leading to the client stuck JOINING:
- client joins, epoch 0
- client receives assignment tp1, stuck RECONCILING, epoch 1
- member gets FENCED on the coord, coord bumps epoch to 2
- client tries to rejoin (JOINING), epoch 0 provided by the client
- new member added to the group (group epoch bumped to 3), client receives same 
assignment that is currently trying to reconcile (tp1), but with epoch 3
- previous reconciliation completes, but will discard the result because it 
will notice that the memberHasRejoined (memberEpochOnReconciliationStart != 
memberEpoch). Client is stuck JOINING, with the server sending null target 
assignment because it hasn't changed since the last one sent (tp1)

(We should end up with a test similar to the existing 
#testDelayedReconciliationResultDiscardedIfMemberRejoins but with the case that 
the member receives the same assignment after being fenced and rejoining)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread Chris Egerton
Hi David,

I've only briefly skimmed KAFKA-10140 but it seems that it may only apply
to append/subtract operations on list-type properties. If my understanding
is correct then this shouldn't be a problem for the KIP since we only use
the set/delete operations in the kafka-configs.sh script. If the scope of
the issue extends beyond those operations, then I agree that changes are
warranted to the KIP.
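
For reference, a minimal sketch (the broker id and config names here are arbitrary examples, not part of the KIP) of how the tool's set/delete semantics map onto the Admin client's incrementalAlterConfigs call:

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class BrokerConfigUpdateSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            // SET and DELETE are the only operations kafka-configs.sh needs for --alter/--delete-config;
            // APPEND/SUBTRACT (the operations touched by KAFKA-10140) are not used by the tool.
            Collection<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("log.cleaner.threads", "2"), AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("log.retention.ms", ""), AlterConfigOp.OpType.DELETE));
            Map<ConfigResource, Collection<AlterConfigOp>> configs = Map.of(broker, ops);
            admin.incrementalAlterConfigs(configs).all().get();
        }
    }
}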

Cheers,

Chris

On Mon, Jan 22, 2024 at 5:23 AM David Jacot 
wrote:

> Hi Ziming,
>
> Thanks for driving this. I wanted to bring KAFKA-10140
>  to your attention. It
> looks like the incremental API does not work for configuring plugins. I
> think that we need to cover this in the KIP.
>
> Best,
> David
>
> On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
> andrew_schofield_j...@outlook.com> wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> > Andrew
> >
> > > On 22 Jan 2024, at 07:29, Federico Valeri 
> wrote:
> > >
> > > +1 (non binding)
> > >
> > > Thanks.
> > >
> > > On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
> > >>
> > >> Hi Ziming,
> > >>
> > >> +1(binding) from me.
> > >>
> > >> Thanks.
> > >> Luke
> > >>
> > >> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
> > >> kamal.chandraprak...@gmail.com> wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng <
> dengziming1...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hello everyone,
> >  I'd like to initiate a vote for KIP-1011.
> >  This KIP is about replacing alterConfigs with
> incrementalAlterConfigs
> >  when updating broker configs using kafka-configs.sh, this is similar
> > to
> >  what we have done in KIP-894.
> > 
> >  KIP link:
> >  KIP-1011: Use incrementalAlterConfigs when updating broker configs
> by
> >  kafka-configs.sh - Apache Kafka - Apache Software Foundation
> >  <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> > >
> > 
> >  Discussion thread:
> > 
> > 
> >  lists.apache.org
> >  
> >  
> >  
> > 
> > 
> >  --,
> >  Best,
> >  Ziming
> >
> >
> >
>


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2593

2024-01-22 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16184) Flaky test - testAlterReplicaLogDirs(String).quorum=kraft – kafka.api.PlaintextAdminIntegrationTest

2024-01-22 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16184:


 Summary: Flaky test - testAlterReplicaLogDirs(String).quorum=kraft 
– kafka.api.PlaintextAdminIntegrationTest
 Key: KAFKA-16184
 URL: https://issues.apache.org/jira/browse/KAFKA-16184
 Project: Kafka
  Issue Type: Bug
Reporter: Andrew Schofield


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15214/1/tests/]
h4. Error
org.opentest4j.AssertionFailedError: timed out waiting for replica movement
h4. Stacktrace
org.opentest4j.AssertionFailedError: timed out waiting for replica movement
 at org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38)
 at org.junit.jupiter.api.Assertions.fail(Assertions.java:134)
 at 
kafka.api.PlaintextAdminIntegrationTest.$anonfun$testAlterReplicaLogDirs$7(PlaintextAdminIntegrationTest.scala:317)
 at 
kafka.api.PlaintextAdminIntegrationTest.$anonfun$testAlterReplicaLogDirs$7$adapted(PlaintextAdminIntegrationTest.scala:316)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
 at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
 at 
kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs(PlaintextAdminIntegrationTest.scala:316)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
 at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
 at 
org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
 at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
 at 

[jira] [Created] (KAFKA-16183) Flaky test - testMetricsDuringTopicCreateDelete(String).quorum=zk – kafka.integration.MetricsDuringTopicCreationDeletionTest

2024-01-22 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16183:


 Summary: Flaky test - 
testMetricsDuringTopicCreateDelete(String).quorum=zk – 
kafka.integration.MetricsDuringTopicCreationDeletionTest
 Key: KAFKA-16183
 URL: https://issues.apache.org/jira/browse/KAFKA-16183
 Project: Kafka
  Issue Type: Bug
Reporter: Andrew Schofield


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15214/1/pipeline]
h4. Error
java.lang.AssertionError: assertion failed: Expect 
UnderReplicatedPartitionCount to be 0, but got: 1
h4. Stacktrace
java.lang.AssertionError: assertion failed: Expect 
UnderReplicatedPartitionCount to be 0, but got: 1
 at scala.Predef$.assert(Predef.scala:279)
 at 
kafka.integration.MetricsDuringTopicCreationDeletionTest.testMetricsDuringTopicCreateDelete(MetricsDuringTopicCreationDeletionTest.scala:123)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
 at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
 at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
 at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
 at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask$DefaultDynamicTestExecutor.execute(NodeTestTask.java:226)
 at 
org.junit.platform.engine.support.hierarchical.NodeTestTask$DefaultDynamicTestExecutor.execute(NodeTestTask.java:204)
 at 
org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.execute(TestTemplateTestDescriptor.java:142)
 at 
org.junit.jupiter.engine.descriptor.TestTemplateTestDescriptor.lambda$execute$2(TestTemplateTestDescriptor.java:110)
 at 

[jira] [Created] (KAFKA-16182) Flaky test - testClientInstanceId() - org.apache.kafka.clients.admin.KafkaAdminClientTest

2024-01-22 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16182:


 Summary: Flaky test - testClientInstanceId() - 
org.apache.kafka.clients.admin.KafkaAdminClientTest
 Key: KAFKA-16182
 URL: https://issues.apache.org/jira/browse/KAFKA-16182
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 3.7.0
Reporter: Andrew Schofield


h3. Error

org.apache.kafka.common.KafkaException: Error occurred while fetching client 
instance id
Stacktrace
org.apache.kafka.common.KafkaException: Error occurred while fetching client 
instance id
at 
app//org.apache.kafka.clients.admin.KafkaAdminClient.clientInstanceId(KafkaAdminClient.java:4477)
at 
app//org.apache.kafka.clients.admin.KafkaAdminClientTest.testClientInstanceId(KafkaAdminClientTest.java:7082)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at java.base@17.0.7/java.util.ArrayList.forEach(ArrayList.java:1511)
at 
app//org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 

Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread David Jacot
Hi Ziming,

Thanks for driving this. I wanted to bring KAFKA-10140
 to your attention. It
looks like the incremental API does not work for configuring plugins. I
think that we need to cover this in the KIP.

Best,
David

On Mon, Jan 22, 2024 at 10:13 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> +1 (non-binding)
>
> Thanks,
> Andrew
>
> > On 22 Jan 2024, at 07:29, Federico Valeri  wrote:
> >
> > +1 (non binding)
> >
> > Thanks.
> >
> > On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
> >>
> >> Hi Ziming,
> >>
> >> +1(binding) from me.
> >>
> >> Thanks.
> >> Luke
> >>
> >> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
> >> kamal.chandraprak...@gmail.com> wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng 
> >>> wrote:
> >>>
>  Hello everyone,
>  I'd like to initiate a vote for KIP-1011.
>  This KIP is about replacing alterConfigs with incrementalAlterConfigs
>  when updating broker configs using kafka-configs.sh, this is similar
> to
>  what we have done in KIP-894.
> 
>  KIP link:
>  KIP-1011: Use incrementalAlterConfigs when updating broker configs by
>  kafka-configs.sh - Apache Kafka - Apache Software Foundation
>  <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1011%3A+Use+incrementalAlterConfigs+when+updating+broker+configs+by+kafka-configs.sh
> >
> 
>  Discussion thread:
> 
> 
>  lists.apache.org
>  
>  
>  
> 
> 
>  --,
>  Best,
>  Ziming
>
>
>


[jira] [Resolved] (KAFKA-16147) Partition is assigned to two members at the same time

2024-01-22 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16147.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Partition is assigned to two members at the same time
> -
>
> Key: KAFKA-16147
> URL: https://issues.apache.org/jira/browse/KAFKA-16147
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Emanuele Sabellico
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
> Attachments: broker1.log, broker2.log, broker3.log, librdkafka.log, 
> server.properties, server1.properties, server2.properties
>
>
> While running [test 0113 of 
> librdkafka|https://github.com/confluentinc/librdkafka/blob/8b6357f872efe2a5a3a2fd2828e4133f85e6b023/tests/0113-cooperative_rebalance.cpp#L2384],
>  subtest _u_multiple_subscription_changes_ received this error, indicating 
> that a partition is assigned to two members at the same time.
> {code:java}
> Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] 
> which is already assigned to consumer C_5#consumer-8 {code}
> I've reconstructed this sequence:
> C_5 SUBSCRIBES TO T1
> {noformat}
> %7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id 
> "(null)", current assignment "", subscribe topics 
> "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"{noformat}
> C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12
> {noformat}
> [2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7] [GroupId 
> rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw 
> transitioned from CurrentAssignment(memberEpoch=6, previousMemberEpoch=0, 
> targetMemberEpoch=6, state=assigning, assignedPartitions={}, 
> partitionsPendingRevocation={}, 
> partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to 
> CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
> targetMemberEpoch=14, state=stable, 
> assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
> partitionsPendingRevocation={}, partitionsPendingAssignment={}). 
> (org.apache.kafka.coordinator.group.GroupMetadataManager){noformat}
>  
> C_5 RECEIVES TARGET ASSIGNMENT
> {noformat}
> %7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat response received target assignment 
> "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat}
>  
> C_5 ACKS TARGET ASSIGNMENT
> {noformat}
> %7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
> "NULL", current assignment 
> "rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", 
> subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
> %7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat response received target assignment 
> "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], 
> (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat}
>  
> C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are 
> pending 
> {noformat}
> %7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: 
> Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id 
> "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id 
> "NULL", current assignment "NULL", subscribe topics 
> "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"
> [2024-01-16 12:10:52,615] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7] [GroupId 
> rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated 
> its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, 
> rdkafkatest_rnd5a91902462d61c2e_0113u_1]. 
> (org.apache.kafka.coordinator.group.GroupMetadataManager)
> [2024-01-16 12:10:52,616] INFO [GroupCoordinator id=1 
> topic=__consumer_offsets partition=7] [GroupId 
> rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw 
> transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
> targetMemberEpoch=14, state=stable, 
> assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, 
> partitionsPendingRevocation={}, partitionsPendingAssignment={}) to 
> CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, 
> targetMemberEpoch=16, state=revoking, 

Re: [VOTE] KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-22 Thread Andrew Schofield
+1 (non-binding)

Thanks,
Andrew

> On 22 Jan 2024, at 07:29, Federico Valeri  wrote:
>
> +1 (non binding)
>
> Thanks.
>
> On Mon, Jan 22, 2024 at 7:03 AM Luke Chen  wrote:
>>
>> Hi Ziming,
>>
>> +1(binding) from me.
>>
>> Thanks.
>> Luke
>>
>> On Mon, Jan 22, 2024 at 11:50 AM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Jan 22, 2024 at 8:34 AM ziming deng 
>>> wrote:
>>>
 Hello everyone,
 I'd like to initiate a vote for KIP-1011.
 This KIP is about replacing alterConfigs with incrementalAlterConfigs
 when updating broker configs using kafka-configs.sh, this is similar to
 what we have done in KIP-894.

 KIP link:
 KIP-1011: Use incrementalAlterConfigs when updating broker configs by
 kafka-configs.sh - Apache Kafka - Apache Software Foundation
 
>  
 
 

 Discussion thread:


 lists.apache.org
 
 
 


 --,
 Best,
 Ziming