Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2622

2024-02-05 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Artem Livshits
Hi Rowland,

Thank you for your reply.  I think I understand what you're saying and just
tried to provide a quick summary.  The
https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
section actually goes into detail on what the benefits of adding an
explicit prepare RPC would be, and why it wouldn't really add any advantages
such as eliminating the need for monitoring, tooling, or providing additional
guarantees.  Let me know if you think of a guarantee that a prepare RPC would
provide.

-Artem

On Mon, Feb 5, 2024 at 6:22 PM Rowland Smith  wrote:

> Hi Artem,
>
> I don't think that you understand what I am saying. In any transaction,
> there is work done before the call to prepareTransaction() and work done
> afterwards. Any work performed before the call to prepareTransaction() can
> be aborted after a relatively short timeout if the client fails. It is only
> after the prepareTransaction() call that a transaction becomes in-doubt and
> must be remembered for a much longer period of time to allow the client to
> recover and make the decision to either commit or abort. A considerable
> amount of time might be spent before prepareTransaction() is called, and if
> the client fails in this period, relatively quick transaction abort would
> unblock any partitions and make the system fully available. So a prepare
> RPC would reduce the window where a client failure results in potentially
> long-lived blocking transactions.
>
> Here is the proposed sequence from the KIP with 2 added steps (4 and 5):
>
>
>1. Begin database transaction
>2. Begin Kafka transaction
>3. Produce data to Kafka
>4. Make updates to the database
>5. Repeat steps 3 and 4 as many times as necessary based on application
>needs.
>6. Prepare Kafka transaction [currently implicit operation, expressed as
>flush]
>7. Write produced data to the database
>8. Write offsets of produced data to the database
>9. Commit database transaction
>10. Commit Kafka transaction
>
>
> If the client application crashes before step 6, it is safe to abort the
> Kafka transaction after a relatively short timeout.
>
> I fully agree with a layered approach. However, the XA layer is going to
> require certain capabilities from the layer below it, and one of those
> capabilities is to be able to identify and report prepared transactions
> during recovery.
>
> - Rowland
>
> On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
>  wrote:
>
> > Hi Rowland,
> >
> > Thank you for your feedback.  Using an explicit prepare RPC was discussed
> > and is listed in the rejected alternatives:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > .
> > Basically, even if we had an explicit prepare RPC, it doesn't avoid the
> > fact that a crashed client could cause a blocking transaction.  This,
> > btw, is not just a specific property of this concrete proposal; it's a
> > fundamental trade-off of any form of 2PC -- any 2PC implementation must
> > allow for indefinitely "in-doubt" transactions that may not be unilaterally
> > automatically resolved within one participant.
> >
> > To mitigate the issue, using 2PC requires a special permission, so that
> the
> > Kafka admin could control that only applications that follow proper
> > standards in terms of availability (i.e. will automatically restart and
> > cleanup after a crash) would be allowed to utilize 2PC.  It is also
> assumed
> > that any practical deployment utilizing 2PC would have monitoring set up,
> > so that an operator could be alerted to investigate and manually resolve
> > in-doubt transactions (the metric and tooling support for doing so are
> also
> > described in the KIP).
> >
> > For XA support, I wonder if we could take a layered approach and store XA
> > information in a separate store, say in a compacted topic.  This way, the
> > core Kafka protocol could be decoupled from specific implementations (and
> > extra performance requirements that a specific implementation may impose)
> > and serve as a foundation for multiple implementations.
> >
> > -Artem
> >
> > On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith  wrote:
> >
> > > Hi Artem,
> > >
> > > It has been a while, but I have gotten back to this. I understand that
> > when
> > > 2PC is used, the transaction timeout will be effectively infinite. I
> > don't
> > > think that this behavior is desirable. A long running transaction can
> be
> > > extremely disruptive since it blocks consumers on any partitions
> written
> > to
> > > within the pending transaction. The primary reason for a long running
> > > transaction is a failure of the client, or the network connecting the
> > > client to the broker. If such a failure occurs before the client calls
> > > the new prepareTransaction() method, it should be 

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Rowland Smith
Hi Artem,

I don't think that you understand what I am saying. In any transaction,
there is work done before the call to prepareTransaction() and work done
afterwards. Any work performed before the call to prepareTransaction() can
be aborted after a relatively short timeout if the client fails. It is only
after the prepareTransaction() call that a transaction becomes in-doubt and
must be remembered for a much longer period of time to allow the client to
recover and make the decision to either commit or abort. A considerable
amount of time might be spent before prepareTransaction() is called, and if
the client fails in this period, relatively quick transaction abort would
unblock any partitions and make the system fully available. So a prepare
RPC would reduce the window where a client failure results in potentially
long-lived blocking transactions.

Here is the proposed sequence from the KIP with 2 added steps (4 and 5):


   1. Begin database transaction
   2. Begin Kafka transaction
   3. Produce data to Kafka
   4. Make updates to the database
   5. Repeat steps 3 and 4 as many times as necessary based on application
   needs.
   6. Prepare Kafka transaction [currently implicit operation, expressed as
   flush]
   7. Write produced data to the database
   8. Write offsets of produced data to the database
   9. Commit database transaction
   10. Commit Kafka transaction


If the client application crashes before step 6, it is safe to abort the
Kafka transaction after a relatively short timeout.
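
For illustration, here is a rough sketch of the client-side flow for the
sequence above, assuming the KIP-939 style of use where the "prepare" step is
still the implicit flush(). The JDBC calls, SQL statements and table names are
placeholders of my own, not part of the KIP, and error handling/recovery is
omitted.

import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DualWriteSketch {
    // Sketch only: assumes producer.initTransactions(...) was already called
    // at startup; SQL and table names are placeholders.
    static void process(KafkaProducer<String, String> producer, Connection db,
                        String topic, String key, String value) throws Exception {
        db.setAutoCommit(false);                 // 1. begin database transaction
        producer.beginTransaction();             // 2. begin Kafka transaction

        producer.send(new ProducerRecord<>(topic, key, value));  // 3. produce data to Kafka
        try (PreparedStatement update =
                 db.prepareStatement("UPDATE app_state SET val = ? WHERE id = ?")) {
            update.setString(1, value);
            update.setString(2, key);
            update.executeUpdate();              // 4. make updates to the database
        }
        // 5. steps 3 and 4 may repeat as many times as needed

        producer.flush();                        // 6. prepare (currently implicit)
        try (PreparedStatement offsets =
                 db.prepareStatement("INSERT INTO kafka_offsets(topic, msg_key) VALUES (?, ?)")) {
            offsets.setString(1, topic);
            offsets.setString(2, key);
            offsets.executeUpdate();             // 7./8. record produced data and offsets
        }
        db.commit();                             // 9. commit database transaction
        producer.commitTransaction();            // 10. commit Kafka transaction
    }
}

A crash anywhere before the flush() in step 6 matches the case described
above: nothing has been prepared yet, so a short broker-side timeout could
abort the Kafka transaction without waiting for the client to recover.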

I fully agree with a layered approach. However, the XA layer is going to
require certain capabilities from the layer below it, and one of those
capabilities is to be able to identify and report prepared transactions
during recovery.

- Rowland

On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
 wrote:

> Hi Rowland,
>
> Thank you for your feedback.  Using an explicit prepare RPC was discussed
> and is listed in the rejected alternatives:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> .
> Basically, even if we had an explicit prepare RPC, it doesn't avoid the
> fact that a crashed client could cause a blocking transaction.  This,
> btw, is not just a specific property of this concrete proposal; it's a
> fundamental trade-off of any form of 2PC -- any 2PC implementation must
> allow for indefinitely "in-doubt" transactions that may not be unilaterally
> automatically resolved within one participant.
>
> To mitigate the issue, using 2PC requires a special permission, so that the
> Kafka admin could control that only applications that follow proper
> standards in terms of availability (i.e. will automatically restart and
> cleanup after a crash) would be allowed to utilize 2PC.  It is also assumed
> that any practical deployment utilizing 2PC would have monitoring set up,
> so that an operator could be alerted to investigate and manually resolve
> in-doubt transactions (the metric and tooling support for doing so are also
> described in the KIP).
>
> For XA support, I wonder if we could take a layered approach and store XA
> information in a separate store, say in a compacted topic.  This way, the
> core Kafka protocol could be decoupled from specific implementations (and
> extra performance requirements that a specific implementation may impose)
> and serve as a foundation for multiple implementations.
>
> -Artem
>
> On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith  wrote:
>
> > Hi Artem,
> >
> > It has been a while, but I have gotten back to this. I understand that
> when
> > 2PC is used, the transaction timeout will be effectively infinite. I
> don't
> > think that this behavior is desirable. A long running transaction can be
> > extremely disruptive since it blocks consumers on any partitions written
> to
> > within the pending transaction. The primary reason for a long running
> > transaction is a failure of the client, or the network connecting the
> > client to the broker. If such a failure occurs before the client calls
> > the new prepareTransaction() method, it should be OK to abort the
> > transaction after a relatively short timeout period. This approach would
> > minimize the inconvenience and disruption of a long running transaction
> > blocking consumers, and provide higher availability for a system using
> > Kafka.
> >
> > In order to achieve this behavior, I think we would need a 'prepare' RPC
> > call so that the server knows that a transaction has been prepared, and
> > does not timeout and abort such transactions. There will be some cost to
> > this extra RPC call, but there will also be a benefit of better system
> > availability in case of failures.
> >
> > There is another reason why I would prefer this implementation. I am
> > working on an XA KIP, and XA requires that Kafka brokers be able to
> provide
> > a list of prepared transactions during recovery.  The broker can only
> 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-02-05 Thread Jun Rao
Hi, Justine,

Thanks for the reply.

Since AddPartitions is an inter broker request, will its version be gated
only by TV or other features like MV too? For example, if we need to change
the protocol for AddPartitions for reasons other than txn verification in
the future, will the new version be gated by a new MV? If so, does
downgrading a TV imply potential downgrade of MV too?

Jun



On Mon, Feb 5, 2024 at 5:07 PM Justine Olshan 
wrote:

> One TV gates the flexible feature version (no RPCs involved, only the
> transactional records, which should only be gated by TV).
> Another TV gates the ability to turn on KIP-890 part 2. This would gate the
> version of Produce and EndTxn (likely only used by transactions), and
> specifies a flag in AddPartitionsToTxn, though the version is already used
> without TV.
>
> I think the only concern is the Produce request, and we could consider
> workarounds similar to the AddPartitionsToTxn call.
>
> Justine
>
> On Mon, Feb 5, 2024 at 4:56 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Which RPC/record protocols will TV guard? Going forward, will those
> > RPC/record protocols only be guarded by TV and not by other features like
> > MV?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan
>  > >
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Sorry I think I misunderstood your question or answered incorrectly.
> The
> > TV
> > > version should ideally be fully independent from MV.
> > > At least for the changes I proposed, TV should not affect MV and MV
> > should
> > > not affect TV.
> > >
> > > I think if we downgrade TV, only that feature should downgrade.
> Likewise
> > > the same with MV. The finalizedFeatures should just reflect the feature
> > > downgrade we made.
> > >
> > > I also plan to write a new KIP for managing the disk format and upgrade
> > > tool as we will need new flags to support these features. That should
> > help
> > > clarify some things.
> > >
> > > Justine
> > >
> > > On Mon, Feb 5, 2024 at 11:03 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > So, if we downgrade TV, we could implicitly downgrade another feature
> > > (say
> > > > MV) that has dependency (e.g. RPC). What would we return for
> > > > FinalizedFeatures for MV in ApiVersionsResponse in that case?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan
> > >  > > > >
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > Yes, the idea is that if we downgrade TV (transaction version) we
> > will
> > > > stop
> > > > > using the add partitions to txn optimization and stop writing the
> > > > flexible
> > > > > feature version of the log.
> > > > > In the compatibility section I included some explanations on how
> this
> > > is
> > > > > done.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao 
> > > > wrote:
> > > > >
> > > > > > Hi, Justine,
> > > > > >
> > > > > > Thanks for the update.
> > > > > >
> > > > > > If we ever downgrade the transaction feature, any feature
> depending
> > > on
> > > > > > changes on top of those RPC/record
> > > > > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in
> > > KIP-890
> > > > > > will be automatically downgraded too?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Jun,
> > > > > > >
> > > > > > > I wanted to get back to you about your questions about MV/IBP.
> > > > > > >
> > > > > > > Looking at the options, I think it makes the most sense to
> > create a
> > > > > > > separate feature for transactions and use that to version gate
> > the
> > > > > > features
> > > > > > > we need to version gate (flexible transactional state records
> and
> > > > using
> > > > > > the
> > > > > > > new protocol)
> > > > > > > I've updated the KIP to include this change. Hopefully that's
> > > > > everything
> > > > > > we
> > > > > > > need for this KIP :)
> > > > > > >
> > > > > > > Justine
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan <
> > > jols...@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Jun,
> > > > > > > >
> > > > > > > > I will update the KIP with the prev field for prepare as
> well.
> > > > > > > >
> > > > > > > > PREPARE
> > > > > > > > producerId: x
> > > > > > > > previous/lastProducerId (tagged field): x
> > > > > > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > > > > > producerEpoch: y + 1
> > > > > > > >
> > > > > > > > COMPLETE
> > > > > > > > producerId: x or z if y overflowed
> > > > > > > > previous/lastProducerId (tagged field): x
> > > > > > > > nextProducerId (tagged field): empty
> > > > > > > > producerEpoch: y + 1 or 0 if we overflowed
> > > > > > > >
> > > > > > > > Thanks again,
> > > > > > > > Justine
> > > > > > 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-02-05 Thread Justine Olshan
One TV gates the flexible feature version (no RPCs involved, only the
transactional records, which should only be gated by TV).
Another TV gates the ability to turn on KIP-890 part 2. This would gate the
version of Produce and EndTxn (likely only used by transactions), and
specifies a flag in AddPartitionsToTxn, though the version is already used
without TV.

I think the only concern is the Produce request, and we could consider
workarounds similar to the AddPartitionsToTxn call.

Justine

On Mon, Feb 5, 2024 at 4:56 PM Jun Rao  wrote:

> Hi, Justine,
>
> Which RPC/record protocols will TV guard? Going forward, will those
> RPC/record protocols only be guarded by TV and not by other features like
> MV?
>
> Thanks,
>
> Jun
>
> On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan  >
> wrote:
>
> > Hi Jun,
> >
> > Sorry I think I misunderstood your question or answered incorrectly. The
> TV
> > version should ideally be fully independent from MV.
> > At least for the changes I proposed, TV should not affect MV and MV
> should
> > not affect TV.
> >
> > I think if we downgrade TV, only that feature should downgrade. Likewise
> > the same with MV. The finalizedFeatures should just reflect the feature
> > downgrade we made.
> >
> > I also plan to write a new KIP for managing the disk format and upgrade
> > tool as we will need new flags to support these features. That should
> help
> > clarify some things.
> >
> > Justine
> >
> > On Mon, Feb 5, 2024 at 11:03 AM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the reply.
> > >
> > > So, if we downgrade TV, we could implicitly downgrade another feature
> > (say
> > > MV) that has dependency (e.g. RPC). What would we return for
> > > FinalizedFeatures for MV in ApiVersionsResponse in that case?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan
> >  > > >
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > Yes, the idea is that if we downgrade TV (transaction version) we
> will
> > > stop
> > > > using the add partitions to txn optimization and stop writing the
> > > flexible
> > > > feature version of the log.
> > > > In the compatibility section I included some explanations on how this
> > is
> > > > done.
> > > >
> > > > Thanks,
> > > > Justine
> > > >
> > > > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Justine,
> > > > >
> > > > > Thanks for the update.
> > > > >
> > > > > If we ever downgrade the transaction feature, any feature depending
> > on
> > > > > changes on top of those RPC/record
> > > > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in
> > KIP-890
> > > > > will be automatically downgraded too?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Hey Jun,
> > > > > >
> > > > > > I wanted to get back to you about your questions about MV/IBP.
> > > > > >
> > > > > > Looking at the options, I think it makes the most sense to
> create a
> > > > > > separate feature for transactions and use that to version gate
> the
> > > > > features
> > > > > > we need to version gate (flexible transactional state records and
> > > using
> > > > > the
> > > > > > new protocol)
> > > > > > I've updated the KIP to include this change. Hopefully that's
> > > > everything
> > > > > we
> > > > > > need for this KIP :)
> > > > > >
> > > > > > Justine
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan <
> > jols...@confluent.io
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Jun,
> > > > > > >
> > > > > > > I will update the KIP with the prev field for prepare as well.
> > > > > > >
> > > > > > > PREPARE
> > > > > > > producerId: x
> > > > > > > previous/lastProducerId (tagged field): x
> > > > > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > > > > producerEpoch: y + 1
> > > > > > >
> > > > > > > COMPLETE
> > > > > > > producerId: x or z if y overflowed
> > > > > > > previous/lastProducerId (tagged field): x
> > > > > > > nextProducerId (tagged field): empty
> > > > > > > producerEpoch: y + 1 or 0 if we overflowed
> > > > > > >
> > > > > > > Thanks again,
> > > > > > > Justine
> > > > > > >
> > > > > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao
>  > >
> > > > > > wrote:
> > > > > > >
> > > > > > >> Hi, Justine,
> > > > > > >>
> > > > > > >> 101.3 Thanks for the explanation.
> > > > > > >> (1) My point was that the coordinator could fail right after
> > > writing
> > > > > the
> > > > > > >> prepare marker. When the new txn coordinator generates the
> > > complete
> > > > > > marker
> > > > > > >> after the failover, it needs some field from the prepare
> marker
> > to
> > > > > > >> determine whether it's written by the new client.
> > > > > > >>
> > > > > > >> (2) The changing of the behavior sounds good to me. We only
> want
> > > to
> > > > > > return
> > > > > > >> success if the prepare state is written by the new client. 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-02-05 Thread Jun Rao
Hi, Justine,

Which RPC/record protocols will TV guard? Going forward, will those
RPC/record protocols only be guarded by TV and not by other features like
MV?

Thanks,

Jun

On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan 
wrote:

> Hi Jun,
>
> Sorry I think I misunderstood your question or answered incorrectly. The TV
> version should ideally be fully independent from MV.
> At least for the changes I proposed, TV should not affect MV and MV should
> not affect TV.
>
> I think if we downgrade TV, only that feature should downgrade. Likewise
> the same with MV. The finalizedFeatures should just reflect the feature
> downgrade we made.
>
> I also plan to write a new KIP for managing the disk format and upgrade
> tool as we will need new flags to support these features. That should help
> clarify some things.
>
> Justine
>
> On Mon, Feb 5, 2024 at 11:03 AM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > So, if we downgrade TV, we could implicitly downgrade another feature
> (say
> > MV) that has dependency (e.g. RPC). What would we return for
> > FinalizedFeatures for MV in ApiVersionsResponse in that case?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan
>  > >
> > wrote:
> >
> > > Hey Jun,
> > >
> > > Yes, the idea is that if we downgrade TV (transaction version) we will
> > stop
> > > using the add partitions to txn optimization and stop writing the
> > flexible
> > > feature version of the log.
> > > In the compatibility section I included some explanations on how this
> is
> > > done.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the update.
> > > >
> > > > If we ever downgrade the transaction feature, any feature depending
> on
> > > > changes on top of those RPC/record
> > > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in
> KIP-890
> > > > will be automatically downgraded too?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > I wanted to get back to you about your questions about MV/IBP.
> > > > >
> > > > > Looking at the options, I think it makes the most sense to create a
> > > > > separate feature for transactions and use that to version gate the
> > > > features
> > > > > we need to version gate (flexible transactional state records and
> > using
> > > > the
> > > > > new protocol)
> > > > > I've updated the KIP to include this change. Hopefully that's
> > > everything
> > > > we
> > > > > need for this KIP :)
> > > > >
> > > > > Justine
> > > > >
> > > > >
> > > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan <
> jols...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Jun,
> > > > > >
> > > > > > I will update the KIP with the prev field for prepare as well.
> > > > > >
> > > > > > PREPARE
> > > > > > producerId: x
> > > > > > previous/lastProducerId (tagged field): x
> > > > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > > > producerEpoch: y + 1
> > > > > >
> > > > > > COMPLETE
> > > > > > producerId: x or z if y overflowed
> > > > > > previous/lastProducerId (tagged field): x
> > > > > > nextProducerId (tagged field): empty
> > > > > > producerEpoch: y + 1 or 0 if we overflowed
> > > > > >
> > > > > > Thanks again,
> > > > > > Justine
> > > > > >
> > > > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao  >
> > > > > wrote:
> > > > > >
> > > > > >> Hi, Justine,
> > > > > >>
> > > > > >> 101.3 Thanks for the explanation.
> > > > > >> (1) My point was that the coordinator could fail right after
> > writing
> > > > the
> > > > > >> prepare marker. When the new txn coordinator generates the
> > complete
> > > > > marker
> > > > > >> after the failover, it needs some field from the prepare marker
> to
> > > > > >> determine whether it's written by the new client.
> > > > > >>
> > > > > >> (2) The changing of the behavior sounds good to me. We only want
> > to
> > > > > return
> > > > > >> success if the prepare state is written by the new client. So,
> in
> > > the
> > > > > >> non-overflow case, it seems that we also need sth in the prepare
> > > > marker
> > > > > to
> > > > > >> tell us whether it's written by the new client.
> > > > > >>
> > > > > >> 112. Thanks for the explanation. That sounds good to me.
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
> > > > > >>  wrote:
> > > > > >>
> > > > > >> > 101.3 I realized that I actually have two questions.
> > > > > >> > > (1) In the non-overflow case, we need to write the previous
> > > > produce
> > > > > Id
> > > > > >> > tagged field in the end marker so that we know if the marker is
> > > from
> > > > > the
> > > > > >> new
> > > > > >> > client. Since the end marker is derived from the prepare
> marker,
> > > > should
> > > > > >> we
> > > > > >> > write the previous 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #90

2024-02-05 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2621

2024-02-05 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-853: KRaft Controller Membership Changes

2024-02-05 Thread Jason Gustafson
Hey Jose,

A few more questions:

1. When adding a voter, the KIP proposes to return a timeout error if the
voter cannot catch up in time. It might be more useful to return a more
specific error so that an operator can understand why the timeout occurred.
Alternatively, perhaps we could keep the generic error but add an
ErrorMessage field in the response to explain the issue. (At the same time,
perhaps we can add ErrorMessage to the other new RPCs. I think this is
becoming more or less standard in Kafka RPCs.)
2. Say that the voters are A, B, and C, and that C is offline. What error
would be returned from RemoveVoters if we try to remove B before C is back
online?
3. There is a note about a voter accepting a Vote request from a voter that
is not in their voter set. I guess the same comment applies to
BeginQuorumEpoch? In other words, a voter should accept a new leader even
if it is not in its local voter set. The note in the reference says: "Thus,
servers process incoming RPC requests without consulting their current
configurations." On the other hand, when it comes to election, voters will
become candidates based on the latest voter set from the log (regardless
whether it is committed), and they will seek out votes only from the same
voter set. Is that about right?

Thanks,
Jason

On Fri, Feb 2, 2024 at 10:55 AM Jun Rao  wrote:

> Hi, Jose,
>
> Thanks for the KIP. A few comments below.
>
> 10. kraft.version: Functionality wise, this seems very similar to
> metadata.version, which is to make sure that all brokers/controllers are on
> a supported version before enabling a new feature. Could you explain why we
> need a new one instead of just relying on metadata.version?
>
> 11. Both the quorum-state file and controller.quorum.bootstrap.servers
> contain endpoints. Which one takes precedence?
>
> 12. It seems that downgrading from this KIP is not supported? Could we have
> a section to make it explicit?
>
> 13. controller.quorum.auto.join.enable: If this is set true, when does the
> controller issue the addVoter RPC? Does it need to wait until it's caught
> up? Does it issue the addVoter RPC on every restart?
>
> 14. "using the AddVoter RPC, the Admin client or the kafka-metadata-quorum
> CLI.": In general, the operator doesn't know how to make RPC calls. So the
> practical options are either CLI or adminClient.
>
> 15. VotersRecord: Why does it need to include name and SecurityProtocol in
> EndPoints? It's meant to replace controller.quorum.voters, which only
> includes host/port.
>
> 16. "The KRaft leader cannot do this for observers (brokers) since their
> supported versions are not directly known by the KRaft leader."
> Hmm, the KRaft leader receives BrokerRegistrationRequest that includes
> supported feature versions, right?
>
> 17. UpdateVoter:
> 17.1 "The leader will learn the range of supported version from the
> UpdateVoter RPC".
> KIP-919 introduced ControllerRegistrationRequest to do that. Do we need a
> new one?
> 17.2 Do we support changing the voter's endpoint dynamically? If not, it
> seems that can be part of ControllerRegistrationRequest too.
>
> 18. AddVoter
> 18.1 "This RPC can be sent to a broker or controller, when sent to a
> broker, the broker will forward the request to the active controller."
> If it's sent to a non-active controller, it will also be forwarded to the
> active controller, right?
> 18.2 Why do we need the name/security protocol fields in the request?
> Currently, they can be derived from the configs.
> { "name": "Listeners", "type": "[]Listener", "versions": "0+",
>   "about": "The endpoints that can be used to communicate with the
> voter", "fields": [
>   { "name": "Name", "type": "string", "versions": "0+", "mapKey": true,
> "about": "The name of the endpoint" },
>   { "name": "Host", "type": "string", "versions": "0+",
> "about": "The hostname" },
>   { "name": "Port", "type": "uint16", "versions": "0+",
> "about": "The port" },
>   { "name": "SecurityProtocol", "type": "int16", "versions": "0+",
> "about": "The security protocol" }
> ]}
> 18.3 "4. Send an ApiVersions RPC to the first listener to discover the
> supported kraft.version of the new voter."
> Hmm, I thought that we found using ApiVersions unreliable (
> https://issues.apache.org/jira/browse/KAFKA-15230) and therefore
> introduced ControllerRegistrationRequest to propagate this information.
> ControllerRegistrationRequest can be made at step 1 during catchup.
> 18.4 "In 4., the new replica will be part of the quorum so the leader will
> start sending BeginQuorumEpoch requests to this replica."
> Hmm, the leader should have sent BeginQuorumEpoch at step 1 so that the new
> replica can catch up from it, right? Also, step 4 above only mentions
> ApiVersions RPC, not BeginQuorumEpoch.
>
> 19. Vote: It's kind of weird that VoterUuid is at the partition level.
> VoteId and VoterUuid uniquely identify a node, right? So it seems that it
> should be at 

Re: [DISCUSS] KIP-890 Server Side Defense

2024-02-05 Thread Justine Olshan
Hi Jun,

Sorry I think I misunderstood your question or answered incorrectly. The TV
version should ideally be fully independent from MV.
At least for the changes I proposed, TV should not affect MV and MV should
not affect TV.

I think if we downgrade TV, only that feature should downgrade. Likewise
the same with MV. The finalizedFeatures should just reflect the feature
downgrade we made.
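
As a side note, finalized features are already observable through the existing
Admin API from KIP-584, so a downgrade of one feature should be visible without
affecting the others. A minimal sketch -- the feature name for TV in the comment
below is hypothetical until the KIP settles on one:

import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.FeatureMetadata;

public class ShowFinalizedFeatures {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            FeatureMetadata metadata = admin.describeFeatures().featureMetadata().get();
            // Each finalized feature (metadata.version, and e.g. a hypothetical
            // "transaction.version") is reported with its own version range, so a
            // TV downgrade would show up here without touching MV.
            metadata.finalizedFeatures().forEach((name, range) ->
                System.out.printf("%s: min=%d max=%d%n",
                    name, range.minVersionLevel(), range.maxVersionLevel()));
        }
    }
}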

I also plan to write a new KIP for managing the disk format and upgrade
tool as we will need new flags to support these features. That should help
clarify some things.

Justine

On Mon, Feb 5, 2024 at 11:03 AM Jun Rao  wrote:

> Hi, Justine,
>
> Thanks for the reply.
>
> So, if we downgrade TV, we could implicitly downgrade another feature (say
> MV) that has dependency (e.g. RPC). What would we return for
> FinalizedFeatures for MV in ApiVersionsResponse in that case?
>
> Thanks,
>
> Jun
>
> On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan  >
> wrote:
>
> > Hey Jun,
> >
> > Yes, the idea is that if we downgrade TV (transaction version) we will
> stop
> > using the add partitions to txn optimization and stop writing the
> flexible
> > feature version of the log.
> > In the compatibility section I included some explanations on how this is
> > done.
> >
> > Thanks,
> > Justine
> >
> > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao 
> wrote:
> >
> > > Hi, Justine,
> > >
> > > Thanks for the update.
> > >
> > > If we ever downgrade the transaction feature, any feature depending on
> > > changes on top of those RPC/record
> > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in KIP-890
> > > will be automatically downgraded too?
> > >
> > > Jun
> > >
> > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > > 
> > > wrote:
> > >
> > > > Hey Jun,
> > > >
> > > > I wanted to get back to you about your questions about MV/IBP.
> > > >
> > > > Looking at the options, I think it makes the most sense to create a
> > > > separate feature for transactions and use that to version gate the
> > > features
> > > > we need to version gate (flexible transactional state records and
> using
> > > the
> > > > new protocol)
> > > > I've updated the KIP to include this change. Hopefully that's
> > everything
> > > we
> > > > need for this KIP :)
> > > >
> > > > Justine
> > > >
> > > >
> > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan  >
> > > > wrote:
> > > >
> > > > > Thanks Jun,
> > > > >
> > > > > I will update the KIP with the prev field for prepare as well.
> > > > >
> > > > > PREPARE
> > > > > producerId: x
> > > > > previous/lastProducerId (tagged field): x
> > > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > > producerEpoch: y + 1
> > > > >
> > > > > COMPLETE
> > > > > producerId: x or z if y overflowed
> > > > > previous/lastProducerId (tagged field): x
> > > > > nextProducerId (tagged field): empty
> > > > > producerEpoch: y + 1 or 0 if we overflowed
> > > > >
> > > > > Thanks again,
> > > > > Justine
> > > > >
> > > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao 
> > > > wrote:
> > > > >
> > > > >> Hi, Justine,
> > > > >>
> > > > >> 101.3 Thanks for the explanation.
> > > > >> (1) My point was that the coordinator could fail right after
> writing
> > > the
> > > > >> prepare marker. When the new txn coordinator generates the
> complete
> > > > marker
> > > > >> after the failover, it needs some field from the prepare marker to
> > > > >> determine whether it's written by the new client.
> > > > >>
> > > > >> (2) The changing of the behavior sounds good to me. We only want
> to
> > > > return
> > > > >> success if the prepare state is written by the new client. So, in
> > the
> > > > >> non-overflow case, it seems that we also need sth in the prepare
> > > marker
> > > > to
> > > > >> tell us whether it's written by the new client.
> > > > >>
> > > > >> 112. Thanks for the explanation. That sounds good to me.
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
> > > > >>  wrote:
> > > > >>
> > > > >> > 101.3 I realized that I actually have two questions.
> > > > >> > > (1) In the non-overflow case, we need to write the previous
> > > produce
> > > > Id
> > > > >> > tagged field in the end marker so that we know if the marker is
> > from
> > > > the
> > > > >> new
> > > > >> > client. Since the end marker is derived from the prepare marker,
> > > should
> > > > >> we
> > > > >> > write the previous produce Id in the prepare marker field too?
> > > > >> Otherwise,
> > > > >> > we will lose this information when deriving the end marker.
> > > > >> >
> > > > >> > The "previous" producer ID is in the normal producer ID field.
> So
> > > yes,
> > > > >> we
> > > > >> > need it in prepare and that was always the plan.
> > > > >> >
> > > > >> > Maybe it is a bit unclear so I will enumerate the fields and add
> > > them
> > > > to
> > > > >> > the KIP if that helps.
> > > > >> > Say we have producer ID x and epoch y. When we overflow epoch y
> we
> > > get
> > > > >> > 

Re: [DISCUSS] KIP-939: Support Participation in 2PC

2024-02-05 Thread Jun Rao
Hi, Artem,

Thanks for the reply.

20. For Flink usage, it seems that the APIs used to abort and commit a
prepared txn are not symmetric.
To abort, the app will just call
  producer.initTransactions(false)

To commit, the app needs to call
  producer.initTransactions(true)
  producer.completeTransaction(preparedTxnState)

Will this be a concern? For the dual-writer usage, both abort/commit use
the same API.

21. transaction.max.timeout.ms could in theory be MAX_INT. Perhaps we could
use a negative timeout in the record to indicate 2PC?

30. The KIP has two different APIs to abort an ongoing txn. Do we need both?
  producer.initTransactions(false)
  adminClient.forceTerminateTransaction(transactionalId)

31. "This would flush all the pending messages and transition the producer
into a mode where only .commitTransaction, .abortTransaction, or
.completeTransaction could be called.  If the call is successful (all
messages successfully got flushed to all partitions) the transaction is
prepared."
 If the producer calls send() in that state, what exception will the caller
receive?

Jun
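
For concreteness, the asymmetry in point 20 during recovery, sketched against
the proposed (not yet shipped) KIP-939 producer API; this will not compile
against current Kafka clients, and ExternalTxnLog below is a stand-in for the
application's own durable prepare/commit state (e.g. Flink checkpoints), not a
Kafka API:

// Sketch only -- proposed KIP-939 API, hypothetical helper types.
interface ExternalTxnLog {
    boolean lastTransactionCommitted();
    PreparedTxnState preparedTxnState();   // captured at prepare time
}

class TwoPhaseRecovery {
    void recover(Producer<byte[], byte[]> producer, ExternalTxnLog txnLog) {
        if (txnLog.lastTransactionCommitted()) {
            // Commit path: keep the prepared transaction, then complete it.
            producer.initTransactions(true);
            producer.completeTransaction(txnLog.preparedTxnState());
        } else {
            // Abort path: a single call discards any prepared transaction.
            producer.initTransactions(false);
        }
    }
}

The abort path is a single initTransactions(false) call, while the commit path
needs initTransactions(true) plus the extra completeTransaction() call.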


On Fri, Feb 2, 2024 at 3:34 PM Artem Livshits
 wrote:

> Hi Jun,
>
> >  Then, should we change the following in the example to use
> InitProducerId(true) instead?
>
> We could. I just thought that it's good to make the example self-contained
> by starting from a clean state.
>
> > Also, could Flink just follow the dual-write recipe?
>
> I think it would bring some unnecessary logic to Flink (or any other system
> that already has a transaction coordinator and just wants to drive Kafka to
> the desired state).  We could discuss it with the Flink folks; the current
> proposal was developed in collaboration with them.
>
> > 21. Could a non 2pc user explicitly set the TransactionTimeoutMs to
> Integer.MAX_VALUE?
>
> The server would reject this for regular transactions, it only accepts
> values that are <= transaction.max.timeout.ms (a broker config).
>
> > 24. Hmm, In KIP-890, without 2pc, the coordinator expects the endTxn
> request to use the ongoing pid. ...
>
> Without 2PC there is no case where the pid could change between starting a
> transaction and endTxn (InitProducerId would abort any ongoing
> transaction).  With 2PC there is now a case where there could be
> InitProducerId that can change the pid without aborting the transaction, so
> we need to handle that.  I wouldn't say that the flow is different, but
> it's rather extended to handle new cases.  The main principle is still the
> same -- for all operations we use the latest "operational" pid and epoch
> known to the client, this way we guarantee that we can fence zombie / split
> brain clients by disrupting the "latest known" pid + epoch progression.
>
> > 25. "We send out markers using the original ongoing transaction
> ProducerId and ProducerEpoch" ...
>
> Updated.
>
> -Artem
>
> On Mon, Jan 29, 2024 at 4:57 PM Jun Rao  wrote:
>
> > Hi, Artem,
> >
> > Thanks for the reply.
> >
> > 20. So for the dual-write recipe, we should always call
> > InitProducerId(keepPreparedTxn=true) from the producer? Then, should we
> > change the following in the example to use InitProducerId(true) instead?
> > 1. InitProducerId(false); TC STATE: Empty, ProducerId=42,
> > ProducerEpoch=MAX-1, PrevProducerId=-1, NextProducerId=-1,
> > NextProducerEpoch=-1; RESPONSE ProducerId=42, Epoch=MAX-1,
> > OngoingTxnProducerId=-1, OngoingTxnEpoch=-1.
> > Also, could Flink just follow the dual-write recipe? It's simpler if
> there
> > is one way to solve the 2pc issue.
> >
> > 21. Could a non 2pc user explicitly set the TransactionTimeoutMs to
> > Integer.MAX_VALUE?
> >
> > 24. Hmm, In KIP-890, without 2pc, the coordinator expects the endTxn
> > request to use the ongoing pid. With 2pc, the coordinator now expects the
> > endTxn request to use the next pid. So, the flow is different, right?
> >
> > 25. "We send out markers using the original ongoing transaction
> ProducerId
> > and ProducerEpoch"
> > We should use ProducerEpoch + 1 in the marker, right?
> >
> > Jun
> >
> > On Fri, Jan 26, 2024 at 8:35 PM Artem Livshits
> >  wrote:
> >
> > > Hi Jun,
> > >
> > > > 20.  I am a bit confused by how we set keepPreparedTxn.  ...
> > >
> > > keepPreparedTxn=true informs the transaction coordinator that it should
> > > keep the ongoing transaction, if any.  If the keepPreparedTxn=false,
> then
> > > any ongoing transaction is aborted (this is exactly the current
> > behavior).
> > > enable2Pc is a separate argument that is controlled by the
> > > transaction.two.phase.commit.enable setting on the client.
> > >
> > > To start 2PC, the client just needs to set
> > > *transaction.two.phase.commit.enable*=true in the config.  Then if the
> > > client knows the status of the transaction upfront (in the case of
> Flink,
> > > Flink keeps the knowledge if the transaction is prepared in its own
> > store,
> > > so it always knows upfront), it can set keepPreparedTxn 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2620

2024-02-05 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-890 Server Side Defense

2024-02-05 Thread Jun Rao
Hi, Justine,

Thanks for the reply.

So, if we downgrade TV, we could implicitly downgrade another feature (say
MV) that has dependency (e.g. RPC). What would we return for
FinalizedFeatures for MV in ApiVersionsResponse in that case?

Thanks,

Jun

On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan 
wrote:

> Hey Jun,
>
> Yes, the idea is that if we downgrade TV (transaction version) we will stop
> using the add partitions to txn optimization and stop writing the flexible
> feature version of the log.
> In the compatibility section I included some explanations on how this is
> done.
>
> Thanks,
> Justine
>
> On Fri, Feb 2, 2024 at 11:12 AM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the update.
> >
> > If we ever downgrade the transaction feature, any feature depending on
> > changes on top of those RPC/record
> > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in KIP-890
> > will be automatically downgraded too?
> >
> > Jun
> >
> > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hey Jun,
> > >
> > > I wanted to get back to you about your questions about MV/IBP.
> > >
> > > Looking at the options, I think it makes the most sense to create a
> > > separate feature for transactions and use that to version gate the
> > features
> > > we need to version gate (flexible transactional state records and using
> > the
> > > new protocol)
> > > I've updated the KIP to include this change. Hopefully that's
> everything
> > we
> > > need for this KIP :)
> > >
> > > Justine
> > >
> > >
> > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan 
> > > wrote:
> > >
> > > > Thanks Jun,
> > > >
> > > > I will update the KIP with the prev field for prepare as well.
> > > >
> > > > PREPARE
> > > > producerId: x
> > > > previous/lastProducerId (tagged field): x
> > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > producerEpoch: y + 1
> > > >
> > > > COMPLETE
> > > > producerId: x or z if y overflowed
> > > > previous/lastProducerId (tagged field): x
> > > > nextProducerId (tagged field): empty
> > > > producerEpoch: y + 1 or 0 if we overflowed
> > > >
> > > > Thanks again,
> > > > Justine
> > > >
> > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao 
> > > wrote:
> > > >
> > > >> Hi, Justine,
> > > >>
> > > >> 101.3 Thanks for the explanation.
> > > >> (1) My point was that the coordinator could fail right after writing
> > the
> > > >> prepare marker. When the new txn coordinator generates the complete
> > > marker
> > > >> after the failover, it needs some field from the prepare marker to
> > > >> determine whether it's written by the new client.
> > > >>
> > > >> (2) The changing of the behavior sounds good to me. We only want to
> > > return
> > > >> success if the prepare state is written by the new client. So, in
> the
> > > >> non-overflow case, it seems that we also need sth in the prepare
> > marker
> > > to
> > > >> tell us whether it's written by the new client.
> > > >>
> > > >> 112. Thanks for the explanation. That sounds good to me.
> > > >>
> > > >> Jun
> > > >>
> > > >> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
> > > >>  wrote:
> > > >>
> > > >> > 101.3 I realized that I actually have two questions.
> > > >> > > (1) In the non-overflow case, we need to write the previous
> > produce
> > > Id
> > > >> > tagged field in the end marker so that we know if the marker is
> from
> > > the
> > > >> new
> > > >> > client. Since the end marker is derived from the prepare marker,
> > should
> > > >> we
> > > >> > write the previous produce Id in the prepare marker field too?
> > > >> Otherwise,
> > > >> > we will lose this information when deriving the end marker.
> > > >> >
> > > >> > The "previous" producer ID is in the normal producer ID field. So
> > yes,
> > > >> we
> > > >> > need it in prepare and that was always the plan.
> > > >> >
> > > >> > Maybe it is a bit unclear so I will enumerate the fields and add
> > them
> > > to
> > > >> > the KIP if that helps.
> > > >> > Say we have producer ID x and epoch y. When we overflow epoch y we
> > get
> > > >> > producer ID Z.
> > > >> >
> > > >> > PREPARE
> > > >> > producerId: x
> > > >> > previous/lastProducerId (tagged field): empty
> > > >> > nextProducerId (tagged field): empty or z if y will overflow
> > > >> > producerEpoch: y + 1
> > > >> >
> > > >> > COMPLETE
> > > >> > producerId: x or z if y overflowed
> > > >> > previous/lastProducerId (tagged field): x
> > > >> > nextProducerId (tagged field): empty
> > > >> > producerEpoch: y + 1 or 0 if we overflowed
> > > >> >
> > > >> > (2) In the prepare phase, if we retry and see epoch - 1 + ID in
> last
> > > >> seen
> > > >> > fields and are issuing the same command (ie commit not abort), we
> > > return
> > > >> > success. The logic before KIP-890 seems to return
> > > >> CONCURRENT_TRANSACTIONS
> > > >> > in this case. Are we intentionally making this change?
> > > >> >
> > > >> > Hmm -- we would fence the producer if the epoch is bumped and we
> > get a

[jira] [Created] (KAFKA-16225) Flaky test suite LogDirFailureTest

2024-02-05 Thread Greg Harris (Jira)
Greg Harris created KAFKA-16225:
---

 Summary: Flaky test suite LogDirFailureTest
 Key: KAFKA-16225
 URL: https://issues.apache.org/jira/browse/KAFKA-16225
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Reporter: Greg Harris


I see this failure on trunk and in PR builds for multiple methods in this test 
suite:
{noformat}
org.opentest4j.AssertionFailedError: expected:  but was:     
at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
    
at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
    
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)    
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)    
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)    
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:179)    
at kafka.utils.TestUtils$.causeLogDirFailure(TestUtils.scala:1715)    
at 
kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:186)
    
at 
kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:70){noformat}
It appears this assertion is failing
[https://github.com/apache/kafka/blob/f54975c33135140351c50370282e86c49c81bbdd/core/src/test/scala/unit/kafka/utils/TestUtils.scala#L1715]

The other error which is appearing is this:
{noformat}
org.opentest4j.AssertionFailedError: Unexpected exception type thrown, 
expected:  but was: 
    
at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
    
at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:67)    
at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)    
at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)    
at 
kafka.server.LogDirFailureTest.testProduceErrorsFromLogDirFailureOnLeader(LogDirFailureTest.scala:164)
    
at 
kafka.server.LogDirFailureTest.testProduceErrorFromFailureOnLogRoll(LogDirFailureTest.scala:64){noformat}
Failures appear to have started in this commit, but this does not indicate that 
this commit is at fault: 
[https://github.com/apache/kafka/tree/3d95a69a28c2d16e96618cfa9a1eb69180fb66ea] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16221) IllegalStateException from Producer

2024-02-05 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-16221.
-
Resolution: Fixed

> IllegalStateException from Producer
> ---
>
> Key: KAFKA-16221
> URL: https://issues.apache.org/jira/browse/KAFKA-16221
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.6.0
>Reporter: Matthias J. Sax
>Priority: Critical
> Fix For: 3.7.0
>
>
> https://issues.apache.org/jira/browse/KAFKA-14831 fixed a producer bug about 
> internal TX state transitions, and the producer is now throwing an 
> IllegalStateException in situations where it previously swallowed an internal error.
> This change surfaces a bug in Kafka Streams: Kafka Streams calls 
> `abortTransaction()` blindly when a task is closed dirty, even if the 
> Producer is already in an internal fatal state. However, if the Producer is 
> in a fatal state, Kafka Streams should skip `abortTransaction` and only 
> `close()` the Producer when closing a task dirty.
> The bug surfaces after `commitTransaction()` times out or after an 
> `InvalidProducerEpochException` from a `send()` call, leading to the call to 
> `abortTransaction()` – Kafka Streams currently does not track whether a commit-TX 
> is in progress.
> {code:java}
> java.lang.IllegalStateException: Cannot attempt operation `abortTransaction` 
> because the previous call to `commitTransaction` timed out and must be retried
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.handleCachedTransactionRequestResult(TransactionManager.java:1203)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.beginAbort(TransactionManager.java:326)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:274) {code}
> and
> {code:java}
> [2024-01-16 04:19:32,584] ERROR [kafka-producer-network-thread | 
> i-01aea6907970b1bf6-StreamThread-1-producer] stream-thread 
> [i-01aea6907970b1bf6-StreamThread-1] stream-task [1_2] Error encountered 
> sending record to topic joined-counts for task 1_2 due to:
> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer 
> attempted to produce with an old epoch.
> Written offsets would not be recorded and no more records would be sent since 
> the producer is fenced, indicating the task may be migrated out 
> (org.apache.kafka.streams.processor.internals.RecordCollectorImpl)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer 
> attempted to produce with an old epoch.
> // followed by
> [2024-01-16 04:19:32,587] ERROR [kafka-producer-network-thread | 
> i-01aea6907970b1bf6-StreamThread-1-producer] [Producer 
> clientId=i-01aea6907970b1bf6-StreamThread-1-producer, 
> transactionalId=stream-soak-test-bbb995dc-1ba2-41ed-8791-0512ab4b904d-1] 
> Aborting producer batches due to 
> fatal error (org.apache.kafka.clients.producer.internals.Sender)
> java.lang.IllegalStateException: TransactionalId 
> stream-soak-test-bbb995dc-1ba2-41ed-8791-0512ab4b904d-1: Invalid transition 
> attempted from state FATAL_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:996)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:451)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.maybeTransitionToErrorState(TransactionManager.java:664)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.handleFailedBatch(TransactionManager.java:669)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:835)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:819)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:771)
> at 
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:702)
> at 
> org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:627)
> at java.util.ArrayList.forEach(ArrayList.java:1259)
> at 
> org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:612)
> at java.lang.Iterable.forEach(Iterable.java:75)
> at 
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:612)
> at 
> org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$8(Sender.java:917)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:608)
> at 

[jira] [Created] (KAFKA-16224) Fix handling of deleted topic when auto-committing before revocation

2024-02-05 Thread Lianet Magrans (Jira)
Lianet Magrans created KAFKA-16224:
--

 Summary: Fix handling of deleted topic when auto-committing before 
revocation
 Key: KAFKA-16224
 URL: https://issues.apache.org/jira/browse/KAFKA-16224
 Project: Kafka
  Issue Type: Sub-task
  Components: clients, consumer
Reporter: Lianet Magrans
Assignee: Lianet Magrans


Current logic for auto-committing offsets when partitions are revoked will 
retry continuously when getting UNKNOWN_TOPIC_OR_PARTITION, leading to the 
member not completing the revocation in time. We should consider this as an 
indication of the topic being deleted, and in the context of committing offsets 
to revoke partitions, we should abort the commit attempt and move on to 
complete and ack the revocation.  
While reviewing this, review the behaviour around this error for other commit 
operations as well in case a similar reasoning should be applied.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #89

2024-02-05 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-971 Expose replication-offset-lag MirrorMaker2 metric

2024-02-05 Thread Elxan Eminov
Hi Mickael!
Any further thoughts on this?

Thanks,
Elkhan

On Thu, 18 Jan 2024 at 11:53, Mickael Maison 
wrote:

> Hi Elxan,
>
> Thanks for the updates.
>
> We used dots to separate words in configuration names, so I think
> replication.offset.lag.metric.last-replicated-offset.ttl should be
> named replication.offset.lag.metric.last.replicated.offset.ttl
> instead.
>
> About the names of the metrics, fair enough if you prefer keeping the
> replication prefix. Out of the alternatives you mentioned, I think I
> prefer replication-record-lag. I think the metrics and configuration
> names should match too. Let's see what the others think about it.
>
> Thanks,
> Mickael
>
> On Mon, Jan 15, 2024 at 9:50 PM Elxan Eminov 
> wrote:
> >
> > Apologies, forgot to reply on your last comment about the metric name.
> > I believe both replication-lag and record-lag are a little too abstract -
> > what do you think about either leaving it as replication-offset-lag or
> > renaming to replication-record-lag?
> >
> > Thanks
> >
> > On Wed, 10 Jan 2024 at 15:31, Mickael Maison 
> > wrote:
> >
> > > Hi Elxan,
> > >
> > > Thanks for the KIP, it looks like a useful addition.
> > >
> > > Can you add to the KIP the default value you propose for
> > > replication.lag.metric.refresh.interval? In MirrorMaker most interval
> > > configs can be set to -1 to disable them, will it be the case for this
> > > new feature or will this setting only accept positive values?
> > > I also wonder if replication-lag, or record-lag would be clearer names
> > > instead of replication-offset-lag, WDYT?
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Wed, Jan 3, 2024 at 6:15 PM Elxan Eminov 
> > > wrote:
> > > >
> > > > Hi all,
> > > > Here is the vote thread:
> > > > https://lists.apache.org/thread/ftlnolcrh858dry89sjg06mdcdj9mrqv
> > > >
> > > > Cheers!
> > > >
> > > > On Wed, 27 Dec 2023 at 11:23, Elxan Eminov 
> > > wrote:
> > > >
> > > > > Hi all,
> > > > > I've updated the KIP with the details we discussed in this thread.
> > > > > I'll call in a vote after the holidays if everything looks good.
> > > > > Thanks!
> > > > >
> > > > > On Sat, 26 Aug 2023 at 15:49, Elxan Eminov <
> elxanemino...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Relatively minor change with a new metric for MM2
> > > > >>
> > > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-971%3A+Expose+replication-offset-lag+MirrorMaker2+metric
> > > > >>
> > > > >
> > >
>
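
As an aside on the naming points above, the following is a minimal, 
hypothetical Java sketch of how a dot-separated config and a matching sensor 
could be declared with Kafka's ConfigDef and Metrics APIs. The config key, 
default value, metric group and descriptions are assumptions for illustration, 
not what KIP-971 specifies.

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Value;

public class ReplicationLagMetricSketch {

    // Dot-separated config name, following the convention discussed above.
    // The 60s default is an assumption for illustration only.
    static final ConfigDef CONFIG = new ConfigDef()
        .define("replication.offset.lag.metric.last.replicated.offset.ttl",
                ConfigDef.Type.LONG,
                60_000L,
                ConfigDef.Importance.LOW,
                "TTL for cached last-replicated offsets used by the lag metric.");

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        // Metric name mirrors the config wording, e.g. "replication-record-lag".
        Sensor lag = metrics.sensor("replication-record-lag");
        lag.add(metrics.metricName("replication-record-lag",
                                   "mirror-source-metrics",
                                   "Lag between the last replicated record and the source end offset."),
                new Value());
        lag.record(42.0); // record an observed lag value
        metrics.close();
    }
}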


[jira] [Resolved] (KAFKA-15717) KRaft support in LeaderEpochIntegrationTest

2024-02-05 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-15717.

Fix Version/s: 3.8.0
   Resolution: Fixed

> KRaft support in LeaderEpochIntegrationTest
> ---
>
> Key: KAFKA-15717
> URL: https://issues.apache.org/jira/browse/KAFKA-15717
> Project: Kafka
>  Issue Type: Task
>  Components: core
>Reporter: Sameer Tejani
>Priority: Minor
>  Labels: kraft, kraft-test, newbie
> Fix For: 3.8.0
>
>
> The following tests in LeaderEpochIntegrationTest in 
> core/src/test/scala/unit/kafka/server/epoch/LeaderEpochIntegrationTest.scala 
> need to be updated to support KRaft
> 67 : def shouldAddCurrentLeaderEpochToMessagesAsTheyAreWrittenToLeader(): 
> Unit = {
> 99 : def shouldSendLeaderEpochRequestAndGetAResponse(): Unit = {
> 144 : def shouldIncreaseLeaderEpochBetweenLeaderRestarts(): Unit = {
> Scanned 305 lines. Found 0 KRaft tests out of 3 tests
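
Converting these typically means parameterizing each test over the metadata 
quorum. The actual file is Scala; the Java sketch below only illustrates the 
JUnit 5 parameterization pattern commonly used for such conversions. The class 
name and the trivial assertion are placeholders, and the real cluster harness 
wiring is omitted.

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

// Illustrative only: the real LeaderEpochIntegrationTest extends Kafka's
// integration test harness, which starts a ZK- or KRaft-backed cluster
// depending on the quorum parameter.
public class QuorumParameterizationSketch {

    @ParameterizedTest
    @ValueSource(strings = {"zk", "kraft"})
    void shouldRunAgainstBothQuorumTypes(String quorum) {
        // In the real test, cluster setup and the leader-epoch assertions would
        // go here; the parameter just selects which metadata quorum to start.
        assertTrue(quorum.equals("zk") || quorum.equals("kraft"));
    }
}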



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16223) Replace EasyMock and PowerMock with Mockito for KafkaConfigBackingStoreTest

2024-02-05 Thread Hector Geraldino (Jira)
Hector Geraldino created KAFKA-16223:


 Summary: Replace EasyMock and PowerMock with Mockito for 
KafkaConfigBackingStoreTest
 Key: KAFKA-16223
 URL: https://issues.apache.org/jira/browse/KAFKA-16223
 Project: Kafka
  Issue Type: Sub-task
  Components: connect
Reporter: Hector Geraldino
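
For readers unfamiliar with the migration, the mechanical shape of such a 
change usually looks like the following hypothetical sketch; the ConfigStore 
interface and method names are made up for illustration and are not taken from 
KafkaConfigBackingStoreTest.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.List;

public class MockitoMigrationSketch {

    // Hypothetical collaborator, standing in for whatever the real test mocks.
    interface ConfigStore {
        List<String> connectorNames();
        void removeConnector(String name);
    }

    public static void main(String[] args) {
        // EasyMock style being replaced (shown only as a comment):
        //   ConfigStore store = EasyMock.createMock(ConfigStore.class);
        //   EasyMock.expect(store.connectorNames()).andReturn(List.of("c1"));
        //   EasyMock.replay(store);
        //   ... exercise code ...
        //   EasyMock.verify(store);

        // Mockito equivalent: no replay/verify-all phases, stubbing and
        // verification are done per interaction.
        ConfigStore store = mock(ConfigStore.class);
        when(store.connectorNames()).thenReturn(List.of("c1"));

        store.connectorNames();          // exercise
        store.removeConnector("c1");     // exercise

        verify(store).removeConnector("c1");
    }
}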






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16222) Incorrect default user-principal quota after migration

2024-02-05 Thread Dominik (Jira)
Dominik created KAFKA-16222:
---

 Summary: Incorrect default user-principal quota after migration
 Key: KAFKA-16222
 URL: https://issues.apache.org/jira/browse/KAFKA-16222
 Project: Kafka
  Issue Type: Bug
  Components: kraft, migration
Affects Versions: 3.6.1
Reporter: Dominik


We observed that our default user quota seems not to be migrated correctly.


Before Migration:

bin/kafka-configs.sh --describe --all --entity-type users

Quota configs for the *default user-principal* are 
consumer_byte_rate=100.0, producer_byte_rate=100.0
Quota configs for user-principal 'myuser' are consumer_byte_rate=1.5E8, 
producer_byte_rate=1.5E8


After Migration:

bin/kafka-configs.sh --describe --all --entity-type users

Quota configs for *user-principal ''* are consumer_byte_rate=100.0, 
producer_byte_rate=100.0
Quota configs for user-principal 'myuser' are consumer_byte_rate=1.5E8, 
producer_byte_rate=1.5E8
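
For reference, the default user entity is addressed in the Admin API with a 
null entity name, whereas the post-migration output above suggests the quota 
ended up attached to a literal empty user name. The following is a hedged 
sketch of describing all user quotas and re-applying the default quota via the 
Admin client; the bootstrap server and rate values are assumptions.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;
import org.apache.kafka.common.quota.ClientQuotaFilter;

public class DefaultUserQuotaSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // A null entity name denotes the *default* user principal; an empty
            // string would be a literal user named "", which is what the
            // migrated cluster above appears to end up with.
            ClientQuotaEntity defaultUser = new ClientQuotaEntity(
                Collections.singletonMap(ClientQuotaEntity.USER, null));

            // Describe all client quotas to compare with the pre-migration state.
            Map<ClientQuotaEntity, Map<String, Double>> quotas =
                admin.describeClientQuotas(ClientQuotaFilter.all()).entities().get();
            quotas.forEach((entity, config) -> System.out.println(entity + " -> " + config));

            // Re-apply the default quota (the values here are assumptions).
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                defaultUser,
                List.of(new ClientQuotaAlteration.Op("consumer_byte_rate", 1_000_000.0),
                        new ClientQuotaAlteration.Op("producer_byte_rate", 1_000_000.0)));
            admin.alterClientQuotas(List.of(alteration)).all().get();
        }
    }
}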



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14384) Flaky Test SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff

2024-02-05 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-14384.

Resolution: Fixed

> Flaky Test 
> SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff
> --
>
> Key: KAFKA-14384
> URL: https://issues.apache.org/jira/browse/KAFKA-14384
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Critical
>  Labels: flaky-test
>
> h3. Stacktrace
> java.lang.AssertionError: Did not receive all 5 records from topic 
> selfjoin-outputSelfJoinUpgradeIntegrationTestshouldUpgradeWithTopologyOptimizationOff
>  within 6 ms Expected: is a value equal to or greater than <5> but: <0> 
> was less than <5> at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.lambda$waitUntilMinKeyValueWithTimestampRecordsReceived$2(IntegrationTestUtils.java:763)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:382)
>  at 
> org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:350)
>  at 
> org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueWithTimestampRecordsReceived(IntegrationTestUtils.java:759)
>  at 
> org.apache.kafka.streams.integration.SelfJoinUpgradeIntegrationTest.processKeyValueAndVerifyCount(SelfJoinUpgradeIntegrationTest.java:244)
>  at 
> org.apache.kafka.streams.integration.SelfJoinUpgradeIntegrationTest.shouldUpgradeWithTopologyOptimizationOff(SelfJoinUpgradeIntegrationTest.java:155)
>  
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12835/4/testReport/org.apache.kafka.streams.integration/SelfJoinUpgradeIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldUpgradeWithTopologyOptimizationOff/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14385) Flaky Test QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable

2024-02-05 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-14385.

Resolution: Fixed

> Flaky Test 
> QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable
> ---
>
> Key: KAFKA-14385
> URL: https://issues.apache.org/jira/browse/KAFKA-14385
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Critical
>  Labels: flaky-test
>
> Failed twice on the same build (Java 8 & 11)
> h3. Stacktrace
> java.lang.AssertionError: KafkaStreams did not transit to RUNNING state 
> within 15000 milli seconds. Expected:  but: was  at 
> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at 
> org.apache.kafka.test.StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(StreamsTestUtils.java:134)
>  at 
> org.apache.kafka.test.StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(StreamsTestUtils.java:121)
>  at 
> org.apache.kafka.streams.integration.QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable(QueryableStateIntegrationTest.java:1038)
>  
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-12836/3/testReport/org.apache.kafka.streams.integration/QueryableStateIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldNotMakeStoreAvailableUntilAllStoresAvailable/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-8691) Flakey test ProcessorContextTest#shouldNotAllowToScheduleZeroMillisecondPunctuation

2024-02-05 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-8691.
---
Resolution: Fixed

> Flakey test  
> ProcessorContextTest#shouldNotAllowToScheduleZeroMillisecondPunctuation
> 
>
> Key: KAFKA-8691
> URL: https://issues.apache.org/jira/browse/KAFKA-8691
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Boyang Chen
>Priority: Critical
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6384/consoleFull]
> org.apache.kafka.streams.processor.internals.ProcessorContextTest > 
> shouldNotAllowToScheduleZeroMillisecondPunctuation PASSED*23:37:09* ERROR: 
> Failed to write output for test null.Gradle Test Executor 5*23:37:09* 
> java.lang.NullPointerException: Cannot invoke method write() on null 
> object*23:37:09*at 
> org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)*23:37:09*
> at 
> org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)*23:37:09*
>  at 
> org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*23:37:09*
>   at 
> org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)*23:37:09*
>at 
> org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*23:37:09*
>   at java_io_FileOutputStream$write.call(Unknown Source)*23:37:09*
> at 
> build_5nv3fyjgqff9aim9wbxfnad9z$_run_closure5$_closure75$_closure108.doCall(/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/build.gradle:244)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-9897) Flaky Test StoreQueryIntegrationTest#shouldQuerySpecificActivePartitionStores

2024-02-05 Thread Lucas Brutschy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas Brutschy resolved KAFKA-9897.
---
Resolution: Fixed

> Flaky Test StoreQueryIntegrationTest#shouldQuerySpecificActivePartitionStores
> -
>
> Key: KAFKA-9897
> URL: https://issues.apache.org/jira/browse/KAFKA-9897
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Affects Versions: 2.6.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk14-scala2.13/22/testReport/junit/org.apache.kafka.streams.integration/StoreQueryIntegrationTest/shouldQuerySpecificActivePartitionStores/]
> {quote}org.apache.kafka.streams.errors.InvalidStateStoreException: Cannot get 
> state store source-table because the stream thread is PARTITIONS_ASSIGNED, 
> not RUNNING at 
> org.apache.kafka.streams.state.internals.StreamThreadStateStoreProvider.stores(StreamThreadStateStoreProvider.java:85)
>  at 
> org.apache.kafka.streams.state.internals.QueryableStoreProvider.getStore(QueryableStoreProvider.java:61)
>  at org.apache.kafka.streams.KafkaStreams.store(KafkaStreams.java:1183) at 
> org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQuerySpecificActivePartitionStores(StoreQueryIntegrationTest.java:178){quote}
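
The failure above (querying a store while the stream thread is still 
PARTITIONS_ASSIGNED) is the usual reason such tests need to wait for the 
RUNNING state before calling store(). Below is a hedged sketch of that wait; 
the bootstrap server, topology, input topic and store name are assumptions.

import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class WaitForRunningSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "store-query-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // The "input" topic and "source-table" store name are assumptions.
        builder.table("input", Materialized.as("source-table"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        CountDownLatch running = new CountDownLatch(1);
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.RUNNING) {
                running.countDown();
            }
        });
        streams.start();

        // Querying before RUNNING raises InvalidStateStoreException, which is
        // exactly the flakiness described above; wait (with a timeout) first.
        if (!running.await(60, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Streams did not reach RUNNING in time");
        }

        ReadOnlyKeyValueStore<String, String> store = streams.store(
            StoreQueryParameters.fromNameAndType("source-table",
                QueryableStoreTypes.keyValueStore()));
        System.out.println("value for key 'k': " + store.get("k"));

        streams.close(Duration.ofSeconds(10));
    }
}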



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2619

2024-02-05 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.7 #88

2024-02-05 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-15460) Add group type filter to ListGroups API

2024-02-05 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-15460.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Add group type filter to ListGroups API
> ---
>
> Key: KAFKA-15460
> URL: https://issues.apache.org/jira/browse/KAFKA-15460
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: Ritika Reddy
>Priority: Major
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC2

2024-02-05 Thread Stanislav Kozlovski
Thanks Mickael, sounds good.

KAFKA-16195 and KAFKA-16157 were both merged!

I was made aware of one final blocker, this time for streams - KAFKA-16221.
Matthias was prompt with a short hotfix PR:
https://github.com/apache/kafka/pull/15315

After that goes into 3.7, I think I will be free to build the next RC.
Great work!

On Fri, Feb 2, 2024 at 6:43 PM Mickael Maison 
wrote:

> Hi Stanislav,
>
> I merged https://github.com/apache/kafka/pull/15308 in trunk. I let
> you cherry-pick it to 3.7.
>
> I think fixing the absolute show stoppers and calling JBOD support in
> KRaft early access in 3.7.0 is probably the right call. Even without
> the bugs we found, there's still quite a few JBOD follow up work to do
> (KAFKA-16061) + system tests and documentation updates.
>
> Thanks,
> Mickael
>
> On Fri, Feb 2, 2024 at 4:49 PM Stanislav Kozlovski
>  wrote:
> >
> > Thanks for the work everybody. Providing a status update at the end of the
> > week:
> >
> > - docs change explaining migration
> >  was merged
> > - the blocker KAFKA-16162  was merged
> > - the blocker KAFKA-14616  was merged
> > - a small blocker problem with the shadow jar plugin
> > 
> > - the blockers KAFKA-16157 & KAFKA-16195 aren't merged
> > - the good-to-have KAFKA-16082 isn't merged
> >
> > I think we should prioritize merging KAFKA-16195 and *call JBOD EA*. I
> > question whether we may find more blocker bugs in the next RC.
> > The release is late by approximately a month so far, so I do want to scope
> > down aggressively to meet the time-based goal.
> >
> > Best,
> > Stanislav
> >
> > On Mon, Jan 29, 2024 at 5:46 PM Omnia Ibrahim 
> > wrote:
> >
> > > Hi Stan and Gaurav,
> > > Just to clarify some points mentioned here before
> > > KAFKA-14616: I raised this a year ago, so it's not related to the JBOD work. It
> > > is rather a blocker bug for KRaft in general. The PR from Colin should fix
> > > this. I'm not sure if it is a blocker for 3.7 per se, as it has been a major bug
> > > since 3.3 and got missed from all other releases.
> > >
> > > Regarding the JBOD work:
> > > KAFKA-16082: This is not a blocker for 3.7, it's rather a nice fix. The PR
> > > https://github.com/apache/kafka/pull/15136 is quite a small one and was
> > > approved by Proven and me, but it is waiting for a committer's approval.
> > > KAFKA-16162: This is a blocker for 3.7. Likewise, it's a small PR,
> > > https://github.com/apache/kafka/pull/15270, approved by Proven and me, and
> > > the PR is waiting for a committer's approval.
> > > KAFKA-16157: This is a blocker for 3.7. There is one small suggestion for
> > > the PR https://github.com/apache/kafka/pull/15263, but I don't think any
> > > of the current feedback is blocking the PR from getting approved, assuming
> > > we get a committer's approval on it.
> > > KAFKA-16195: Likewise, it's a blocker, but it has approval from Proven and me,
> > > and we are waiting for a committer's approval on the PR
> > > https://github.com/apache/kafka/pull/15262.
> > >
> > > If we can’t get a committer's approval for KAFKA-16162, KAFKA-16157 and
> > > KAFKA-16195 in time for 3.7, then we can mark JBOD as early access,
> > > assuming we merge at least KAFKA-16195.
> > >
> > > Regards,
> > > Omnia
> > >
> > > > On 26 Jan 2024, at 15:39, ka...@gnarula.com wrote:
> > > >
> > > > Apologies, I duplicated KAFKA-16157 twice in my previous message. I
> > > > intended to mention KAFKA-16195 with the PR at
> > > > https://github.com/apache/kafka/pull/15262 as the second JIRA.
> > > >
> > > > Thanks,
> > > > Gaurav
> > > >
> > > >> On 26 Jan 2024, at 15:34, ka...@gnarula.com wrote:
> > > >>
> > > >> Hi Stan,
> > > >>
> > > >> I wanted to share some updates about the bugs you shared earlier.
> > > >>
> > > >> - KAFKA-14616: I've reviewed and tested the PR from Colin and have
> > > >> observed the fix works as intended.
> > > >> - KAFKA-16162: I reviewed Proven's PR and found some gaps in the
> > > >> proposed fix. I've therefore raised
> > > >> https://github.com/apache/kafka/pull/15270 following a discussion with
> > > >> Luke in JIRA.
> > > >> - KAFKA-16082: I don't think this is marked as a blocker anymore. I'm
> > > >> awaiting feedback/reviews at https://github.com/apache/kafka/pull/15136
> > > >>
> > > >> In addition to the above, there are 2 JIRAs I'd like to bring
> > > >> everyone's attention to:
> > > >>
> > > >> - KAFKA-16157: This is similar to KAFKA-14616 and is marked as a
> > > >> blocker. I've raised https://github.com/apache/kafka/pull/15263 and am
> > > >> awaiting reviews on it.
> > > >> - KAFKA-16157: I raised this yesterday and have addressed feedback from
> > > >> Luke. This should hopefully get merged soon.
> > > >>
> > > >> Regards,
> > > >> Gaurav
> > > >>
> > > >>
> > > >>> On 24 Jan 2024, at 11:51,