[jira] [Resolved] (KAFKA-9087) ReplicaAlterLogDirs stuck and restart fails with java.lang.IllegalStateException: Offset mismatch for the future replica

2023-01-06 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-9087.
---
Fix Version/s: 3.5.0
 Assignee: Chia-Ping Tsai
   Resolution: Fixed

> ReplicaAlterLogDirs stuck and restart fails with 
> java.lang.IllegalStateException: Offset mismatch for the future replica
> 
>
> Key: KAFKA-9087
> URL: https://issues.apache.org/jira/browse/KAFKA-9087
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0
>Reporter: Gregory Koshelev
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 3.5.0
>
>
> I've started multiple replica movements between log directories and some 
> partitions got stuck. After restarting the broker, I got the following exception 
> in server.log:
> {noformat}
> [2019-06-11 17:58:46,304] ERROR [ReplicaAlterLogDirsThread-1]: Error due to 
> (kafka.server.ReplicaAlterLogDirsThread)
>  org.apache.kafka.common.KafkaException: Error processing data for partition 
> metrics_timers-35 offset 4224887
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:342)
>  at scala.Option.foreach(Option.scala:274)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:300)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:299)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$5(AbstractFetcherThread.scala:299)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
>  at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:299)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:132)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:131)
>  at scala.Option.foreach(Option.scala:274)
>  at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131)
>  at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
>  Caused by: java.lang.IllegalStateException: Offset mismatch for the future 
> replica metrics_timers-35: fetched offset = 4224887, log end offset = 0.
>  at 
> kafka.server.ReplicaAlterLogDirsThread.processPartitionData(ReplicaAlterLogDirsThread.scala:107)
>  at 
> kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:311)
>  ... 16 more
>  [2019-06-11 17:58:46,305] INFO [ReplicaAlterLogDirsThread-1]: Stopped 
> (kafka.server.ReplicaAlterLogDirsThread)
> {noformat}
> Also, the ReplicaAlterLogDirsThread has been stopped, and further restarts do not 
> fix the problem. To fix it, I stopped the broker and removed all the stuck 
> future partitions.
> Detailed log below:
> {noformat}
> [2019-06-11 12:09:52,833] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> [2019-06-11 12:21:34,979] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Loading producer state till offset 4224887 with 
> message format version 2 (kafka.log.Log)
> [2019-06-11 12:21:34,980] INFO [ProducerStateManager 
> partition=metrics_timers-35] Loading producer state from snapshot file 
> '/storage2/kafka/data/metrics_timers-35/04224887.snapshot' 
> (kafka.log.ProducerStateManager)
> [2019-06-11 12:21:34,980] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Completed load of log with 1 segments, log start 
> offset 4120720 and log end offset 4224887 in 70 ms (kafka.log.Log)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 0 (kafka.cluster.Replica)
> [2019-06-11 12:21:45,307] INFO Replica loaded for partition metrics_timers-35 
> with initial high watermark 4224887 (kafka.cluster.Replica)
> [2019-06-11 12:21:47,090] INFO [Log partition=metrics_timers-35, 
> dir=/storage2/kafka/data] Truncating to 4224887 has no effect as the largest 
> offset in the log is 4224886 (kafka.log.Log)
> [2019-06-11 12:30:04,757] INFO [ReplicaFetcher replicaId=1, 

[jira] [Created] (KAFKA-14602) offsetDelta in BatchMetadata is an int but the values are computed as difference of offsets which are longs.

2023-01-06 Thread Satish Duggana (Jira)
Satish Duggana created KAFKA-14602:
--

 Summary: offsetDelta in BatchMetadata is an int but the values are 
computed as difference of offsets which are longs.
 Key: KAFKA-14602
 URL: https://issues.apache.org/jira/browse/KAFKA-14602
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Satish Duggana


This is a followup of the discussion in 
https://github.com/apache/kafka/pull/13043#discussion_r1063071578

offsetDelta in BatchMetadata is an int. Because of this, ProducerAppendInfo 
may set a value that overflows. Ideally, this data type should be a long 
instead of an int.
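
As a quick illustration of the concern (hypothetical standalone code, not the actual BatchMetadata/ProducerAppendInfo classes), narrowing a long offset difference to an int silently wraps around, while a checked conversion fails loudly:

{code:java}
public class OffsetDeltaOverflowDemo {
    public static void main(String[] args) {
        long baseOffset = 10L;
        // A gap wider than Integer.MAX_VALUE between the base and last offsets.
        long lastOffset = baseOffset + Integer.MAX_VALUE + 5L;

        // Silent narrowing: the delta wraps around and becomes negative.
        int narrowedDelta = (int) (lastOffset - baseOffset);
        System.out.println("narrowed delta = " + narrowedDelta);

        // Checked conversion: throws ArithmeticException instead of wrapping.
        try {
            int checkedDelta = Math.toIntExact(lastOffset - baseOffset);
            System.out.println("checked delta = " + checkedDelta);
        } catch (ArithmeticException e) {
            System.out.println("delta does not fit in an int: " + e.getMessage());
        }
    }
}
{code}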



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1490

2023-01-06 Thread Apache Jenkins Server
See 




Re: [ANNOUNCE] New committer: Edoardo Comar

2023-01-06 Thread Matthias J. Sax

Congrats!

On 1/6/23 5:15 PM, Luke Chen wrote:

Congratulations, Edoardo!

Luke

On Sat, Jan 7, 2023 at 7:58 AM Mickael Maison 
wrote:


Congratulations Edo!


On Sat, Jan 7, 2023 at 12:05 AM Jun Rao  wrote:


Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Edoardo Comar.

Edoardo has been a long time Kafka contributor since 2016. His major
contributions are the following.

KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
KIP-277: Fine Grained ACL for CreateTopics API
KIP-136: Add Listener name to SelectorMetrics tags

Congratulations, Edoardo!

Thanks,

Jun (on behalf of the Apache Kafka PMC)






Re: [ANNOUNCE] New committer: Edoardo Comar

2023-01-06 Thread Luke Chen
Congratulations, Edoardo!

Luke

On Sat, Jan 7, 2023 at 7:58 AM Mickael Maison 
wrote:

> Congratulations Edo!
>
>
> On Sat, Jan 7, 2023 at 12:05 AM Jun Rao  wrote:
> >
> > Hi, Everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer
> Edoardo
> > Comar.
> >
> > Edoardo has been a long time Kafka contributor since 2016. His major
> > contributions are the following.
> >
> > KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
> > KIP-277: Fine Grained ACL for CreateTopics API
> > KIP-136: Add Listener name to SelectorMetrics tags
> >
> > Congratulations, Edoardo!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
>


Re: [ANNOUNCE] New committer: Edoardo Comar

2023-01-06 Thread Mickael Maison
Congratulations Edo!


On Sat, Jan 7, 2023 at 12:05 AM Jun Rao  wrote:
>
> Hi, Everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Edoardo
> Comar.
>
> Edoardo has been a long time Kafka contributor since 2016. His major
> contributions are the following.
>
> KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
> KIP-277: Fine Grained ACL for CreateTopics API
> KIP-136: Add Listener name to SelectorMetrics tags
>
> Congratulations, Edoardo!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)


Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-06 Thread Justine Olshan
Thanks Jason. Those changes make sense to me. I will update the KIP.



On Fri, Jan 6, 2023 at 3:31 PM Jason Gustafson 
wrote:

> Hey Justine,
>
> > I was wondering about compatibility here. When we send requests
> between brokers, we want to ensure that the receiving broker understands
> the request (specifically the new fields). Typically this is done via
> IBP/metadata version.
> I'm trying to think if there is a way around it but I'm not sure there is.
>
> Yes. I think we would gate usage of this behind an IBP bump. Does that seem
> reasonable?
>
> > As for the improvements -- can you clarify how the multiple transactional
> IDs would help here? Were you thinking of a case where we wait/batch
> multiple produce requests together? My understanding for now was 1
> transactional ID and one validation per 1 produce request.
>
> Each call to `AddPartitionsToTxn` is essentially a write to the transaction
> log and must block on replication. The more we can fit into a single
> request, the more writes we can do in parallel. The alternative is to make
> use of more connections, but usually we prefer batching since the network
> stack is not really optimized for high connection/request loads.
>
> > Finally with respect to the authorizations, I think it makes sense to
> skip
> topic authorizations, but I'm a bit confused by the "leader ID" field.
> Wouldn't we just want to flag the request as from a broker (does it matter
> which one?).
>
> We could also make it version-based. For the next version, we could require
> CLUSTER auth. So clients would not be able to use the API anymore, which is
> probably what we want.
>
> -Jason
>
> On Fri, Jan 6, 2023 at 10:43 AM Justine Olshan
> 
> wrote:
>
> > As a follow up, I was just thinking about the batching a bit more.
> > I suppose if we have one request in flight and we queue up the other
> > produce requests in some sort of purgatory, we could send information out
> > for all of them rather than one by one. So that would be a benefit of
> > batching partitions to add per transaction.
> >
> > I'll need to think a bit more on the design of this part of the KIP, and
> > will update the KIP in the next few days.
> >
> > Thanks,
> > Justine
> >
> > On Fri, Jan 6, 2023 at 10:22 AM Justine Olshan 
> > wrote:
> >
> > > Hey Jason -- thanks for the input -- I was just digging a bit deeper
> into
> > > the design + implementation of the validation calls here and what you
> say
> > > makes sense.
> > >
> > > I was wondering about compatibility here. When we send requests
> > > between brokers, we want to ensure that the receiving broker
> understands
> > > the request (specifically the new fields). Typically this is done via
> > > IBP/metadata version.
> > > I'm trying to think if there is a way around it but I'm not sure there
> > is.
> > >
> > > As for the improvements -- can you clarify how the multiple
> transactional
> > > IDs would help here? Were you thinking of a case where we wait/batch
> > > multiple produce requests together? My understanding for now was 1
> > > transactional ID and one validation per 1 produce request.
> > >
> > > Finally with respect to the authorizations, I think it makes sense to
> > skip
> > > topic authorizations, but I'm a bit confused by the "leader ID" field.
> > > Wouldn't we just want to flag the request as from a broker (does it
> > matter
> > > which one?).
> > >
> > > I think I want to adopt these suggestions, just had a few questions on
> > the
> > > details.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Thu, Jan 5, 2023 at 5:05 PM Jason Gustafson
> > 
> > > wrote:
> > >
> > >> Hi Justine,
> > >>
> > >> Thanks for the proposal.
> > >>
> > >> I was thinking about the implementation a little bit. In the current
> > >> proposal, the behavior depends on whether we have an old or new
> client.
> > >> For
> > >> old clients, we send `DescribeTransactions` and verify the result and
> > for
> > >> new clients, we send `AddPartitionsToTxn`. We might be able to
> simplify
> > >> the
> > >> implementation if we can use the same request type. For example, what
> if
> > >> we
> > >> bump the protocol version for `AddPartitionsToTxn` and add a
> > >> `validateOnly`
> > >> flag? For older versions, we can set `validateOnly=true` so that the
> > >> request only returns successfully if the partition had already been
> > added.
> > >> For new versions, we can set `validateOnly=false` and the partition
> will
> > >> be
> > >> added to the transaction. The other slightly annoying thing that this
> > >> would
> > >> get around is the need to collect the transaction state for all
> > partitions
> > >> even when we only care about a subset.
> > >>
> > >> Some additional improvements to consider:
> > >>
> > >> - We can give `AddPartitionsToTxn` better batch support for
> inter-broker
> > >> usage. Currently we only allow one `TransactionalId` to be specified,
> > but
> > >> the broker may get some benefit being able to batch across multiple
> > >> 

Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-06 Thread Jason Gustafson
Hey Justine,

> I was wondering about compatibility here. When we send requests
between brokers, we want to ensure that the receiving broker understands
the request (specifically the new fields). Typically this is done via
IBP/metadata version.
I'm trying to think if there is a way around it but I'm not sure there is.

Yes. I think we would gate usage of this behind an IBP bump. Does that seem
reasonable?

> As for the improvements -- can you clarify how the multiple transactional
IDs would help here? Were you thinking of a case where we wait/batch
multiple produce requests together? My understanding for now was 1
transactional ID and one validation per 1 produce request.

Each call to `AddPartitionsToTxn` is essentially a write to the transaction
log and must block on replication. The more we can fit into a single
request, the more writes we can do in parallel. The alternative is to make
use of more connections, but usually we prefer batching since the network
stack is not really optimized for high connection/request loads.

> Finally with respect to the authorizations, I think it makes sense to skip
topic authorizations, but I'm a bit confused by the "leader ID" field.
Wouldn't we just want to flag the request as from a broker (does it matter
which one?).

We could also make it version-based. For the next version, we could require
CLUSTER auth. So clients would not be able to use the API anymore, which is
probably what we want.

-Jason

On Fri, Jan 6, 2023 at 10:43 AM Justine Olshan 
wrote:

> As a follow up, I was just thinking about the batching a bit more.
> I suppose if we have one request in flight and we queue up the other
> produce requests in some sort of purgatory, we could send information out
> for all of them rather than one by one. So that would be a benefit of
> batching partitions to add per transaction.
>
> I'll need to think a bit more on the design of this part of the KIP, and
> will update the KIP in the next few days.
>
> Thanks,
> Justine
>
> On Fri, Jan 6, 2023 at 10:22 AM Justine Olshan 
> wrote:
>
> > Hey Jason -- thanks for the input -- I was just digging a bit deeper into
> > the design + implementation of the validation calls here and what you say
> > makes sense.
> >
> > I was wondering about compatibility here. When we send requests
> > between brokers, we want to ensure that the receiving broker understands
> > the request (specifically the new fields). Typically this is done via
> > IBP/metadata version.
> > I'm trying to think if there is a way around it but I'm not sure there
> is.
> >
> > As for the improvements -- can you clarify how the multiple transactional
> > IDs would help here? Were you thinking of a case where we wait/batch
> > multiple produce requests together? My understanding for now was 1
> > transactional ID and one validation per 1 produce request.
> >
> > Finally with respect to the authorizations, I think it makes sense to
> skip
> > topic authorizations, but I'm a bit confused by the "leader ID" field.
> > Wouldn't we just want to flag the request as from a broker (does it
> matter
> > which one?).
> >
> > I think I want to adopt these suggestions, just had a few questions on
> the
> > details.
> >
> > Thanks,
> > Justine
> >
> > On Thu, Jan 5, 2023 at 5:05 PM Jason Gustafson
> 
> > wrote:
> >
> >> Hi Justine,
> >>
> >> Thanks for the proposal.
> >>
> >> I was thinking about the implementation a little bit. In the current
> >> proposal, the behavior depends on whether we have an old or new client.
> >> For
> >> old clients, we send `DescribeTransactions` and verify the result and
> for
> >> new clients, we send `AddPartitionsToTxn`. We might be able to simplify
> >> the
> >> implementation if we can use the same request type. For example, what if
> >> we
> >> bump the protocol version for `AddPartitionsToTxn` and add a
> >> `validateOnly`
> >> flag? For older versions, we can set `validateOnly=true` so that the
> >> request only returns successfully if the partition had already been
> added.
> >> For new versions, we can set `validateOnly=false` and the partition will
> >> be
> >> added to the transaction. The other slightly annoying thing that this
> >> would
> >> get around is the need to collect the transaction state for all
> partitions
> >> even when we only care about a subset.
> >>
> >> Some additional improvements to consider:
> >>
> >> - We can give `AddPartitionsToTxn` better batch support for inter-broker
> >> usage. Currently we only allow one `TransactionalId` to be specified,
> but
> >> the broker may get some benefit being able to batch across multiple
> >> transactions.
> >> - Another small improvement is skipping topic authorization checks for
> >> `AddPartitionsToTxn` when the request is from a broker. Perhaps we can
> add
> >> a field for the `LeaderId` or something like that and require CLUSTER
> >> permission when set.
> >>
> >> Best,
> >> Jason
> >>
> >>
> >>
> >> On Mon, Dec 19, 2022 at 3:56 PM Jun Rao 
> wrote:
> >>
> >> > Hi, 

[ANNOUNCE] New committer: Edoardo Comar

2023-01-06 Thread Jun Rao
Hi, Everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Edoardo
Comar.

Edoardo has been a long time Kafka contributor since 2016. His major
contributions are the following.

KIP-302: Enable Kafka clients to use all DNS resolved IP addresses
KIP-277: Fine Grained ACL for CreateTopics API
KIP-136: Add Listener name to SelectorMetrics tags

Congratulations, Edoardo!

Thanks,

Jun (on behalf of the Apache Kafka PMC)


[jira] [Created] (KAFKA-14601) Improve exception handling in KafkaEventQueue

2023-01-06 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-14601:


 Summary: Improve exception handling in KafkaEventQueue
 Key: KAFKA-14601
 URL: https://issues.apache.org/jira/browse/KAFKA-14601
 Project: Kafka
  Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe


If KafkaEventQueue gets an InterruptedException while waiting for a condition 
variable, it currently exits immediately. Instead, it should complete the 
remaining events exceptionally and then execute the cleanup event. This will 
allow us to finish any necessary cleanup steps.

Also, handle cases where Event#handleException itself throws an exception.
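
A minimal sketch of the intended behavior (hypothetical types, not the real KafkaEventQueue API): on interruption, fail the remaining queued events so their callers are unblocked, then run the cleanup event, tolerating an exception thrown from the cleanup or handler itself.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

final class InterruptionHandlingSketch {
    // Futures stand in for queued events whose callers block on completion.
    private final Queue<CompletableFuture<Void>> pendingEvents = new ArrayDeque<>();
    private final Runnable cleanupEvent;

    InterruptionHandlingSketch(Runnable cleanupEvent) {
        this.cleanupEvent = cleanupEvent;
    }

    void enqueue(CompletableFuture<Void> eventFuture) {
        pendingEvents.add(eventFuture);
    }

    // Instead of exiting immediately on InterruptedException, complete the
    // remaining events exceptionally and then execute the cleanup event.
    void handleInterruption(InterruptedException cause) {
        CompletableFuture<Void> event;
        while ((event = pendingEvents.poll()) != null) {
            event.completeExceptionally(cause);
        }
        try {
            cleanupEvent.run();
        } catch (RuntimeException e) {
            // The cleanup (like Event#handleException) may itself throw;
            // log it rather than losing the shutdown path.
            System.err.println("cleanup event failed: " + e);
        }
    }
}
{code}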



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14600) Flaky test ProducerIdExpirationTest

2023-01-06 Thread Greg Harris (Jira)
Greg Harris created KAFKA-14600:
---

 Summary: Flaky test ProducerIdExpirationTest
 Key: KAFKA-14600
 URL: https://issues.apache.org/jira/browse/KAFKA-14600
 Project: Kafka
  Issue Type: Test
Reporter: Greg Harris
Assignee: Greg Harris


The ProducerIdExpirationTest appears to have these flaky failures:

Build / JDK 8 and Scala 2.13 / 
testTransactionAfterTransactionIdExpiresButProducerIdRemains(String).quorum=zk 
– kafka.api.ProducerIdExpirationTest: 5 failures
        
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1484/tests/
                org.opentest4j.AssertionFailedError: Producer IDs were not 
added.
        
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1484/tests/
                org.opentest4j.AssertionFailedError: Producer IDs were not 
added.
        
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1479/tests/
                org.opentest4j.AssertionFailedError: Producer IDs were not 
added.
        
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1472/tests/
                org.opentest4j.AssertionFailedError: Producer IDs were not 
added.
        
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/trunk/1472/tests/
                org.opentest4j.AssertionFailedError: Producer IDs were not 
added.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-06 Thread Justine Olshan
As a follow up, I was just thinking about the batching a bit more.
I suppose if we have one request in flight and we queue up the other
produce requests in some sort of purgatory, we could send information out
for all of them rather than one by one. So that would be a benefit of
batching partitions to add per transaction.

I'll need to think a bit more on the design of this part of the KIP, and
will update the KIP in the next few days.

Thanks,
Justine

On Fri, Jan 6, 2023 at 10:22 AM Justine Olshan  wrote:

> Hey Jason -- thanks for the input -- I was just digging a bit deeper into
> the design + implementation of the validation calls here and what you say
> makes sense.
>
> I was wondering about compatibility here. When we send requests
> between brokers, we want to ensure that the receiving broker understands
> the request (specifically the new fields). Typically this is done via
> IBP/metadata version.
> I'm trying to think if there is a way around it but I'm not sure there is.
>
> As for the improvements -- can you clarify how the multiple transactional
> IDs would help here? Were you thinking of a case where we wait/batch
> multiple produce requests together? My understanding for now was 1
> transactional ID and one validation per 1 produce request.
>
> Finally with respect to the authorizations, I think it makes sense to skip
> topic authorizations, but I'm a bit confused by the "leader ID" field.
> Wouldn't we just want to flag the request as from a broker (does it matter
> which one?).
>
> I think I want to adopt these suggestions, just had a few questions on the
> details.
>
> Thanks,
> Justine
>
> On Thu, Jan 5, 2023 at 5:05 PM Jason Gustafson 
> wrote:
>
>> Hi Justine,
>>
>> Thanks for the proposal.
>>
>> I was thinking about the implementation a little bit. In the current
>> proposal, the behavior depends on whether we have an old or new client.
>> For
>> old clients, we send `DescribeTransactions` and verify the result and for
>> new clients, we send `AddPartitionsToTxn`. We might be able to simplify
>> the
>> implementation if we can use the same request type. For example, what if
>> we
>> bump the protocol version for `AddPartitionsToTxn` and add a
>> `validateOnly`
>> flag? For older versions, we can set `validateOnly=true` so that the
>> request only returns successfully if the partition had already been added.
>> For new versions, we can set `validateOnly=false` and the partition will
>> be
>> added to the transaction. The other slightly annoying thing that this
>> would
>> get around is the need to collect the transaction state for all partitions
>> even when we only care about a subset.
>>
>> Some additional improvements to consider:
>>
>> - We can give `AddPartitionsToTxn` better batch support for inter-broker
>> usage. Currently we only allow one `TransactionalId` to be specified, but
>> the broker may get some benefit being able to batch across multiple
>> transactions.
>> - Another small improvement is skipping topic authorization checks for
>> `AddPartitionsToTxn` when the request is from a broker. Perhaps we can add
>> a field for the `LeaderId` or something like that and require CLUSTER
>> permission when set.
>>
>> Best,
>> Jason
>>
>>
>>
>> On Mon, Dec 19, 2022 at 3:56 PM Jun Rao  wrote:
>>
>> > Hi, Justine,
>> >
>> > Thanks for the explanation. It makes sense to me now.
>> >
>> > Jun
>> >
>> > On Mon, Dec 19, 2022 at 1:42 PM Justine Olshan
>> > 
>> > wrote:
>> >
>> > > Hi Jun,
>> > >
>> > > My understanding of the mechanism is that when we get to the last
>> epoch,
>> > we
>> > > increment to the fencing/last epoch and if any further requests come
>> in
>> > for
>> > > this producer ID they are fenced. Then the producer gets a new ID and
>> > > restarts with epoch/sequence 0. The fenced epoch sticks around for the
>> > > duration of producer.id.expiration.ms and blocks any late messages
>> > there.
>> > > The new ID will get to take advantage of the improved semantics around
>> > > non-zero start sequences. So I think we are covered.
>> > >
>> > > The only potential issue is overloading the cache, but hopefully the
>> > > improvements (lowered producer.id.expiration.ms) will help with that.
>> > Let
>> > > me know if you still have concerns.
>> > >
>> > > Thanks,
>> > > Justine
>> > >
>> > > On Mon, Dec 19, 2022 at 10:24 AM Jun Rao 
>> > wrote:
>> > >
>> > > > Hi, Justine,
>> > > >
>> > > > Thanks for the explanation.
>> > > >
>> > > > 70. The proposed fencing logic doesn't apply when pid changes, is
>> that
>> > > > right? If so, I am not sure how complete we are addressing this
>> issue
>> > if
>> > > > the pid changes more frequently.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Dec 16, 2022 at 9:16 AM Justine Olshan
>> > > > 
>> > > > wrote:
>> > > >
>> > > > > Hi Jun,
>> > > > >
>> > > > > Thanks for replying!
>> > > > >
>> > > > > 70.We already do the overflow mechanism, so my change would just
>> make
>> > > it
>> > > > > 

Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-06 Thread Justine Olshan
Hey Jason -- thanks for the input -- I was just digging a bit deeper into
the design + implementation of the validation calls here and what you say
makes sense.

I was wondering about compatibility here. When we send requests
between brokers, we want to ensure that the receiving broker understands
the request (specifically the new fields). Typically this is done via
IBP/metadata version.
I'm trying to think if there is a way around it but I'm not sure there is.

As for the improvements -- can you clarify how the multiple transactional
IDs would help here? Were you thinking of a case where we wait/batch
multiple produce requests together? My understanding for now was 1
transactional ID and one validation per 1 produce request.

Finally with respect to the authorizations, I think it makes sense to skip
topic authorizations, but I'm a bit confused by the "leader ID" field.
Wouldn't we just want to flag the request as from a broker (does it matter
which one?).

I think I want to adopt these suggestions, just had a few questions on the
details.

Thanks,
Justine

On Thu, Jan 5, 2023 at 5:05 PM Jason Gustafson 
wrote:

> Hi Justine,
>
> Thanks for the proposal.
>
> I was thinking about the implementation a little bit. In the current
> proposal, the behavior depends on whether we have an old or new client. For
> old clients, we send `DescribeTransactions` and verify the result and for
> new clients, we send `AddPartitionsToTxn`. We might be able to simplify the
> implementation if we can use the same request type. For example, what if we
> bump the protocol version for `AddPartitionsToTxn` and add a `validateOnly`
> flag? For older versions, we can set `validateOnly=true` so that the
> request only returns successfully if the partition had already been added.
> For new versions, we can set `validateOnly=false` and the partition will be
> added to the transaction. The other slightly annoying thing that this would
> get around is the need to collect the transaction state for all partitions
> even when we only care about a subset.
>
> Some additional improvements to consider:
>
> - We can give `AddPartitionsToTxn` better batch support for inter-broker
> usage. Currently we only allow one `TransactionalId` to be specified, but
> the broker may get some benefit being able to batch across multiple
> transactions.
> - Another small improvement is skipping topic authorization checks for
> `AddPartitionsToTxn` when the request is from a broker. Perhaps we can add
> a field for the `LeaderId` or something like that and require CLUSTER
> permission when set.
>
> Best,
> Jason
>
>
>
> On Mon, Dec 19, 2022 at 3:56 PM Jun Rao  wrote:
>
> > Hi, Justine,
> >
> > Thanks for the explanation. It makes sense to me now.
> >
> > Jun
> >
> > On Mon, Dec 19, 2022 at 1:42 PM Justine Olshan
> > 
> > wrote:
> >
> > > Hi Jun,
> > >
> > > My understanding of the mechanism is that when we get to the last
> epoch,
> > we
> > > increment to the fencing/last epoch and if any further requests come in
> > for
> > > this producer ID they are fenced. Then the producer gets a new ID and
> > > restarts with epoch/sequence 0. The fenced epoch sticks around for the
> > > duration of producer.id.expiration.ms and blocks any late messages
> > there.
> > > The new ID will get to take advantage of the improved semantics around
> > > non-zero start sequences. So I think we are covered.
> > >
> > > The only potential issue is overloading the cache, but hopefully the
> > > improvements (lowered producer.id.expiration.ms) will help with that.
> > Let
> > > me know if you still have concerns.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Mon, Dec 19, 2022 at 10:24 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the explanation.
> > > >
> > > > 70. The proposed fencing logic doesn't apply when pid changes, is
> that
> > > > right? If so, I am not sure how complete we are addressing this issue
> > if
> > > > the pid changes more frequently.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > >
> > > > On Fri, Dec 16, 2022 at 9:16 AM Justine Olshan
> > > > 
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for replying!
> > > > >
> > > > > 70.We already do the overflow mechanism, so my change would just
> make
> > > it
> > > > > happen more often.
> > > > > I was also not suggesting a new field in the log, but in the
> > response,
> > > > > which would be gated by the client version. Sorry if something
> there
> > is
> > > > > unclear. I think we are starting to diverge.
> > > > > The goal of this KIP is to not change to the marker format at all.
> > > > >
> > > > > 71. Yes, I guess I was going under the assumption that the log
> would
> > > just
> > > > > look at its last epoch and treat it as the current epoch. I suppose
> > we
> > > > can
> > > > > have some special logic that if the last epoch was on a marker we
> > > > actually
> > > > > expect the next epoch or something like that. We just need 

[GitHub] [kafka-site] bbejeck commented on pull request #471: Nussknacker.io added to powered-by page

2023-01-06 Thread GitBox


bbejeck commented on PR #471:
URL: https://github.com/apache/kafka-site/pull/471#issuecomment-1373871257

   merged #471 into asf-site


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] bbejeck merged pull request #471: Nussknacker.io added to powered-by page

2023-01-06 Thread GitBox


bbejeck merged PR #471:
URL: https://github.com/apache/kafka-site/pull/471


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] bbejeck commented on a diff in pull request #464: Powerd by Dream11 section, grammar fixes

2023-01-06 Thread GitBox


bbejeck commented on code in PR #464:
URL: https://github.com/apache/kafka-site/pull/464#discussion_r1063582025


##
powered-by.html:
##
@@ -6,7 +6,7 @@
 "link": 
"https://tech.dream11.in/blog/2020-01-07_Data-Highway---Dream11-s-Inhouse-Analytics-Platform---The-Burden-and-Benefits-90b8777d282;,
 "logo": "dream11.jpg",
 "logoBgColor": "#e1",
-"description": "We use apache kafka heavily for data ingestion to Data 
platform, streaming and batch analytics, and our micro services to communicate 
one another. Kafka has been core component of overall Dream11 Tech stack"
+"description": "We use apache Kafka heavily for data ingestion to the 
Data platform, streaming as well as batch analytics, and for our microservices 
to communicate with one another. Kafka is a core component of the overall 
Dream11 Tech stack."

Review Comment:
   ```suggestion
   "description": "We use Apache Kafka heavily for data ingestion to 
the Data platform, streaming as well as batch analytics, and for our 
microservices to communicate with one another. Kafka is a core component of the 
overall Dream11 Tech stack."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1489

2023-01-06 Thread Apache Jenkins Server
See 




[DISCUSS] KIP-894: Use incrementalAlterConfigs API for syncing topic configurations

2023-01-06 Thread Gantigmaa Selenge
Hi everyone,

I would like to start a discussion on the MirrorMaker update that proposes
replacing the deprecated alterConfigs API with the incrementalAlterConfigs
API for syncing topic configurations. Please take a look at the proposal
here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-894%3A+Use+incrementalAlterConfigs+API+for+syncing+topic+configurations
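
For context, here is a minimal sketch of the incrementalAlterConfigs call the KIP proposes to adopt (the topic name, config values, and bootstrap address below are made up for illustration):

```
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class IncrementalAlterConfigsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Only the listed keys are modified; other topic configs are left alone,
            // unlike the deprecated alterConfigs call, which replaces the whole config set.
            List<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("retention.ms", "86400000"),
                                  AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry("min.insync.replicas", ""),
                                  AlterConfigOp.OpType.DELETE));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```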


Regards,
Tina


[GitHub] [kafka-site] mimaison merged pull request #474: MINOR: Bump year to 2023

2023-01-06 Thread GitBox


mimaison merged PR #474:
URL: https://github.com/apache/kafka-site/pull/474


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-14072) Crashed MirrorCheckpointConnector appears as running in REST API

2023-01-06 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-14072.

Fix Version/s: 3.5.0
   Resolution: Fixed

This looks like it's the same issue as KAFKA-14545

> Crashed MirrorCheckpointConnector appears as running in REST API
> 
>
> Key: KAFKA-14072
> URL: https://issues.apache.org/jira/browse/KAFKA-14072
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, mirrormaker
>Affects Versions: 3.1.0
>Reporter: Mickael Maison
>Priority: Major
> Fix For: 3.5.0
>
>
> In one cluster I had a partially crashed MirrorCheckpointConnector instance. 
> It had stopped mirroring offsets and emitting metrics completely but the 
> connector and its single task were still reporting as running in the REST API.
> Looking at the logs, I found this stacktrace:
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpoint(MirrorCheckpointTask.java:187)
>   at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.lambda$checkpointsForGroup$2(MirrorCheckpointTask.java:171)
>   at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>   at 
> java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
>   at 
> java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1764)
>   at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>   at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>   at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>   at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>   at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpointsForGroup(MirrorCheckpointTask.java:173)
>   at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.sourceRecordsForGroup(MirrorCheckpointTask.java:157)
>   at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.poll(MirrorCheckpointTask.java:139)
>   at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:291)
>   at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> WARN [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] Failure 
> polling consumer state for checkpoints. 
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask) 
> [task-thread-prod-source->sc-prod-target.MirrorCheckpointConnector-0]
> {code}
> Not sure if it's related, but prior to this exception, there's quite a lot of:
> {code:java}
> ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] 
> WorkerSourceTask{id=prod-source->sc-prod-target.MirrorCheckpointConnector-0} 
> failed to send record to prod-source.checkpoints.internal:  
> (org.apache.kafka.connect.runtime.WorkerSourceTask) 
> [kafka-producer-network-thread | 
> connector-producer-prod-source->sc-prod-target.MirrorCheckpointConnector-0]
> org.apache.kafka.common.KafkaException: Producer is closed forcefully.
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:760)
>   at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:747)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:283)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> and some users had started consumers in the target cluster, which caused 
> these log lines:
> {code:java}
> ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] 
> [AdminClient clientId=adminclient-137] OffsetCommit request for group id 
>  and partition  failed due to unexpected error 
> UNKNOWN_MEMBER_ID. 
> (org.apache.kafka.clients.admin.internals.AlterConsumerGroupOffsetsHandler) 
> [kafka-admin-client-thread | adminclient-137]
> {code}
> Unfortunately I don't have the full history, so it's 

[jira] [Created] (KAFKA-14599) MirrorMaker pluggable interfaces missing from public API

2023-01-06 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-14599:
--

 Summary: MirrorMaker pluggable interfaces missing from public API
 Key: KAFKA-14599
 URL: https://issues.apache.org/jira/browse/KAFKA-14599
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Reporter: Mickael Maison


MirrorMaker exposes a few pluggable APIs, including:
ConfigPropertyFilter
GroupFilter
TopicFilter
ForwardingAdmin

These are currently missing from our javadoc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14598) Fix flaky ConnectRestApiTest

2023-01-06 Thread Ashwin Pankaj (Jira)
Ashwin Pankaj created KAFKA-14598:
-

 Summary: Fix flaky ConnectRestApiTest
 Key: KAFKA-14598
 URL: https://issues.apache.org/jira/browse/KAFKA-14598
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Ashwin Pankaj
Assignee: Ashwin Pankaj


ConnectRestApiTest sometimes fails with the message

```
ConnectRestError(404, 'HTTP ERROR 404 Not Found; URI: /connector-plugins/; STATUS: 404; MESSAGE: Not Found; SERVLET: -', 'http://172.31.1.75:8083/connector-plugins/')
```

This happens because ConnectDistributedService.start() by default waits until 
the line

```
Joined group at generation ..
```

is visible in the logs.

In most cases this is sufficient. But in the cases where the test fails, we see 
that this message appears even before Connect RestServer has finished 
initialization.

```
- [2022-12-15 15:40:29,064] INFO [Worker clientId=connect-1, 
groupId=connect-cluster] Joined group at generation 2 with protocol version 1 
and got assignment: Assignment{error=0, 
leader='connect-1-07d9da63-9acb-4633-aee4-1ab79f4ab1ae', 
leaderUrl='http://worker34:8083/', offset=-1, connectorIds=[], taskIds=[], 
revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 
(org.apache.kafka.connect.runtime.distributed.DistributedHerder)
- [2022-12-15 15:40:29,560] INFO 172.31.5.66 - - [15/Dec/2022:15:40:29 
+] "GET /connector-plugins/ HTTP/1.1" 404 375 "-" "python-requests/2.24.0" 
71 (org.apache.kafka.connect.runtime.rest.RestServer)
- [2022-12-15 15:40:29,579] INFO REST resources initialized; server is 
started and ready to handle requests 
(org.apache.kafka.connect.runtime.rest.RestServer)

```

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)