Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Colin McCabe
Hi Stanislav,

Thanks for making the first RC. The fact that it's titled RC2 is messing with 
my mind a bit. I hope this doesn't make people think that we're farther along 
than we are, heh.

On Sun, Jan 14, 2024, at 13:54, Jakub Scholz wrote:
> *> Nice catch! It does seem like we should have gated this behind the
> metadata> version as KIP-858 implies. Is the cluster configured with
> multiple log> dirs? What is the impact of the error messages?*
>
> I did not observe any obvious impact. I was able to send and receive
> messages as normal. But to be honest, I have no idea what else
> this might impact, so I did not try anything special.
>
> I think everyone upgrading an existing KRaft cluster will go through this
> stage (running Kafka 3.7 with an older metadata version for at least a
> while). So even if it is just a logged exception without any other impact, I
> wonder if it might scare users from upgrading. But I leave it to others to
> decide if this is a blocker or not.
>

Hi Jakub,

Thanks for trying the RC. I think what you found is a blocker bug because it
will generate a huge amount of logspam. I guess we didn't find it in junit tests
since logspam doesn't fail the automated tests. But certainly it's not suitable 
for production. Did you file a JIRA yet?
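For reference, the kind of metadata-version gate Stanislav mentions below (per
KIP-858) looks roughly like the sketch here. This is only an illustration:
sendAssignReplicasToDirs() is a stand-in for the broker's real request path, not
an actual Kafka method, and the enum constants come from the server-common
MetadataVersion class.

    import org.apache.kafka.server.common.MetadataVersion;

    public class DirectoryAssignmentGateSketch {
        public static void main(String[] args) {
            // Would normally come from the broker's current metadata image;
            // 3.6-IV2 mirrors the setup Jakub describes.
            MetadataVersion active = MetadataVersion.IBP_3_6_IV2;

            // Only send AssignReplicasToDirs once the metadata version
            // supports directory assignment (3.7-IV2 and later).
            if (active.isAtLeast(MetadataVersion.IBP_3_7_IV2)) {
                sendAssignReplicasToDirs();
            } else {
                System.out.println("metadata.version " + active
                        + " predates directory assignment; skipping the request");
            }
        }

        // Stand-in only; not a real Kafka method.
        private static void sendAssignReplicasToDirs() { }
    }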

> On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
>  wrote:
>
>> Hey Luke,
>>
>> This is an interesting problem. Given the fact that the KIP for having a
>> 3.8 release passed, I think it tips the scale towards not calling this a
>> blocker and expecting it to be solved in 3.7.1.
>>
>> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
>> (given the inability to rollback safely), but if that's true - the same
>> case would apply for 3.6.0. So in any case users would be expected to use a
>> patch release for this.

Hi Luke,

Thanks for testing rollback. I think this is a case where the documentation is
wrong. The intention was for the steps to basically be:

1. roll all the brokers into zk mode, but with migration enabled
2. take down the kraft quorum
3. rmr /controller, allowing a hybrid broker to take over.
4. roll all the brokers into zk mode without migration enabled (if desired)

With these steps, there isn't really any unavailability, since a ZK controller can
be elected quickly after the KRaft quorum is gone.
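For step 3, a rough sketch of what removing the znode amounts to with the plain
ZooKeeper Java client is below; in practice you would normally just run
"rmr /controller" (or "deleteall /controller") from the ZooKeeper CLI, and the
connect string here is a placeholder.

    import org.apache.zookeeper.ZooKeeper;

    public class DropControllerZnode {
        public static void main(String[] args) throws Exception {
            // Point this at the cluster's ZooKeeper ensemble.
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30_000, event -> { });
            // Version -1 matches any version of the znode. Once /controller is
            // gone, a hybrid (ZK-mode) broker can win the controller election.
            zk.delete("/controller", -1);
            zk.close();
        }
    }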

>> Further, since we will have a 3.8 release - it is
>> likely we will ultimately recommend users upgrade from that version given
>> its aim is to have strategic KRaft feature parity with ZK.
>> That being said, I am not 100% on this. Let me know whether you think this
>> should block the release, Luke. I am also tagging Colin and David to weigh
>> in with their opinions, as they worked on the migration logic.

The rollback docs are new in 3.7, so the fact that they're wrong is a clear
blocker, I think. But it should be easy to fix, I believe. I will create a PR.

best,
Colin

>>
>> Hey Kirk and Chris,
>>
>> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
>> improper closing. And the PR description implies this has been present
>> since 3.5. While annoying, I don't see a strong reason for this to block
>> the release.
>>
>> Hey Jakub,
>>
>> Nice catch! It does seem like we should have gated this behind the metadata
>> version as KIP-858 implies. Is the cluster configured with multiple log
>> dirs? What is the impact of the error messages?
>>
>> Tagging Igor (the author of the KIP) to weigh in.
>>
>> Best,
>> Stanislav
>>
>> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
>>
>> > Hi,
>> >
>> > I was trying the RC2 and ran into the following issue ... when I run a
>> > 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2, I seem
>> > to be getting repeated errors like this in the controller
>> > logs:
>> >
>> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]
>> assignReplicasToDirs:
>> > event failed with UnsupportedVersionException in 15 microseconds.
>> > (org.apache.kafka.controller.QuorumController)
>> > [quorum-controller-0-event-handler]
>> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
>> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
>> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
>> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
>> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
>> > partitions=[PartitionData(partitionIndex=2),
>> > PartitionData(partitionIndex=1)]),
>> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
>> > partitions=[PartitionData(partitionIndex=0)])])]) with context
>> > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
>> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
>> > connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
>> > 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
>> > 

[jira] [Resolved] (KAFKA-16118) Coordinator unloading fails when replica is deleted

2024-01-14 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16118.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Coordinator unloading fails when replica is deleted
> ---
>
> Key: KAFKA-16118
> URL: https://issues.apache.org/jira/browse/KAFKA-16118
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.8.0
>
>
> The new group coordinator always expects the leader epoch to be received when 
> it must unload the metadata for a partition. However, in KRaft, the leader 
> epoch is not passed when the replica is deleted (e.g. after reassignment).
> {noformat}
> java.lang.IllegalArgumentException: The leader epoch should always be 
> provided in KRaft.
>     at 
> org.apache.kafka.coordinator.group.GroupCoordinatorService.onResignation(GroupCoordinatorService.java:931)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$9$adapted(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$updateCoordinator$4(BrokerMetadataPublisher.scala:397)
>     at java.base/java.lang.Iterable.forEach(Iterable.java:75)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.updateCoordinator(BrokerMetadataPublisher.scala:396)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$onMetadataUpdate$7(BrokerMetadataPublisher.scala:200)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.onMetadataUpdate(BrokerMetadataPublisher.scala:186)
>     at 
> org.apache.kafka.image.loader.MetadataLoader.maybePublishMetadata(MetadataLoader.java:382)
>     at 
> org.apache.kafka.image.loader.MetadataBatchLoader.applyDeltaAndUpdate(MetadataBatchLoader.java:286)
>     at 
> org.apache.kafka.image.loader.MetadataBatchLoader.maybeFlushBatches(MetadataBatchLoader.java:222)
>     at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$handleCommit$1(MetadataLoader.java:406)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:1583)
>     at 
> org.apache.kafka.common.utils.KafkaThread.run(KafkaThread.java:66){noformat}
> The side effect of this bug is that group coordinator loading/unloading fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16120) Partition reassignments in ZK migration dual write leaves stray partitions

2024-01-14 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-16120.
--
Fix Version/s: 3.7.0
 Reviewer: Colin McCabe
 Assignee: David Mao
   Resolution: Fixed

> Partition reassignments in ZK migration dual write leaves stray partitions
> --
>
> Key: KAFKA-16120
> URL: https://issues.apache.org/jira/browse/KAFKA-16120
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Mao
>Assignee: David Mao
>Priority: Major
> Fix For: 3.7.0
>
>
> When a reassignment is completed in ZK migration dual-write mode, the
> `StopReplica` request sent by the KRaft quorum migration propagator has
> `delete = false` for the removed replicas when processing the topic delta. This
> results in stray replicas.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15538) Client support for java regex based subscription

2024-01-14 Thread Phuc Hong Tran (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phuc Hong Tran resolved KAFKA-15538.

Resolution: Fixed

> Client support for java regex based subscription
> 
>
> Key: KAFKA-15538
> URL: https://issues.apache.org/jira/browse/KAFKA-15538
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, consumer
>Reporter: Lianet Magrans
>Assignee: Phuc Hong Tran
>Priority: Major
>  Labels: kip-848, kip-848-client-support
> Fix For: 3.8.0
>
>
> When using subscribe with a java regex (Pattern), we need to resolve it on
> the client side to send the broker a list of topic names to subscribe to.
> Context:
> The new consumer group protocol uses [Google
> RE2/J|https://github.com/google/re2j] for regular expressions and introduces
> new methods in the consumer API to subscribe using a `SubscriptionPattern`.
> Subscribing with a java `Pattern` will still be supported for a while but
> eventually removed.
>  * When subscribe with SubscriptionPattern is used, the client should
> just send the regex to the broker and it will be resolved on the server side.
>  * When subscribe with Pattern is used, the regex should be resolved on
> the client side.
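A rough sketch of what that client-side resolution amounts to is below. This is
illustrative only, not the consumer's internal implementation; the bootstrap
address, group id, and topic pattern are placeholders.

    import java.util.Properties;
    import java.util.Set;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ClientSidePatternResolution {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "pattern-demo");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            Pattern pattern = Pattern.compile("orders-.*"); // the java regex the user passed

            try (Admin admin = Admin.create(props);
                 KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Resolve the Pattern on the client side against the current topic list...
                Set<String> matched = admin.listTopics().names().get().stream()
                        .filter(topic -> pattern.matcher(topic).matches())
                        .collect(Collectors.toSet());
                // ...and hand the broker an explicit list of topic names.
                consumer.subscribe(matched);
            }
        }
    }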



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: DISCUSS KIP-1011: Use incrementalAlterConfigs when updating broker configs by kafka-configs.sh

2024-01-14 Thread ziming deng
Hello Luke,

Thank you for finding this error; I have rectified it, and I will start a vote
process soon.

--
Best,
Ziming


> On Jan 12, 2024, at 16:32, Luke Chen  wrote:
> 
> Hi Ziming,
> 
> Thanks for the KIP!
> LGTM!
> Using incremental by default and falling back automatically if it's not
> supported is a good idea!
> 
> One minor comment:
> 1. so I'm inclined to move it to incrementalAlterConfigs  and "provide a
> flag" to still use alterConfigs  for new client to interact with old
> servers.
> I don't think we will "provide any flag" after the discussion. We should
> remove it.
> 
> Thanks.
> Luke
> 
> On Fri, Jan 12, 2024 at 12:29 PM ziming deng  >
> wrote:
> 
>> Thank you for your clarification, Chris,
>> 
>> I have spent some time reviewing KIP-894 and I think its automatic approach is
>> better and brings no side effects, so I will also adopt this approach here.
>> As you mentioned, the changes in semantics are minor; the most important
>> reason for this change is fixing a bug caused by sensitive configs.
>> 
>> 
>>> We
>>> don't appear to support appending/subtracting from list properties via
>> the
>>> CLI for any other entity type right now,
>> You are right about this. I tried and found that we can’t subtract or
>> append configs, so I will change the KIP to "making way for
>> appending/subtracting list properties".
>> 
>> --
>> Best,
>> Ziming
>> 
>>> On Jan 6, 2024, at 01:34, Chris Egerton  wrote:
>>> 
>>> Hi all,
>>> 
>>> Can we clarify any changes in the user-facing semantics for the CLI tool
>>> that would come about as a result of this KIP? I think the debate over
>> the
>>> necessity of an opt-in flag, or waiting for 4.0.0, ultimately boils down
>> to
>>> this.
>>> 
>>> My understanding is that the only changes in semantics are fairly minor
>>> (semantic versioning pun intended):
>>> 
>>> - Existing sensitive broker properties no longer have to be explicitly
>>> specified on the command line if they're not being changed
>>> - A small race condition is fixed where the broker config is updated by a
>>> separate operation in between when the CLI reads the existing broker
>> config
>>> and writes the new broker config
>>> - Usage of a new broker API that has been supported since version 2.3.0,
>>> but which does not require any new ACLs and does not act any differently
>>> apart from the two small changes noted above
>>> 
>>> If this is correct, then I'm inclined to agree with Ismael's suggestion
>> of
>>> starting with incrementalAlterConfigs, and falling back on alterConfigs
>> if
>>> the former is not supported by the broker, and do not believe it's
>>> necessary to wait for 4.0.0 or provide opt-in or opt-out flags to release
>>> this change. This would also be similar to changes we made to
>> MirrorMaker 2
>>> in KIP-894 [1], where the default behavior for syncing topic configs is
>> now
>>> to start with incrementalAlterConfigs and fall back on alterConfigs if
>> it's
>>> not supported.
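A minimal sketch of that start-with-incremental, fall-back-on-unsupported
pattern with the public Admin API is below; the bootstrap address, broker id,
and the config being set are placeholders, and the actual change under
discussion lives in the kafka-configs.sh tooling itself.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;
    import org.apache.kafka.common.errors.UnsupportedVersionException;

    public class IncrementalAlterConfigsFallback {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            ConfigEntry entry = new ConfigEntry("log.cleaner.threads", "2");

            try (Admin admin = Admin.create(props)) {
                // OpType.APPEND/SUBTRACT also exist, which is what makes the
                // incremental API nicer for list-type configs.
                AlterConfigOp op = new AlterConfigOp(entry, AlterConfigOp.OpType.SET);
                try {
                    admin.incrementalAlterConfigs(Map.of(broker, List.of(op))).all().get();
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof UnsupportedVersionException) {
                        // Pre-2.3 broker: fall back to the legacy (deprecated)
                        // full-replace alterConfigs API.
                        admin.alterConfigs(Map.of(broker, new Config(List.of(entry)))).all().get();
                    } else {
                        throw e;
                    }
                }
            }
        }
    }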
>>> 
>>> If there are other, more significant changes to the user-facing semantics
>>> for the CLI, then these should be called out here and in the KIP, and we
>>> might consider a more cautious approach.
>>> 
>>> [1] -
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-894%3A+Use+incrementalAlterConfigs+API+for+syncing+topic+configurations
>>> 
>>> 
>>> Also, regarding this part of the KIP:
>>> 
 incrementalAlterConfigs is more convenient especially for updating
>>> configs of list data type, such as
>> "leader.replication.throttled.replicas"
>>> 
>>> While this is true for the Java admin client and the corresponding broker
>>> APIs, it doesn't appear to be relevant to the kafka-configs.sh CLI tool.
>> We
>>> don't appear to support appending/subtracting from list properties via
>> the
>>> CLI for any other entity type right now, and there's nothing in the KIP
>>> that leads me to believe we'd be adding it for broker configs.
>>> 
>>> Cheers,
>>> 
>>> Chris
>>> 
>>> On Thu, Jan 4, 2024 at 10:12 PM ziming deng > >
>>> wrote:
>>> 
 Hi Ismael,
 I added this automatic approach to “Rejected alternatives” out of concern
 that we need to unify the semantics between alterConfigs and
 incrementalAlterConfigs, so I chose to give this choice to users.
 
 After reviewing the code and doing some tests, I found that they
 follow a similar approach; I think the simplest way is to let the
 client choose the best method heuristically.
 
 Thank you for pointing out this, I will change the KIP later.
 
 Best,
 Ziming
 
> On Jan 4, 2024, at 17:28, Ismael Juma  wrote:
> 
> Hi Ziming,
> 
> Why is the flag required at all? Can we use incremental and fall back
 automatically if it's not supported by the broker? At this point, the
>> vast
 majority of clusters should support it.
> 
> Ismael
> 
> On Mon, Dec 18, 2023 at 7:58 PM ziming deng >>> 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2571

2024-01-14 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-16129) Add integration test for KIP-977

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16129:
--

 Summary: Add integration test for KIP-977
 Key: KAFKA-16129
 URL: https://issues.apache.org/jira/browse/KAFKA-16129
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the
verbosity (fan-out rate) of the metrics. It's a config with JSON format
specifying the condition controlling fan-out of the metrics. If the value
*{{high}}* is set for the *{{level}}* key of the configured JSON (see below for
example values), high fan-out tags (e.g. *{{partition}}*) will be added to
metrics specified by the {{*name*}} filter and will apply to all the topics
that meet the conditions in the {{*filters*}} section. With the *{{low}}*
setting, these tags will be assigned an empty value. We elected to make
it centralized so that this implementation can be generalized in the future,
either into a library or via other means of centralized control.

More details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics

 

The following 3 tests will be done for common metrics collectors: JMX, 
Prometheus, and OpenTelemetry.
 # The partition tag can be observed from metrics if high verbosity is used
 # The partition tag should result in an empty string or be filtered out by the 
metrics collector if default verbosity is used
 # Dynamically setting the verbosity can result in the behavior defined in the 
above tests



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16128) Use metrics.verbosity in throughput metrics

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16128:
--

 Summary: Use metrics.verbosity in throughput metrics
 Key: KAFKA-16128
 URL: https://issues.apache.org/jira/browse/KAFKA-16128
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

This task will link it to the following throughput metrics. Specifically, when
the level 'high' is used, the partition dimension should be added.

 
|Metrics Name|Meaning|
|{{MessagesInPerSec}}|Messages entered the partition, per second|
|{{BytesInPerSec}}|Bytes entered the partition, per second|
|{{BytesOutPerSec}}|Bytes retrieved from the partition, per second|
|{{BytesRejectedPerSec}}|Bytes exceeding max message size in a partition, per second|
|{{TotalProduceRequestsPerSec}}|Produce request count for a partition, per second|
|{{TotalFetchRequestsPerSec}}|Fetch request count for a partition, per second|
|{{FailedProduceRequestsPerSec}}|Failed produce request count for a partition, per second|
|{{FailedFetchRequestsPerSec}}|Failed fetch request count for a partition, per second|
|{{FetchMessageConversionsPerSec}}|Broker side conversions (de-compressions) for a partition, per second|
|{{ProduceMessageConversionsPerSec}}|Broker side conversions (compressions) for a partition, per second|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Jakub Scholz
*> Nice catch! It does seem like we should have gated this behind the
metadata> version as KIP-858 implies. Is the cluster configured with
multiple log> dirs? What is the impact of the error messages?*

I did not observe any obvious impact. I was able to send and receive
messages as normal. But to be honest, I have no idea what else
this might impact, so I did not try anything special.

I think everyone upgrading an existing KRaft cluster will go through this
stage (running Kafka 3.7 with an older metadata version for at least a
while). So even if it is just a logged exception without any other impact, I
wonder if it might scare users from upgrading. But I leave it to others to
decide if this is a blocker or not.

Jakub


On Sun, Jan 14, 2024 at 10:17 PM Stanislav Kozlovski
 wrote:

> Hey Luke,
>
> This is an interesting problem. Given the fact that the KIP for having a
> 3.8 release passed, I think it tips the scale towards not calling this a
> blocker and expecting it to be solved in 3.7.1.
>
> It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
> (given the inability to rollback safely), but if that's true - the same
> case would apply for 3.6.0. So in any case users would be expected to use a
> patch release for this. Further, since we will have a 3.8 release - it is
> likely we will ultimately recommend users upgrade from that version given
> its aim is to have strategic KRaft feature parity with ZK.
> That being said, I am not 100% on this. Let me know whether you think this
> should block the release, Luke. I am also tagging Colin and David to weigh
> in with their opinions, as they worked on the migration logic.
>
> Hey Kirk and Chris,
>
> Unless I'm missing something - KAFKA-16029 is simply a bad log due to
> improper closing. And the PR description implies this has been present
> since 3.5. While annoying, I don't see a strong reason for this to block
> the release.
>
> Hey Jakub,
>
> Nice catch! It does seem like we should have gated this behind the metadata
> version as KIP-858 implies. Is the cluster configured with multiple log
> dirs? What is the impact of the error messages?
>
> Tagging Igor (the author of the KIP) to weigh in.
>
> Best,
> Stanislav
>
> On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:
>
> > Hi,
> >
> > I was trying the RC2 and ran into the following issue ... when I run a
> > 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2, I seem
> > to be getting repeated errors like this in the controller
> > logs:
> >
> > 2024-01-13 16:58:01,197 INFO [QuorumController id=0]
> assignReplicasToDirs:
> > event failed with UnsupportedVersionException in 15 microseconds.
> > (org.apache.kafka.controller.QuorumController)
> > [quorum-controller-0-event-handler]
> > 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
> > handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
> > AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
> > directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
> > topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
> > partitions=[PartitionData(partitionIndex=2),
> > PartitionData(partitionIndex=1)]),
> > TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
> > partitions=[PartitionData(partitionIndex=0)])])]) with context
> > RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> > apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
> > connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
> > 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
> > listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
> > clientInformation=ClientInformation(softwareName=apache-kafka-java,
> > softwareVersion=3.7.0), fromPrivilegedListener=false,
> >
> >
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2
> > ])
> > (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> > java.util.concurrent.CompletionException:
> > org.apache.kafka.common.errors.UnsupportedVersionException: Directory
> > assignment is not supported yet.
> >
> >  at
> >
> >
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
> >  at
> >
> >
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
> >  at
> >
> >
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
> >  at
> >
> >
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> >  at
> >
> >
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
> >  at
> >
> >
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
> >  at
> >
> >
> 

[jira] [Created] (KAFKA-16127) Reset corresponding verbosity configs when topic is deleted

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16127:
--

 Summary: Reset corresponding verbosity configs when topic is 
deleted
 Key: KAFKA-16127
 URL: https://issues.apache.org/jira/browse/KAFKA-16127
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

When the topic is deleted, the corresponding topic config should be reset to
the default value because:
 # If the topic is automatically recreated, the verbosity from the previous
generation may cause too many/too few metrics to be emitted
 # Too many unused configurations may cause user error when configuring this
field. The typical workflow would be getting the value first, then adding or
modifying the config to reflect the latest requirement. If we don't delete
unused entries, the config can only grow instead of evolve.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16126) Kcontroller dynamic configurations may fail to apply at startup

2024-01-14 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-16126:


 Summary: Kcontroller dynamic configurations may fail to apply at 
startup
 Key: KAFKA-16126
 URL: https://issues.apache.org/jira/browse/KAFKA-16126
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Colin McCabe
Assignee: Colin McCabe


Some kcontroller dynamic configurations may fail to apply at startup. This 
happens because there is a race between registering the reconfigurables to the 
DynamicBrokerConfig class, and receiving the first update from the metadata 
publisher. We can fix this by registering the reconfigurables first. This seems 
to have been introduced by the "MINOR: Install ControllerServer metadata 
publishers sooner" change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16125) Add parser for the metrics.verbosity config

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16125:
--

 Summary: Add parser for the metrics.verbosity config
 Key: KAFKA-16125
 URL: https://issues.apache.org/jira/browse/KAFKA-16125
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu
Assignee: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

The parser will have the following validation responsibilities (a rough sketch
follows after these lists):
 # Validate that the config conforms to JSON format.
 # Validate that the level field is configured with a supported level
 # Validate that the names field pattern is {{java.util.regex}}-compatible
 # Validate that the filters field conforms to JSON map format
 # Validate that all the filter patterns are {{java.util.regex}}-compatible
 # Validate that no other field exists in the config

After parsing and validating the config, the parser should:
 * Generate a config object which contains the level, names, and filter fields.
There should also be a default verbosity level.
 * Inside the object, the predicates should also be generated to allow fast
querying of the rules
 * Overwrite/update the existing predicates if the config is updated

The config object should:
 * Determine the verbosity level for a supplied topic name and metric name
 * If not defined, fall back to the default config level
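A rough illustration of the first five checks above, using Jackson, is below;
the JSON shape and field names here are only guesses for illustration, and the
authoritative schema is defined in KIP-977.

    import java.util.Set;
    import java.util.regex.Pattern;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class MetricsVerbosityParserSketch {
        public static void main(String[] args) throws Exception {
            String example = "{\"level\": \"high\", \"names\": \"Bytes.*PerSec\", "
                    + "\"filters\": {\"topic\": \"orders-.*\"}}";

            JsonNode node = new ObjectMapper().readTree(example);        // 1. valid JSON

            if (!Set.of("low", "high").contains(node.path("level").asText())) {
                throw new IllegalArgumentException("unknown level");     // 2. known level
            }
            Pattern.compile(node.path("names").asText());                // 3. names is a valid regex

            JsonNode filters = node.path("filters");
            if (!filters.isObject()) {
                throw new IllegalArgumentException("filters must be a JSON map"); // 4. JSON map
            }
            filters.forEach(f -> Pattern.compile(f.asText()));           // 5. each filter is a valid regex
        }
    }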



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.7.0 RC2

2024-01-14 Thread Stanislav Kozlovski
Hey Luke,

This is an interesting problem. Given the fact that the KIP for having a
3.8 release passed, I think it tips the scale towards not calling this a
blocker and expecting it to be solved in 3.7.1.

It is unfortunate that it would not seem safe to migrate to KRaft in 3.7.0
(given the inability to rollback safely), but if that's true - the same
case would apply for 3.6.0. So in any case users would be expected to use a
patch release for this. Further, since we will have a 3.8 release - it is
likely we will ultimately recommend users upgrade from that version given
its aim is to have strategic KRaft feature parity with ZK.
That being said, I am not 100% on this. Let me know whether you think this
should block the release, Luke. I am also tagging Colin and David to weigh
in with their opinions, as they worked on the migration logic.

Hey Kirk and Chris,

Unless I'm missing something - KAFKA-16029 is simply a bad log due to
improper closing. And the PR description implies this has been present
since 3.5. While annoying, I don't see a strong reason for this to block
the release.

Hey Jakub,

Nice catch! It does seem like we should have gated this behind the metadata
version as KIP-858 implies. Is the cluster configured with multiple log
dirs? What is the impact of the error messages?

Tagging Igor (the author of the KIP) to weigh in.

Best,
Stanislav

On Sat, Jan 13, 2024 at 7:22 PM Jakub Scholz  wrote:

> Hi,
>
> I was trying the RC2 and ran into the following issue ... when I run a
> 3.7.0-RC2 KRaft cluster with the metadata version set to 3.6-IV2, I seem
> to be getting repeated errors like this in the controller
> logs:
>
> 2024-01-13 16:58:01,197 INFO [QuorumController id=0] assignReplicasToDirs:
> event failed with UnsupportedVersionException in 15 microseconds.
> (org.apache.kafka.controller.QuorumController)
> [quorum-controller-0-event-handler]
> 2024-01-13 16:58:01,197 ERROR [ControllerApis nodeId=0] Unexpected error
> handling request RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2) --
> AssignReplicasToDirsRequestData(brokerId=1000, brokerEpoch=5,
> directories=[DirectoryData(id=w_uxN7pwQ6eXSMrOKceYIQ,
> topics=[TopicData(topicId=bvAKLSwmR7iJoKv2yZgygQ,
> partitions=[PartitionData(partitionIndex=2),
> PartitionData(partitionIndex=1)]),
> TopicData(topicId=uNe7f5VrQgO0zST6yH1jDQ,
> partitions=[PartitionData(partitionIndex=0)])])]) with context
> RequestContext(header=RequestHeader(apiKey=ASSIGN_REPLICAS_TO_DIRS,
> apiVersion=0, clientId=1000, correlationId=14, headerVersion=2),
> connectionId='172.16.14.219:9090-172.16.14.217:53590-7', clientAddress=/
> 172.16.14.217, principal=User:CN=my-cluster-kafka,O=io.strimzi,
> listenerName=ListenerName(CONTROLPLANE-9090), securityProtocol=SSL,
> clientInformation=ClientInformation(softwareName=apache-kafka-java,
> softwareVersion=3.7.0), fromPrivilegedListener=false,
>
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@71004ad2
> ])
> (kafka.server.ControllerApis) [quorum-controller-0-event-handler]
> java.util.concurrent.CompletionException:
> org.apache.kafka.common.errors.UnsupportedVersionException: Directory
> assignment is not supported yet.
>
>  at
>
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
>  at
>
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
>  at
>
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
>  at
>
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>  at
>
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>  at
>
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
>  at
>
> org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
>  at
>
> org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
>  at
>
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
>  at
>
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>  at
>
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>  at java.base/java.lang.Thread.run(Thread.java:840)
>
> Caused by: org.apache.kafka.common.errors.UnsupportedVersionException:
> Directory assignment is not supported yet.
>
> Is that expected? I guess with the metadata version set to 3.6-IV2, it
> makes sense that the request is not supported. But shouldn't the
> request then not be sent at all by the brokers? (I did not open a JIRA for it,
> but I can open one if you agree this is not expected)
>
> Thanks & Regards
> Jakub
>
> On Sat, Jan 13, 2024 at 8:03 AM Luke Chen  wrote:
>
> > Hi Stanislav,

[jira] [Created] (KAFKA-16124) Create metrics.verbosity dynamic config for controlling metrics verbosity

2024-01-14 Thread Qichao Chu (Jira)
Qichao Chu created KAFKA-16124:
--

 Summary: Create metrics.verbosity dynamic config for controlling 
metrics verbosity
 Key: KAFKA-16124
 URL: https://issues.apache.org/jira/browse/KAFKA-16124
 Project: Kafka
  Issue Type: Task
Reporter: Qichao Chu
Assignee: Qichao Chu


*{{metrics.verbosity}}* will be a new dynamic config introduced to control the 
verbosity(fan-out rate) of the metrics. It's a config with JSON format 
specifying the condition controlling fan-out of the metrics.

More details: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-977%3A+Partition-Level+Throughput+Metrics



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka trunk test & build stability

2024-01-14 Thread Qichao Chu
Hi Divij and all,

Regarding the speeding up of the build & de-flaking tests, LinkedIn has
done some great work which we probably can borrow ideas from.
In the LinkedIn/Kafka repo, we can see one of their most recent PRs
only took < 9 min (unit
tests) + < 12 min (integration tests) + < 9 min (code checks) = < 30 min to finish
all the checks:

   1. Similar to what David (mumrah) has mentioned/experimented with, the
   LinkedIn team used GitHub Actions, which displayed the results in a cleaner
   way directly from GitHub.
   2. Each top-level package is checked separately to increase the
   concurrency. To further boost the speed for integration tests, the tests
   inside one package are divided into sub-groups (A-Z) based on their
   names (see this job for details).
   3. Once the tests are running at a smaller granularity with a decent
   runner, heavy integration tests are less likely to be flaky, and flaky
   tests are easier to catch.


--
Qichao


On Wed, Jan 10, 2024 at 2:57 PM Divij Vaidya 
wrote:

> Hey folks
>
> We seem to have a handle on the OOM issues with the multiple fixes
> community members made. In
> https://issues.apache.org/jira/browse/KAFKA-16052,
> you can see the "before" profile in the description and the "after" profile
> in the latest comment to see the difference. To prevent future recurrence,
> we have an ongoing solution at https://github.com/apache/kafka/pull/15101
> and after that we will start another one to get rid of Mockito mocks at
> the end of every test suite using a similar extension. Note that this
> doesn't solve the flaky test problems in the trunk but it removes the
> aspect of build failures due to OOM (one of the many problems).
>
> To fix the flaky test problem, we probably need to run our tests in a
> separate CI environment (like Apache Beam does) instead of sharing the 3
> hosts that run our CI with many many other Apache projects. This assumption
> is based on the fact that the tests are less flaky when running on laptops
> / powerful EC2 machines. One of the avenues to get funding for these
> Kafka-only hosts is
>
> https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/
> . I will start the conversation on this one with AWS & Apache Infra in the
> next 1-2 months.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jan 9, 2024 at 9:21 PM Colin McCabe  wrote:
>
> > Sorry, but to put it bluntly, the current build setup isn't good enough at
> > partial rebuilds for build caching to make sense. All Kafka devs have
> > had the experience of needing to clean the build directory in order to get
> > a valid build. The Scala code especially seems to have this issue.
> >
> > regards,
> > Colin
> >
> >
> > On Tue, Jan 2, 2024, at 07:00, Nick Telford wrote:
> > > Addendum: I've opened a PR with what I believe are the changes
> necessary
> > to
> > > enable Remote Build Caching, if you choose to go that route:
> > > https://github.com/apache/kafka/pull/15109
> > >
> > > On Tue, 2 Jan 2024 at 14:31, Nick Telford 
> > wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> Regarding building a "dependency graph"... Gradle already has this
> > >> information, albeit fairly coarse-grained. You might be able to get
> some
> > >> considerable improvement by configuring the Gradle Remote Build Cache.
> > It
> > >> looks like it's currently disabled explicitly:
> > >> https://github.com/apache/kafka/blob/trunk/settings.gradle#L46
> > >>
> > >> The trick is to have trunk builds write to the cache, and PR builds
> only
> > >> read from it. This way, any PR based on trunk should be able to cache
> > not
> > >> only the compilation, but also the tests from dependent modules that
> > >> haven't changed (e.g. for a PR that only touches the connect/streams
> > >> modules).
> > >>
> > >> This would probably be preferable to having to hand-maintain some
> > >> rules/dependency graph in the CI configuration, and it's quite
> > >> straight-forward to configure.
> > >>
> > >> Bonus points if the Remote Build Cache is readable publicly, enabling
> > >> contributors to benefit from it locally.
> > >>
> > >> Regards,
> > >> Nick
> > >>
> > >> On Tue, 2 Jan 2024 at 13:00, Lucas Brutschy  > .invalid>
> > >> wrote:
> > >>
> > >>> Thanks for all the work that has already been done on this in the
> past
> > >>> days!
> > >>>
> > >>> Have we considered running our test suite with
> > >>> -XX:+HeapDumpOnOutOfMemoryError and uploading the heap dumps as
> > >>> Jenkins build artifacts? This could speed up debugging. Even if we
> > >>> store them only for a day and do it only for trunk, I think it could
> > >>> be worth it. The heap dumps shouldn't contain any secrets, and I
> > >>> checked with the ASF infra team, and they are not concerned about the
> > >>> additional disk usage.
> > >>>
> > >>> Cheers,
> > >>> Lucas
> > >>>
> > >>> On Wed, Dec 27, 2023 at 2:25 PM Divij 

Re: [PROPOSAL] Add commercial support page on website

2024-01-14 Thread tison
FWIW - even if it's rejected by the Kafka PMC, you can maintain your
own page for such information and provide your personal comments on
them. If the object is to provide information and help users to make
decisions, it should help. Although you should do the SEO by yourself,
if the information is somehow neutral and valuable, you can ask the
@apachekafka Twitter (X) account to propagate it and provide a blog
for Kafka blogs.

This is the common way how third-party "evangelist" producing content
and get it promoted.

Best,
tison.

Matthias J. Sax  于2024年1月13日周六 07:35写道:
>
> François,
>
> thanks for starting this initiative. Personally, I don't think it's
> necessarily harmful for the project to add such a new page, however, I
> share the same concerns others raised already.
>
> I understand your motivation that people had issues finding commercial
> support, but I am not sure we can address this issue that way. I am also
> "worried" (for the lack of a better word) that the page might become
> long an unwieldy. In the end, any freelancer/consultant offering Kafka
> services would be able to get on the page, so we might get hundreds of
> entries, what also makes it impossible for users to find what they are
> looking for. Also, the services of different companies might vary
> drastically; should users read all these descriptions? I can also
> imagine that some companies offer their services only in some
> countries/regions, making it even harder for users to find what they are
> looking for?
>
> Overall, it sounds more like a search optimization problem, and thus it
> seems out of scope for what we can solve. As I said, I am not strictly
> against it, but I just don't see much value either.
>
>
> -Matthias
>
> On 1/11/24 12:55 PM, Francois Papon wrote:
> > Hi Justine,
> >
> > You're right, Kafka is a part of my business (training, consulting,
> > architecture design, sla...) and most of the time, users/customers said
> > that it was hard for them to find commercial support (in France in my
> > case) after searching on the Kafka website (Google didn't help them).
> >
> > As an ASF member and PMC of several ASF projects, I know that this kind
> > of page exists, so this is why I made this proposal for the Kafka project
> > because I really think that it can help users.
> >
> > As you suggest, I can submit a PR to be added on the "powered by" page.
> >
> > Thanks,
> >
> > François
> >
> > On 11/01/2024 21:00, Justine Olshan wrote:
> >> Hey François,
> >>
> >> My point was that the companies on that page use Kafka as part of their
> >> business. If you use Kafka as part of your business feel free to submit a
> >> PR to be added.
> >>
> >> I second Chris's point that other projects are not enough to require
> >> Kafka
> >> to have such a support page.
> >>
> >> Justine
> >>
> >> On Thu, Jan 11, 2024 at 11:57 AM Chris Egerton 
> >> wrote:
> >>
> >>> Hi François,
> >>>
> >>> Is it an official policy of the ASF that projects provide a listing of
> >>> commercial support options for themselves? I understand that other
> >>> projects
> >>> have chosen to provide one, but this doesn't necessarily imply that all
> >>> projects should do the same, and I can't say I find this point very
> >>> convincing as a rebuttal to some of the good-faith concerns raised by
> >>> the
> >>> PMC and members of the community so far. However, if there's an official
> >>> ASF stance on this topic, then I acknowledge that Apache Kafka should
> >>> align
> >>> with it.
> >>>
> >>> Best,
> >>>
> >>> Chris
> >>>
> >>>
> >>> On Thu, Jan 11, 2024, 14:50 fpapon  wrote:
> >>>
>  Hi Justine,
> 
>  I'm not sure I see the difference between "happy users" and vendors
>  that advertise their products in the company list on the
>  "powered by" page.
> 
>  Btw, the initial purpose of my proposal was to help users find support
>  for production use rather than searching on Google.
> 
>  I don't think this is a bad thing because this is something that
>  already
>  exists in many ASF projects like:
> 
>  https://hop.apache.org/community/commercial/
>  https://struts.apache.org/commercial-support.html
>  https://directory.apache.org/commercial-support.html
>  https://tomee.apache.org/commercial-support.html
>  https://plc4x.apache.org/users/commercial-support.html
>  https://camel.apache.org/community/support/
>  https://openmeetings.apache.org/commercial-support.html
>  https://guacamole.apache.org/support/
> 
> 
> >>> https://cwiki.apache.org/confluence/display/HADOOP2/Distributions+and+Commercial+Support
> >>> https://activemq.apache.org/support
> >>> https://karaf.apache.org/community.html
>  https://netbeans.apache.org/front/main/help/commercial-support/
>  https://royale.apache.org/royale-commercial-support/
> 
>  https://karaf.apache.org/community.html
> 
>  As I understand for now, the channel for users to find production
>  support