[jira] [Commented] (KAFKA-7740) Kafka Admin Client should be able to manage user/client configurations for users and clients

2020-08-24 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183338#comment-17183338
 ] 

Brian Byrne commented on KAFKA-7740:


Hello - the resolve functionality is still planned for a future release; however, 
it got hung up on an interaction with the `ClientQuotaCallback`. A KIP to address 
that issue will be required. I can update the original KIP to indicate that the 
resolve functionality is not yet available.

> Kafka Admin Client should be able to manage user/client configurations for 
> users and clients
> 
>
> Key: KAFKA-7740
> URL: https://issues.apache.org/jira/browse/KAFKA-7740
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.1.0
> Environment: linux
>Reporter: Yaodong Yang
>Assignee: Brian Byrne
>Priority: Major
>  Labels: features
> Fix For: 2.6.0
>
>
> Right now, the Kafka Admin Client only allows users to change the configuration 
> of brokers and topics. There are use cases where users want to set or update 
> quota configurations for users and clients through the Kafka Admin Client. 
> Without this new capability, users have to talk to ZooKeeper manually for 
> this, which poses other challenges for customers.
> Considering we already have the framework for the more complex broker 
> and topic configuration changes, it seems straightforward to add support 
> for alterConfig and describeConfig for users and clients as well.
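
For reference, a minimal sketch of what this looks like with the client quota APIs 
that KIP-546 added to the Java Admin client (available from Kafka 2.6); the 
bootstrap address, user/client names, and quota value are illustrative assumptions.

{code}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;
import org.apache.kafka.common.quota.ClientQuotaFilter;

public class QuotaExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            // Target the quota entity: user "alice" with client-id "app-1" (illustrative names).
            ClientQuotaEntity entity = new ClientQuotaEntity(Map.of(
                ClientQuotaEntity.USER, "alice",
                ClientQuotaEntity.CLIENT_ID, "app-1"));

            // Set a producer byte-rate quota; a null value would remove the quota instead.
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(entity,
                List.of(new ClientQuotaAlteration.Op("producer_byte_rate", 1024.0 * 1024.0)));
            admin.alterClientQuotas(List.of(alteration)).all().get();

            // Describe all configured quota entities and their quota values.
            Map<ClientQuotaEntity, Map<String, Double>> quotas =
                admin.describeClientQuotas(ClientQuotaFilter.all()).entities().get();
            quotas.forEach((e, q) -> System.out.println(e + " -> " + q));
        }
    }
}
{code}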



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-08-17 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-10158.
-
Resolution: Fixed

> Fix flaky 
> kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Chia-Ping Tsai
>Assignee: Brian Byrne
>Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is an async request, so it is possible that the 
> reassignment is still in progress when we start to verify 
> "under-replicated-partitions". In order to make the test stable, it needs to 
> wait for the reassignment to complete before verifying the topic command with 
> "under-replicated-partitions".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-20 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161425#comment-17161425
 ] 

Brian Byrne commented on KAFKA-10278:
-

Hi [~kaushik srinivas],

If I understand correctly, yes, all configurations should be displayed with 
--all regardless of how they're updated.

> kafka-configs does not show the current properties of running kafka broker 
> upon describe.
> -
>
> Key: KAFKA-10278
> URL: https://issues.apache.org/jira/browse/KAFKA-10278
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>  Labels: kafka-configs.sh
>
> kafka-configs.sh does not list the properties 
> (read-only/per-broker/cluster-wide) with which the Kafka broker is currently 
> running.
> The command returns nothing.
> Only those properties added or updated via kafka-configs.sh are listed by the 
> describe command.
> bash-4.2$ env -i bin/kafka-configs.sh --bootstrap-server 
> kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
> --entity-default --describe
> Default config for brokers in the cluster are:
>   log.cleaner.threads=2 sensitive=false 
> synonyms=\{DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-17 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-10278.
-
Resolution: Won't Fix

Hi Kaushik,

The command is set to return only modified (non-default) configs. As part of 
the KIP-524 work, the --all flag was added in 2.5, which provides the behavior 
you seek.
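
For comparison, the underlying Admin API already returns every broker config 
together with its source, which is what --all exposes in the CLI. A small sketch, 
where the bootstrap address and broker id "1" are assumptions:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeBrokerConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1"); // assumed id
            Config config = admin.describeConfigs(Collections.singleton(broker))
                .all().get().get(broker);
            // Each entry carries its value plus where it came from
            // (static broker file, dynamic default, Kafka default, ...).
            config.entries().forEach(e ->
                System.out.printf("%s=%s (source=%s, sensitive=%s)%n",
                    e.name(), e.value(), e.source(), e.isSensitive()));
        }
    }
}
{code}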

> kafka-configs does not show the current properties of running kafka broker 
> upon describe.
> -
>
> Key: KAFKA-10278
> URL: https://issues.apache.org/jira/browse/KAFKA-10278
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: kaushik srinivas
>Priority: Critical
>  Labels: kafka-configs.sh
>
> kafka-configs.sh does not list the properties 
> (read-only/per-broker/cluster-wide) with which the Kafka broker is currently 
> running.
> The command returns nothing.
> Only those properties added or updated via kafka-configs.sh are listed by the 
> describe command.
> bash-4.2$ env -i bin/kafka-configs.sh --bootstrap-server 
> kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
> --entity-default --describe
> Default config for brokers in the cluster are:
>   log.cleaner.threads=2 sensitive=false 
> synonyms=\{DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-10033) AdminClient should throw UnknownTopicOrPartitionException instead of UnknownServerException if altering configs of non-existing topic

2020-05-22 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne reassigned KAFKA-10033:
---

Assignee: Brian Byrne

> AdminClient should throw UnknownTopicOrPartitionException instead of 
> UnknownServerException if altering configs of non-existing topic
> -
>
> Key: KAFKA-10033
> URL: https://issues.apache.org/jira/browse/KAFKA-10033
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 2.4.0
>Reporter: Gregory Koshelev
>Assignee: Brian Byrne
>Priority: Major
>
> Currently, altering configs of non-existing topic leads to 
> {{UnknownServerException}}:
> {code}
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.UnknownServerException: Topic "kgn_test" does 
> not exist.
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)
>   at 
> ru.kontur.vostok.hercules.stream.manager.kafka.KafkaManager.changeTtl(KafkaManager.java:130)
>   ... 10 common frames omitted
> Caused by: org.apache.kafka.common.errors.UnknownServerException: Topic 
> "kgn_test" does not exist.
> {code}
> The output above is produced by the {{AdminZkClient.validateTopicConfig}} 
> method:
> {code}
>   def validateTopicConfig(topic: String, configs: Properties): Unit = {
>     Topic.validate(topic)
>     if (!zkClient.topicExists(topic))
>       throw new AdminOperationException("Topic \"%s\" does not exist.".format(topic))
>     // remove the topic overrides
>     LogConfig.validate(configs)
>   }
> {code}
> {{UnknownServerException}} is a common exception, but in this case the cause is 
> pretty clear. So this can be fixed easily by using 
> {{UnknownTopicOrPartitionException}}.
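
For context, a sketch of the client-side handling this change would enable (today 
the same failure surfaces as UnknownServerException instead); the bootstrap 
address, topic name, and config value are illustrative.

{code}
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.errors.UnknownTopicOrPartitionException;

public class AlterTopicConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "kgn_test");
            AlterConfigOp setTtl = new AlterConfigOp(
                new ConfigEntry("retention.ms", "3600000"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates = Map.of(topic, List.of(setTtl));
            try {
                admin.incrementalAlterConfigs(updates).all().get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof UnknownTopicOrPartitionException) {
                    // With the proposed fix, the caller can react specifically to a missing topic.
                    System.err.println("Topic does not exist: " + topic.name());
                } else {
                    throw e;
                }
            }
        }
    }
}
{code}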



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9980) Text encoding bug prevents correctly setting client quotas for default entities

2020-05-21 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113260#comment-17113260
 ] 

Brian Byrne commented on KAFKA-9980:


Updated patch here: https://github.com/apache/kafka/pull/8658

> Text encoding bug prevents correctly setting client quotas for default 
> entities
> ---
>
> Key: KAFKA-9980
> URL: https://issues.apache.org/jira/browse/KAFKA-9980
> Project: Kafka
>  Issue Type: Bug
>Reporter: Cheng Tan
>Assignee: Brian Byrne
>Priority: Major
>
> quota_tests.py is failing. Specifically for this test:
> {quote}
>  [INFO:2020-05-11 19:22:47,493]: RunnerClient: Loading test \{'directory': 
> '/opt/kafka-dev/tests/kafkatest/tests/client', 'file_name': 'quota_test.py', 
> 'method_name': 'test_quota', 'cls_name': 'QuotaTest', 'injected_args': 
> {'quota_type': 'client-id', 'override_quota': False}}
> {quote}
>  
> I logged into the docker container and ran
>  
> {quote}
>  /opt/kafka-dev/bin/kafka-configs.sh --bootstrap-server ducker03:9093 
> --describe --entity-type clients --command-config 
> /opt/kafka-dev/bin/hi.properties
> {quote}
>  
>  and the command returned
>  
> {quote}Configs for the default client-id are consumer_byte_rate=200.0, 
> producer_byte_rate=250.0
>  Configs for client-id 'overridden_id' are consumer_byte_rate=1.0E9, 
> producer_byte_rate=1.0E9
>  Seems like the config is set properly but the quota is not effective
>   
> {quote}
>  For investigation, I added logging in 
> {quote}{{AdminZkClient.changeConfigs()}}
> {quote}
>  
>  
> {quote}def changeConfigs(entityType: String, entityName: String, configs: Properties): Unit = {
>   warn(s"entityType = $entityType entityName = $entityName configs = $configs")
>   ...
> }
> {quote}
> Then I used --bootstrap-server and --zookeeper to --alter the default client 
> quota. I got
>  
> {quote}
>  Alter with --zookeeper:WARN entityType = clients entityName =  
> configs = \{producer_byte_rate=10, consumer_byte_rate=10} 
> (kafka.zk.AdminZkClient)
> {quote}
>  
>  and
>  
> {quote}
>  Alter with --bootstrap-server:WARN entityType = clients entityName = 
> %3Cdefault%3E configs = \{producer_byte_rate=10, 
> consumer_byte_rate=10} (kafka.zk.AdminZkClient)
> {quote}
>  
>  I guess the encoding difference might cause the issue. The encoding happens 
> in
>  
> {quote}
>  Sanitizer.sanitize()
> {quote}
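
For reference, the sanitization step in question can be reproduced directly; the 
percent-encoded output matches the entityName logged for the --bootstrap-server 
path above. A minimal sketch:

{code}
import org.apache.kafka.common.utils.Sanitizer;

public class SanitizerDemo {
    public static void main(String[] args) {
        // The default-entity sentinel as the tool passes it.
        String name = "<default>";
        String sanitized = Sanitizer.sanitize(name);
        System.out.println(sanitized);                        // %3Cdefault%3E, as in the log above
        System.out.println(Sanitizer.desanitize(sanitized));  // round-trips back to <default>
    }
}
{code}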



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9980) Text encoding bug prevents correctly setting client quotas for default entities

2020-05-21 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne reassigned KAFKA-9980:
--

Assignee: Brian Byrne  (was: Cheng Tan)

> Text encoding bug prevents correctly setting client quotas for default 
> entities
> ---
>
> Key: KAFKA-9980
> URL: https://issues.apache.org/jira/browse/KAFKA-9980
> Project: Kafka
>  Issue Type: Bug
>Reporter: Cheng Tan
>Assignee: Brian Byrne
>Priority: Major
>
> quota_tests.py is failing. Specifically for this test:
> {quote}
>  [INFO:2020-05-11 19:22:47,493]: RunnerClient: Loading test \{'directory': 
> '/opt/kafka-dev/tests/kafkatest/tests/client', 'file_name': 'quota_test.py', 
> 'method_name': 'test_quota', 'cls_name': 'QuotaTest', 'injected_args': 
> {'quota_type': 'client-id', 'override_quota': False}}
> {quote}
>  
> I logged into the docker container and ran
>  
> {quote}
>  /opt/kafka-dev/bin/kafka-configs.sh --bootstrap-server ducker03:9093 
> --describe --entity-type clients --command-config 
> /opt/kafka-dev/bin/hi.properties
> {quote}
>  
>  and the command returned
>  
> {quote}Configs for the default client-id are consumer_byte_rate=200.0, 
> producer_byte_rate=250.0
>  Configs for client-id 'overridden_id' are consumer_byte_rate=1.0E9, 
> producer_byte_rate=1.0E9
>  Seems like the config is set properly but the quota is not effective
>   
> {quote}
>  For investigation, I added logging in 
> {quote}{{AdminZkClient.changeConfigs()}}
> {quote}
>  
>  
> {quote}def changeConfigs(entityType: String, entityName: String, configs: Properties): Unit = {
>   warn(s"entityType = $entityType entityName = $entityName configs = $configs")
>   ...
> }
> {quote}
> Then I used --bootstrap-server and --zookeeper to --alter the default client 
> quota. I got
>  
> {quote}
>  Alter with --zookeeper:WARN entityType = clients entityName =  
> configs = \{producer_byte_rate=10, consumer_byte_rate=10} 
> (kafka.zk.AdminZkClient)
> {quote}
>  
>  and
>  
> {quote}
>  Alter with --bootstrap-server:WARN entityType = clients entityName = 
> %3Cdefault%3E configs = \{producer_byte_rate=10, 
> consumer_byte_rate=10} (kafka.zk.AdminZkClient)
> {quote}
>  
>  I guess the encoding difference might cause the issue. The encoding happens 
> in
>  
> {quote}
>  Sanitizer.sanitize()
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9954) Config command didn't validate the unsupported user config change

2020-05-05 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1702#comment-1702
 ] 

Brian Byrne commented on KAFKA-9954:


Hi Cheng - what error are you seeing when trying to modify the "users" quotas? 
This should be supported, and if not, there's likely another bug that's also 
affecting "clients".

> Config command didn't validate the unsupported user config change
> -
>
> Key: KAFKA-9954
> URL: https://issues.apache.org/jira/browse/KAFKA-9954
> Project: Kafka
>  Issue Type: Bug
>Reporter: Cheng Tan
>Assignee: Cheng Tan
>Priority: Major
>
> {quote}bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter 
> --add-config producer_byte_rate=4 --entity-type users --entity-default
> {quote}
>  
> will say that the alteration is complete. However, we don't support the 
> alteration yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9139) Dynamic broker config types aren't being discovered

2020-04-03 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9139.

Resolution: Duplicate

> Dynamic broker config types aren't being discovered
> ---
>
> Key: KAFKA-9139
> URL: https://issues.apache.org/jira/browse/KAFKA-9139
> Project: Kafka
>  Issue Type: Bug
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Major
>
> The broker's dynamic config definition types aren't being properly 
> discovered, and therefore they're being considered "sensitive" when returned 
> to the client. This needs to be resolved. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9713) Remove BufferExhaustedException

2020-03-30 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9713.

Resolution: Duplicate

> Remove BufferExhaustedException
> --
>
> Key: KAFKA-9713
> URL: https://issues.apache.org/jira/browse/KAFKA-9713
> Project: Kafka
>  Issue Type: Task
>  Components: producer 
>Reporter: Brian Byrne
>Priority: Trivial
>
> BufferExhaustedException was deprecated in 0.9.0.0, and the corresponding 
> block.on.buffer.full property has since been removed. The exception should 
> follow.
> {quote}Deprecations in 0.9.0.0
> The producer config block.on.buffer.full has been deprecated and will be 
> removed in future release. Currently its default value has been changed to 
> false. The KafkaProducer will no longer throw BufferExhaustedException but 
> instead will use max.block.ms value to block, after which it will throw a 
> TimeoutException. If block.on.buffer.full property is set to true explicitly, 
> it will set the max.block.ms to Long.MAX_VALUE and metadata.fetch.timeout.ms 
> will not be honoured{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9713) Remove BufferExhaustedException

2020-03-12 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9713:
--

 Summary: Remove BufferExhaustedException
 Key: KAFKA-9713
 URL: https://issues.apache.org/jira/browse/KAFKA-9713
 Project: Kafka
  Issue Type: Task
  Components: producer 
Reporter: Brian Byrne


BufferExhaustedException was deprecated in 0.9.0.0, and the corresponding 
block.on.buffer.full property has since been removed. The exception should 
follow.

{quote}Deprecations in 0.9.0.0

The producer config block.on.buffer.full has been deprecated and will be 
removed in future release. Currently its default value has been changed to 
false. The KafkaProducer will no longer throw BufferExhaustedException but 
instead will use max.block.ms value to block, after which it will throw a 
TimeoutException. If block.on.buffer.full property is set to true explicitly, 
it will set the max.block.ms to Long.MAX_VALUE and metadata.fetch.timeout.ms 
will not be honoured{quote}
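
For illustration, the replacement behavior the quote describes: max.block.ms bounds 
how long send() may block before a timeout is raised. The broker address, topic, 
and timeout below are assumptions.

{code}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.common.serialization.StringSerializer;

public class MaxBlockExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Instead of block.on.buffer.full, bound how long send() may block on a full
        // buffer or missing metadata before failing.
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "5000");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value")).get();
        } catch (ExecutionException e) {
            // A TimeoutException cause indicates the bound was hit asynchronously.
            System.err.println("send failed: " + e.getCause());
        } catch (TimeoutException e) {
            // Depending on version, the timeout may also surface synchronously from send().
            System.err.println("send() blocked too long: " + e.getMessage());
        }
    }
}
{code}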




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8904) Reduce metadata lookups when producing to a large number of topics

2020-03-02 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-8904.

Fix Version/s: 2.5.0
 Reviewer: Rajini Sivaram
   Resolution: Fixed

> Reduce metadata lookups when producing to a large number of topics
> --
>
> Key: KAFKA-8904
> URL: https://issues.apache.org/jira/browse/KAFKA-8904
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, producer 
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Minor
> Fix For: 2.5.0
>
>
> Per [~lbradstreet]:
>  
> "The problem was that the producer starts with no knowledge of topic 
> metadata. So they start the producer up, and then they start sending messages 
> to any of the thousands of topics that exist. Each time a message is sent to 
> a new topic, it'll trigger a metadata request if the producer doesn't know 
> about it. These metadata requests are done in serial such that if you send 
> 2000 messages to 2000 topics, it will trigger 2000 new metadata requests.
>  
> Each successive metadata request will include every topic seen so far, so the 
> first metadata request will include 1 topic, the second will include 2 
> topics, etc.
>  
> An additional problem is that this can take a while, and metadata expiry (for 
> metadata that has not been recently used) is hard coded to 5 mins, so if 
> the initial fetches take long enough you can end up evicting the metadata 
> before you send another message to a topic.
> So the approaches above are:
> 1. We can linger for a bit before making a metadata request, allow more sends 
> to go through, and then batch the metadata request for the topics we need in a 
> single metadata request.
> 2. We can allow pre-seeding the producer with metadata for a list of topics 
> you care about.
> I prefer 1 if we can make it work."
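
As a user-level illustration of option 2 (pre-seeding), the metadata cache can be 
warmed at startup with KafkaProducer.partitionsFor(); note this is not the 
batching approach (option 1) that the ticket preferred. The bootstrap address and 
topic names are assumptions.

{code}
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class MetadataWarmup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        List<String> knownTopics = List.of("topic-a", "topic-b", "topic-c"); // illustrative topics

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // partitionsFor() blocks until metadata for the topic is available, so doing
            // this up front avoids the per-topic metadata fetch on the first real send().
            for (String topic : knownTopics) {
                producer.partitionsFor(topic);
            }
            // ... normal produce traffic follows ...
        }
    }
}
{code}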



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-8904) Reduce metadata lookups when producing to a large number of topics

2020-03-02 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne reassigned KAFKA-8904:
--

Assignee: Brian Byrne

> Reduce metadata lookups when producing to a large number of topics
> --
>
> Key: KAFKA-8904
> URL: https://issues.apache.org/jira/browse/KAFKA-8904
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, producer 
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Minor
>
> Per [~lbradstreet]:
>  
> "The problem was that the producer starts with no knowledge of topic 
> metadata. So they start the producer up, and then they start sending messages 
> to any of the thousands of topics that exist. Each time a message is sent to 
> a new topic, it'll trigger a metadata request if the producer doesn't know 
> about it. These metadata requests are done in serial such that if you send 
> 2000 messages to 2000 topics, it will trigger 2000 new metadata requests.
>  
> Each successive metadata request will include every topic seen so far, so the 
> first metadata request will include 1 topic, the second will include 2 
> topics, etc.
>  
> An additional problem is that this can take a while, and metadata expiry (for 
> metadata that has not been recently used) is hard coded to 5 mins, so if 
> the initial fetches take long enough you can end up evicting the metadata 
> before you send another message to a topic.
> So the approaches above are:
> 1. We can linger for a bit before making a metadata request, allow more sends 
> to go through, and then batch the metadata request for the topics we need in a 
> single metadata request.
> 2. We can allow pre-seeding the producer with metadata for a list of topics 
> you care about.
> I prefer 1 if we can make it work."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8623) KafkaProducer possible deadlock when sending to different topics

2020-02-10 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-8623.

Fix Version/s: 2.3.0
   Resolution: Fixed

This appears to be due to an issue in the clients' handling of consecutive 
metadata updates, where the first update could effectively clear the request 
for the second because no version/instance was maintained to track which 
request was outstanding. This was fixed in PR 
[6621|https://github.com/apache/kafka/pull/6221] (see item 3), which is 
available in the 2.3.0 release.

> KafkaProducer possible deadlock when sending to different topics
> 
>
> Key: KAFKA-8623
> URL: https://issues.apache.org/jira/browse/KAFKA-8623
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.2.1
>Reporter: Alexander Bagiev
>Assignee: Kun Song
>Priority: Critical
> Fix For: 2.3.0
>
>
> Project with bug reproduction: [https://github.com/abagiev/kafka-producer-bug]
> It was found that sending two messages to two different topics in a row 
> results in the KafkaProducer hanging for 60s and the following exception:
> {noformat}
> org.springframework.kafka.core.KafkaProducerException: Failed to send; nested 
> exception is org.apache.kafka.common.errors.TimeoutException: Failed to 
> update metadata after 6 ms.
>   at 
> org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$0(KafkaTemplate.java:405)
>  ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:877)
>  ~[kafka-clients-2.0.1.jar:na]
>   at 
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:803) 
> ~[kafka-clients-2.0.1.jar:na]
>   at 
> org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:444)
>  ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
>   at 
> org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:381) 
> ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
>   at 
> org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:193) 
> ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE]
> ...
> {noformat}
> It looks like KafkaProducer requests meta information twice for each 
> topic and hangs just before the second request due to some deadlock. When 60s 
> pass, a TimeoutException is thrown and meta information is requested/received 
> immediately (but after the exception has already been thrown).
> The issue in the example project is reproduced every time, and the use case 
> is trivial.
>  This is a critical bug for us.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9510) Quotas may resolve to incorrect value if user is empty

2020-02-05 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031080#comment-17031080
 ] 

Brian Byrne commented on KAFKA-9510:


Here's another one that may be more common.

/config/clients/, producer_byte_rate=1000
request: (user="test", client-id="")

Resolves to Long.MaxValue since quotaMetricTags resolves to ("", clientId) -> 
("", "").


> Quotas may resolve to incorrect value if user is empty
> --
>
> Key: KAFKA-9510
> URL: https://issues.apache.org/jira/browse/KAFKA-9510
> Project: Kafka
>  Issue Type: Bug
>Reporter: Brian Byrne
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.6.0
>
>
> This may be a pretty rare/uncommon case that I encountered during testing 
> regarding an empty user. [~rsivaram] please let me know if this is a valid 
> bug and whether it's something that needs further examination.
> Let's say two quota configurations are populated:
> /config/users/ {producer_byte_rate=500}
> /config/clients/ {producer_byte_rate=1000}
> And let's say a produce request with {user="", client-id="test"} enters the 
> system. When calling ClientQuotaManager::quota(), the metrics tags that are 
> fetched via ClientQuotaCallback::quotaMetricTags() will map to the config 
> entry for /config/users/, which is (sanitizedUser, ""), where 
> substituting gets ("", "").
> Then, when looking up the quota in ClientQuotaCallback::quotaLimit(), both 
> tags are the empty string, which resolves to null, which turns into 
> Long.MaxValue for the result. So where the client may have expected 500 (or 
> 1000?), it's instead unbounded.
> Is it valid for a request to ever contain an empty string for the user? If 
> so, then a fix will be needed; if not, we should safeguard against 
> this happening.
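
To make the two-step resolution concrete, a toy model of the lookup described 
above (not the actual ClientQuotaManager/ClientQuotaCallback code):

{code}
public class QuotaResolutionSketch {
    /** Step 1 (quotaMetricTags): the request {user="", client-id="test"} matches the
     *  /config/users/<default> entry, whose tags are (sanitizedUser, ""); substituting
     *  the empty user yields ("", ""). */
    static String[] quotaMetricTags(String user, String clientId) {
        return new String[] {user, ""};
    }

    /** Step 2 (quotaLimit): with both tags empty the lookup finds no matching quota
     *  and reports "unlimited", i.e. Long.MaxValue, even though the <default>
     *  overrides of 500 and 1000 exist. */
    static double quotaLimit(String userTag, String clientIdTag) {
        if (userTag.isEmpty() && clientIdTag.isEmpty())
            return Long.MAX_VALUE;
        return 500.0; // stand-in for the configured /config/users/<default> override
    }

    public static void main(String[] args) {
        String[] tags = quotaMetricTags("", "test");
        System.out.println(quotaLimit(tags[0], tags[1])); // prints 9.223372036854776E18
    }
}
{code}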



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9510) Quotas may resolve to incorrect value if user is empty

2020-02-05 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9510:
--

 Summary: Quotas may resolve to incorrect value if user is empty
 Key: KAFKA-9510
 URL: https://issues.apache.org/jira/browse/KAFKA-9510
 Project: Kafka
  Issue Type: Bug
Reporter: Brian Byrne
Assignee: Rajini Sivaram
 Fix For: 2.6.0


This may be a pretty rare/uncommon case that I encountered during testing 
regarding an empty user. [~rsivaram] please let me know if this is a valid bug 
and whether it's something that needs further examination.

Let's say two quota configurations are populated:
/config/users/ {producer_byte_rate=500}
/config/clients/ {producer_byte_rate=1000}

And let's say a produce request with {user="", client-id="test"} enters the 
system. When calling ClientQuotaManager::quota(), the metrics tags that are 
fetched via ClientQuotaCallback::quotaMetricTags() will map to the config entry 
for /config/users/, which is (sanitizedUser, ""), where substituting 
gets ("", "").

Then, when looking up the quota in ClientQuotaCallback::quotaLimit(), both tags 
are the empty string, which resolves to null, which turns into Long.MaxValue 
for the result. So where the client may have expected 500 (or 1000?), it's 
instead unbounded.

Is it valid for a request to ever contain an empty string for the user? If so, 
then a fix will be needed; if not, we should safeguard against this 
happening.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9474) Kafka RPC protocol should support type 'double'

2020-01-24 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9474:
--

 Summary: Kafka RPC protocol should support type 'double'
 Key: KAFKA-9474
 URL: https://issues.apache.org/jira/browse/KAFKA-9474
 Project: Kafka
  Issue Type: Improvement
Reporter: Brian Byrne
Assignee: Brian Byrne


Should be fairly straightforward. Useful for KIP-546.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9082) Move ConfigCommand to use KafkaAdminClient APIs

2020-01-22 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9082.

Resolution: Duplicate

The outstanding work to be completed is now identical to KAFKA-7740. Marking as 
duplicate.

> Move ConfigCommand to use KafkaAdminClient APIs
> ---
>
> Key: KAFKA-9082
> URL: https://issues.apache.org/jira/browse/KAFKA-9082
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Critical
> Fix For: 2.5.0
>
>
> The ConfigCommand currently only supports a subset of commands when 
> interacting with the KafkaAdminClient (as opposed to ZooKeeper directly). It 
> needs to be brought up to parity for KIP-500 work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9082) Move ConfigCommand to use KafkaAdminClient APIs

2020-01-22 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne updated KAFKA-9082:
---
Fix Version/s: 2.5.0

> Move ConfigCommand to use KafkaAdminClient APIs
> ---
>
> Key: KAFKA-9082
> URL: https://issues.apache.org/jira/browse/KAFKA-9082
> Project: Kafka
>  Issue Type: Sub-task
>  Components: admin
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Critical
> Fix For: 2.5.0
>
>
> The ConfigCommand currently only supports a subset of commands when 
> interacting with the KafkaAdminClient (as opposed to ZooKeeper directly). It 
> needs to be brought up to parity for KIP-500 work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-7740) Kafka Admin Client should be able to manage user/client configurations for users and clients

2020-01-22 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne updated KAFKA-7740:
---
Fix Version/s: 2.5.0

> Kafka Admin Client should be able to manage user/client configurations for 
> users and clients
> 
>
> Key: KAFKA-7740
> URL: https://issues.apache.org/jira/browse/KAFKA-7740
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.1.0
> Environment: linux
>Reporter: Yaodong Yang
>Assignee: Brian Byrne
>Priority: Major
>  Labels: features
> Fix For: 2.5.0
>
>
> Right now, the Kafka Admin Client only allows users to change the configuration 
> of brokers and topics. There are use cases where users want to set or update 
> quota configurations for users and clients through the Kafka Admin Client. 
> Without this new capability, users have to talk to ZooKeeper manually for 
> this, which poses other challenges for customers.
> Considering we already have the framework for the more complex broker 
> and topic configuration changes, it seems straightforward to add support 
> for alterConfig and describeConfig for users and clients as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7740) Kafka Admin Client should be able to manage user/client configurations for users and clients

2020-01-22 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021612#comment-17021612
 ] 

Brian Byrne commented on KAFKA-7740:


This will be resolved as a function of KIP-546, which is planned for the 2.5.0 
release. Adjusting the ticket to reflect this.

> Kafka Admin Client should be able to manage user/client configurations for 
> users and clients
> 
>
> Key: KAFKA-7740
> URL: https://issues.apache.org/jira/browse/KAFKA-7740
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.1.0
> Environment: linux
>Reporter: Yaodong Yang
>Assignee: Brian Byrne
>Priority: Major
>  Labels: features
> Fix For: 2.5.0
>
>
> Right now, the Kafka Admin Client only allows users to change the configuration 
> of brokers and topics. There are use cases where users want to set or update 
> quota configurations for users and clients through the Kafka Admin Client. 
> Without this new capability, users have to talk to ZooKeeper manually for 
> this, which poses other challenges for customers.
> Considering we already have the framework for the more complex broker 
> and topic configuration changes, it seems straightforward to add support 
> for alterConfig and describeConfig for users and clients as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9449) Producer's BufferPool may block the producer from closing.

2020-01-17 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9449:
--

 Summary: Producer's BufferPool may block the producer from closing.
 Key: KAFKA-9449
 URL: https://issues.apache.org/jira/browse/KAFKA-9449
 Project: Kafka
  Issue Type: Bug
Reporter: Brian Byrne
Assignee: Brian Byrne


The producer's BufferPool may block allocations once its memory limit has been 
reached. If the producer is closed, it's possible for the allocation waiters 
to wait for max.block.ms if progress cannot be made, even when force-closed 
(immediate), which can cause effectively indefinite blocking if max.block.ms is 
particularly high.

The BufferPool should be made close-able, which should immediately wake up any 
waiters with pending allocations and throw a "producer is closing" 
exception.
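
A minimal sketch of that idea using plain java.util.concurrent primitives (not the 
real BufferPool class): close() flips a flag and signals all waiters so a 
force-close fails pending allocations immediately instead of waiting out 
max.block.ms.

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class CloseableBufferPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition memoryAvailable = lock.newCondition();
    private long availableMemory;
    private boolean closed;

    public CloseableBufferPool(long totalMemory) {
        this.availableMemory = totalMemory;
    }

    /** Blocks until 'size' bytes are free, maxBlockMs elapses, or the pool is closed. */
    public ByteBuffer allocate(int size, long maxBlockMs) throws InterruptedException {
        lock.lock();
        try {
            long remainingNs = TimeUnit.MILLISECONDS.toNanos(maxBlockMs);
            while (availableMemory < size) {
                if (closed)
                    throw new IllegalStateException("Producer is closing, aborting allocation");
                if (remainingNs <= 0)
                    throw new IllegalStateException("Timed out waiting for buffer memory");
                remainingNs = memoryAvailable.awaitNanos(remainingNs);
            }
            availableMemory -= size;
            return ByteBuffer.allocate(size);
        } finally {
            lock.unlock();
        }
    }

    /** Returns memory to the pool and wakes one waiter. */
    public void deallocate(ByteBuffer buffer) {
        lock.lock();
        try {
            availableMemory += buffer.capacity();
            memoryAvailable.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Marks the pool closed and wakes every waiter so force-close is not blocked. */
    public void close() {
        lock.lock();
        try {
            closed = true;
            memoryAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
{code}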



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2020-01-17 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne updated KAFKA-8532:
---
Priority: Blocker  (was: Major)

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: js.log, js0.log, js1.log, js2.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way 
> to recover the Kafka cluster is to restart the Kafka server. The following is 
> the jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005d1be5a00> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeperClient.scala:157)
>  at 
> kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1596)
>  at 
> 

[jira] [Updated] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2020-01-17 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne updated KAFKA-8532:
---
Priority: Major  (was: Blocker)

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Major
> Attachments: js.log, js0.log, js1.log, js2.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way 
> to recover the Kafka cluster is to restart the Kafka server. The following is 
> the jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005d1be5a00> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeperClient.scala:157)
>  at 
> kafka.zk.KafkaZkClient.retryRequestsUntilConnected(KafkaZkClient.scala:1596)
>  at 
> 

[jira] [Resolved] (KAFKA-9395) Improve Kafka scheduler's periodic maybeShrinkIsr()

2020-01-14 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9395.

  Assignee: Rajini Sivaram  (was: Brian Byrne)
Resolution: Done

> Improve Kafka scheduler's periodic maybeShrinkIsr()
> ---
>
> Key: KAFKA-9395
> URL: https://issues.apache.org/jira/browse/KAFKA-9395
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Brian Byrne
>Assignee: Rajini Sivaram
>Priority: Major
>
> The ReplicaManager schedules a periodic call to maybeShrinkIsr() with the 
> KafkaScheduler for a period of replica.lag.time.max.ms / 2. While 
> replica.lag.time.max.ms defaults to 30s, my setup was 45s, which means 
> maybeShrinkIsr() was being called every 22.5 seconds. Normally this is not a 
> problem.
> Fetch/produce requests hold a partition's leaderIsrUpdateLock in reader mode 
> while they are running. When a partition is requested to check whether it 
> should shrink its ISR, it acquires a write lock. So there's potential for 
> contention here, and if the fetch/produce requests are long running, they may 
> block maybeShrinkIsr() for hundreds of ms.
> This becomes a problem due to the way the scheduler runnable is set up: it 
> calls maybeShrinkIsr() for every partition in a single scheduler invocation. If 
> there are a lot of partitions, this could take many seconds, even minutes. 
> However, the runnable is scheduled via 
> ScheduledThreadPoolExecutor#scheduleAtFixedRate, which means that if it exceeds 
> its period, it's immediately scheduled to run again. So it backs up enough 
> that the scheduler is always executing this function.
> This may cause partitions to check their ISR far less 
> frequently than intended. It is also a huge source of contention 
> for cases where the produce/fetch requests are long-running.
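
The backlog behavior is easy to reproduce with the JDK scheduler alone; a small 
self-contained sketch (the timings are arbitrary):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateOverrun {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long start = System.currentTimeMillis();

        // Period is 1s but the task takes 3s, mimicking maybeShrinkIsr() iterating over
        // many partitions. With scheduleAtFixedRate the next run is already overdue when
        // the previous one finishes, so runs execute back to back and the thread never idles.
        scheduler.scheduleAtFixedRate(() -> {
            System.out.printf("run starts at +%ds%n", (System.currentTimeMillis() - start) / 1000);
            try {
                Thread.sleep(3000); // simulated long-running work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1, TimeUnit.SECONDS);

        // scheduleWithFixedDelay would instead wait a full period after each completion,
        // guaranteeing idle time between iterations.
        Thread.sleep(15_000);
        scheduler.shutdownNow();
    }
}
{code}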



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-7740) Kafka Admin Client should be able to manage user/client configurations for users and clients

2020-01-13 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne reassigned KAFKA-7740:
--

Assignee: Brian Byrne

> Kafka Admin Client should be able to manage user/client configurations for 
> users and clients
> 
>
> Key: KAFKA-7740
> URL: https://issues.apache.org/jira/browse/KAFKA-7740
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.1.0
> Environment: linux
>Reporter: Yaodong Yang
>Assignee: Brian Byrne
>Priority: Major
>  Labels: features
>
> Right now, the Kafka Admin Client only allows users to change the configuration 
> of brokers and topics. There are use cases where users want to set or update 
> quota configurations for users and clients through the Kafka Admin Client. 
> Without this new capability, users have to talk to ZooKeeper manually for 
> this, which poses other challenges for customers.
> Considering we already have the framework for the more complex broker 
> and topic configuration changes, it seems straightforward to add support 
> for alterConfig and describeConfig for users and clients as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9395) Improve Kafka scheduler's periodic maybeShrinkIsr()

2020-01-09 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9395:
--

 Summary: Improve Kafka scheduler's periodic maybeShrinkIsr()
 Key: KAFKA-9395
 URL: https://issues.apache.org/jira/browse/KAFKA-9395
 Project: Kafka
  Issue Type: Improvement
Reporter: Brian Byrne
Assignee: Brian Byrne


The ReplicaManager schedules a periodic call to maybeShrinkIsr() with the 
KafkaScheduler for a period of replica.lag.time.max.ms / 2. While 
replica.lag.time.max.ms defaults to 30s, my setup was 45s, which means 
maybeShrinkIsr() was being called every 22.5 seconds. Normally this is not a 
problem.

Fetch/produce requests hold a partition's leaderIsrUpdateLock in reader mode 
while they are running. When a partition is requested to check whether it 
should shrink its ISR, it acquires a write lock. So there's potential for 
contention here, and if the fetch/produce requests are long running, they may 
block maybeShrinkIsr() for hundreds of ms.

This becomes a problem due to the way the scheduler runnable is set up: it 
calls maybeShrinkIsr() for every partition in a single scheduler invocation. If 
there are a lot of partitions, this could take many seconds, even minutes. 
However, the runnable is scheduled via 
ScheduledThreadPoolExecutor#scheduleAtFixedRate, which means that if it exceeds its 
period, it's immediately scheduled to run again. So it backs up enough that the 
scheduler is always executing this function.

This may cause partitions to check their ISR far less frequently 
than intended. It is also a huge source of contention for cases 
where the produce/fetch requests are long-running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KAFKA-9372) Add producer config to make topicExpiry configurable

2020-01-07 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne reassigned KAFKA-9372:
--

Assignee: Brian Byrne

> Add producer config to make topicExpiry configurable
> 
>
> Key: KAFKA-9372
> URL: https://issues.apache.org/jira/browse/KAFKA-9372
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Jiao Zhang
>Assignee: Brian Byrne
>Priority: Minor
>
> Sometimes we get the error "org.apache.kafka.common.errors.TimeoutException: 
> Failed to update metadata after 1000 ms" on the producer side. We did the 
> investigation and found
>  # our producer produced messages at a really low rate; the interval was more 
> than 10 minutes
>  # by default, the producer expires topics after TOPIC_EXPIRY_MS; after a 
> topic expires, if no data is produced before the next metadata update 
> (automatically triggered by metadata.max.age.ms), the partitions entry for the 
> topic disappears from the Metadata cache. As a result, almost every produce 
> requires the producer to fetch metadata, which could possibly end with a timeout.
> To solve this, we propose to add a new config, metadata.topic.expiry, for the 
> producer to make topicExpiry configurable. Topic expiry is good only when the 
> producer is long-lived and is used for producing to a variable set of topics. 
> But in the case that producers are bound to a single topic or a few fixed 
> topics, there is no need to expire topics at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9372) Add producer config to make topicExpiry configurable

2020-01-07 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009933#comment-17009933
 ] 

Brian Byrne commented on KAFKA-9372:


Hi [~Jiao-zhang] - we actually have a KIP in progress to add just this in the 
form of metadata.eviction.period.ms:
  
https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics

I'll take ownership of the ticket and update it once implemented. Thanks!

> Add producer config to make topicExpiry configurable
> 
>
> Key: KAFKA-9372
> URL: https://issues.apache.org/jira/browse/KAFKA-9372
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 1.1.0
>Reporter: Jiao Zhang
>Priority: Minor
>
> Sometimes we get the error "org.apache.kafka.common.errors.TimeoutException: 
> Failed to update metadata after 1000 ms" on the producer side. We did the 
> investigation and found
>  # our producer produced messages at a really low rate; the interval was more 
> than 10 minutes
>  # by default, the producer expires topics after TOPIC_EXPIRY_MS; after a 
> topic expires, if no data is produced before the next metadata update 
> (automatically triggered by metadata.max.age.ms), the partitions entry for the 
> topic disappears from the Metadata cache. As a result, almost every produce 
> requires the producer to fetch metadata, which could possibly end with a timeout.
> To solve this, we propose to add a new config, metadata.topic.expiry, for the 
> producer to make topicExpiry configurable. Topic expiry is good only when the 
> producer is long-lived and is used for producing to a variable set of topics. 
> But in the case that producers are bound to a single topic or a few fixed 
> topics, there is no need to expire topics at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9164) Don't evict active topics' metadata from the producer's cache

2019-12-02 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne resolved KAFKA-9164.

Resolution: Invalid

Marking issue invalid since the producer logic was actually handling this 
correctly.

> Don't evict active topics' metadata from the producer's cache
> -
>
> Key: KAFKA-9164
> URL: https://issues.apache.org/jira/browse/KAFKA-9164
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Reporter: Brian Byrne
>Assignee: Brian Byrne
>Priority: Minor
>
> The producer's metadata currently marks a topic as "not needing to be 
> retained" if it has been 5 minutes since it was first considered, regardless 
> of whether records were being actively produced for the topic. This shouldn't 
> happen and should be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9164) Don't evict active topics' metadata from the producer's cache

2019-11-08 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9164:
--

 Summary: Don't evict active topics' metadata from the producer's 
cache
 Key: KAFKA-9164
 URL: https://issues.apache.org/jira/browse/KAFKA-9164
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Reporter: Brian Byrne
Assignee: Brian Byrne


The producer's metadata currently marks a topic as "not needing to be retained" 
if it has been 5 minutes since it was first considered, regardless of whether 
records were being actively produced for the topic. This shouldn't happen and 
should be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9139) Dynamic broker config types aren't being discovered

2019-11-04 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9139:
--

 Summary: Dynamic broker config types aren't being discovered
 Key: KAFKA-9139
 URL: https://issues.apache.org/jira/browse/KAFKA-9139
 Project: Kafka
  Issue Type: Bug
Reporter: Brian Byrne
Assignee: Brian Byrne


The broker's dynamic config definition types aren't being properly discovered, 
and therefore they're being considered "sensitive" when returned to the client. 
This needs to be resolved. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9082) Move ConfigCommand to use KafkaAdminClient APIs

2019-10-22 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-9082:
--

 Summary: Move ConfigCommand to use KafkaAdminClient APIs
 Key: KAFKA-9082
 URL: https://issues.apache.org/jira/browse/KAFKA-9082
 Project: Kafka
  Issue Type: Improvement
  Components: admin
Reporter: Brian Byrne


The ConfigCommand currently only supports a subset of commands when interacting 
with the KafkaAdminClient (as opposed to ZooKeeper directly). It needs to be 
brought up to parity for KIP-500 work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-8904) Reduce metadata lookups when producing to a large number of topics

2019-09-12 Thread Brian Byrne (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Byrne updated KAFKA-8904:
---
Summary: Reduce metadata lookups when producing to a large number of topics 
 (was: Reduce metadata lookups when producting to a large number of topics)

> Reduce metadata lookups when producing to a large number of topics
> --
>
> Key: KAFKA-8904
> URL: https://issues.apache.org/jira/browse/KAFKA-8904
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller, producer 
>Reporter: Brian Byrne
>Priority: Minor
>
> Per [~lbradstreet]:
>  
> "The problem was that the producer starts with no knowledge of topic 
> metadata. So they start the producer up, and then they start sending messages 
> to any of the thousands of topics that exist. Each time a message is sent to 
> a new topic, it'll trigger a metadata request if the producer doesn't know 
> about it. These metadata requests are done in serial such that if you send 
> 2000 messages to 2000 topics, it will trigger 2000 new metadata requests.
>  
> Each successive metadata request will include every topic seen so far, so the 
> first metadata request will include 1 topic, the second will include 2 
> topics, etc.
>  
> An additional problem is that this can take a while, and metadata expiry (for 
> metadata that has not been recently used) is hard-coded to 5 minutes, so if 
> the initial fetches take long enough you can end up evicting the metadata 
> before you send another message to a topic.
> So the approaches above are:
> 1. We can linger for a bit before making a metadata request, allow more sends 
> to go through, and then batch the metadata request for the topics we need in a 
> single metadata request.
> 2. We can allow pre-seeding the producer with metadata for a list of topics 
> you care about.
> I prefer 1 if we can make it work."
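
To illustrate the pattern described above (the bootstrap address and topic names 
are placeholders), a simple loop that produces to many previously unseen topics 
pays one blocking metadata fetch per new topic:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ManyTopicsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each previously unseen topic forces send() to block on a metadata fetch,
            // so these 2000 sends serialize on 2000 metadata round trips.
            for (int i = 0; i < 2000; i++) {
                producer.send(new ProducerRecord<>("topic-" + i, "key", "value-" + i));
            }
        }
    }
}
{code}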



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (KAFKA-8904) Reduce metadata lookups when producting to a large number of topics

2019-09-12 Thread Brian Byrne (Jira)
Brian Byrne created KAFKA-8904:
--

 Summary: Reduce metadata lookups when producting to a large number 
of topics
 Key: KAFKA-8904
 URL: https://issues.apache.org/jira/browse/KAFKA-8904
 Project: Kafka
  Issue Type: Improvement
  Components: controller, producer 
Reporter: Brian Byrne


Per [~lbradstreet]:
 
"The problem was that the producer starts with no knowledge of topic metadata. 
So they start the producer up, and then they start sending messages to any of 
the thousands of topics that exist. Each time a message is sent to a new topic, 
it'll trigger a metadata request if the producer doesn't know about it. These 
metadata requests are done in serial such that if you send 2000 messages to 
2000 topics, it will trigger 2000 new metadata requests.
 
Each successive metadata request will include every topic seen so far, so the 
first metadata request will include 1 topic, the second will include 2 topics, 
etc.
 
An additional problem is that this can take a while, and metadata expiry (for 
metadata that has not been recently used) is hard-coded to 5 minutes, so if 
the initial fetches take long enough you can end up evicting the metadata 
before you send another message to a topic.

So the approaches above are:
1. We can linger for a bit before making a metadata request, allow more sends 
to go through, and then batch the metadata request for the topics we need in a 
single metadata request.
2. We can allow pre-seeding the producer with metadata for a list of topics you 
care about.

I prefer 1 if we can make it work."
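
A rough sketch of approach 1 follows; it is not the real producer implementation, 
and the MetadataFetcher hook and linger handling are purely illustrative.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Sketch of approach 1: newly seen topics are accumulated for a short linger
// window so one metadata request can cover many topics, rather than issuing
// one request per topic.
public class BatchedMetadataRequester {
    public interface MetadataFetcher {          // hypothetical hook for the real fetch
        void fetch(Set<String> topics);
    }

    private final long lingerMs;
    private final MetadataFetcher fetcher;
    private final Set<String> pendingTopics = new HashSet<>();
    private long batchStartMs = -1;

    public BatchedMetadataRequester(long lingerMs, MetadataFetcher fetcher) {
        this.lingerMs = lingerMs;
        this.fetcher = fetcher;
    }

    // Called when a send() encounters a topic with no cached metadata.
    public synchronized void topicNeeded(String topic, long nowMs) {
        if (pendingTopics.isEmpty()) {
            batchStartMs = nowMs;
        }
        pendingTopics.add(topic);
        maybeFlush(nowMs);
    }

    // Issues one batched metadata request once the linger window has elapsed.
    public synchronized void maybeFlush(long nowMs) {
        if (!pendingTopics.isEmpty() && nowMs - batchStartMs >= lingerMs) {
            fetcher.fetch(new HashSet<>(pendingTopics));
            pendingTopics.clear();
        }
    }
}
{code}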



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (KAFKA-6098) Delete and Re-create topic operation could result in race condition

2019-09-12 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928844#comment-16928844
 ] 

Brian Byrne edited comment on KAFKA-6098 at 9/12/19 7:45 PM:
-

I've been investigating this issue and have collected some thoughts on it. 
Since I'm relatively new to Kafka, I'll be verbose in my explanation so that my 
understanding may be validated/corrected.

The criterion for a successful client topic deletion is that the server persists 
the intent to delete the topic by creating the ZK node 
/admin/delete_topics/<topic>. At this point, in-memory data structures are 
modified to reflect the ongoing destruction of the topic, and consequently 
the topic's ZK node is removed, followed by the deletion intent node*. The 
reason the operation is performed asynchronously is that a topic may be 
ineligible for deletion for an indefinite amount of time during partition 
reassignment or broker instability.

It appears topic creation is prone to a small race in its ZK update sequence: 
the deletion intent node must be removed after the topic's node for obvious 
recovery consistency reasons; however, this also means there's a window where 
the deletion intent exists but the topic doesn't. In this case, a racing topic 
recreation is prone to unexpected and undesirable behavior, as the old topic 
may still be undergoing deletion (note that topic creation doesn't check for a 
deletion intent).

The 'list topics' request uses a different source of truth than the creation 
path: its topics are gathered from an in-memory view of outstanding partition 
states. The partitions may be removed while the deletion is still outstanding, 
which is why the ZK node may still exist on creation, as [~guozhang] noted.

A possible fix would be to have 'list topics' return a more conservative set of 
topics that are undergoing deletion. This might require some changes to how 
metadata snapshots are handled which seems a bit excessive for resolving this 
issue, although I'm not familiar with this component.

The "easy-fix" solution would have the create topic path check the metadata 
cache for the topic's existence, where if it doesn't exist but the topic's 
deletion intent does, then a transient error is returned that asks the client 
to backoff+retry. This ensures that all possible state for the previous topic 
has been eliminated before the new one is created. The only downside is that 
there's a window where no partitions for the topic exists (i.e. doesn't appear 
in list topics), but the topic deletion cannot be completed, which should be 
relatively small and likely due to ZK inaccessibility, which would prevent the 
creation from completing anyway.

Does this sound reasonable?

 

[*] There's actually a deletion of the topic's configuration in-between, which 
may be missed in this case and may be Peter's issue: 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1621-L1627]
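
To make the proposed "easy-fix" check concrete, a hypothetical sketch is below; 
the ZK view interface and result type are illustrative only, not actual Kafka 
internals.

{code:java}
// Hypothetical sketch of the proposed create-topic check; the ZK helper methods
// and result type are illustrative, not real Kafka code.
public class CreateTopicGuard {
    public interface ZkView {
        boolean topicExists(String topic);          // e.g. /brokers/topics/<topic>
        boolean deletionIntentExists(String topic); // e.g. /admin/delete_topics/<topic>
    }

    public enum Result { PROCEED, RETRY_LATER }

    private final ZkView zk;

    public CreateTopicGuard(ZkView zk) {
        this.zk = zk;
    }

    // If the topic znode is gone but the deletion intent remains, the previous
    // incarnation is still being torn down, so ask the client to back off and retry.
    public Result checkCreate(String topic) {
        if (!zk.topicExists(topic) && zk.deletionIntentExists(topic)) {
            return Result.RETRY_LATER;
        }
        return Result.PROCEED;
    }
}
{code}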

 

 


was (Author: bbyrne):
I've been investigating this issue and have collected some thoughts on it. 
Since I'm relatively new to Kafka, I'll be verbose in my explanation so that my 
understanding may be validated/corrected.

The successful client criteria for deleting a topic is that the server persists 
the intent to delete the topic via creating ZK node 
/admin/delete_topics/<topic>. At this time, in-memory data structures will be 
modified to reflect the ongoing destruction of the topic, where consequently 
the topic's ZK node will be removed, followed by the deletion intent node*. The 
purpose for the operation to be performed asynchronously is that a topic may be 
ineligible for deletion for an indefinite amount of time during partition 
reassignment or broker instability.

Topic listing/creating appear to be at odds with each other, further 
complicated by the race-prone ZK update sequence: the deletion intent node must 
be removed after the topic's node for obvious recovery consistency reasons; 
however, this also means there's a window where the deletion intent exists but 
the topic doesn't. In this case, a racing topic recreation is prone to 
unexpected and undesirable behavior, as the old topic may still be undergoing 
deletion (note that topic creation doesn't check for a deletion intent).

The 'list topics' request uses a different source of truth than the creation 
path: its topics are gathered from a topic's outstanding partitions' states. 
The partitions may be removed while the deletion is still outstanding, which is 
why the ZK node may still exist on creation, as [~guozhang] noted.

A possible fix would be to have 'list topics' return a more conservative set of 
topics that are undergoing deletion. This might require some changes to how 
metadata snapshots are handled which seems a bit excessive for resolving this 
issue, although I'm not familiar with this component.

[jira] [Commented] (KAFKA-6098) Delete and Re-create topic operation could result in race condition

2019-09-12 Thread Brian Byrne (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928844#comment-16928844
 ] 

Brian Byrne commented on KAFKA-6098:


I've been investigating this issue and have collected some thoughts on it. 
Since I'm relatively new to Kafka, I'll be verbose in my explanation so that my 
understanding may be validated/corrected.

The criterion for a successful client topic deletion is that the server persists 
the intent to delete the topic by creating the ZK node 
/admin/delete_topics/<topic>. At this point, in-memory data structures are 
modified to reflect the ongoing destruction of the topic, and consequently 
the topic's ZK node is removed, followed by the deletion intent node*. The 
reason the operation is performed asynchronously is that a topic may be 
ineligible for deletion for an indefinite amount of time during partition 
reassignment or broker instability.

Topic listing/creating appear to be at odds with each other, further 
complicated by the race-prone ZK update sequence: the deletion intent node must 
be removed after the topic's node for obvious recovery consistency reasons; 
however, this also means there's a window where the deletion intent exists but 
the topic doesn't. In this case, a racing topic recreation is prone to 
unexpected and undesirable behavior, as the old topic may still be undergoing 
deletion (note that topic creation doesn't check for a deletion intent).

The 'list topics' request uses a different source of truth than the creation 
path: its topics are gathered from a topic's outstanding partitions' states. 
The partitions may be removed while the deletion is still outstanding, which is 
why the ZK node may still exist on creation, as [~guozhang] noted.

A possible fix would be to have 'list topics' return a more conservative set of 
topics that are undergoing deletion. This might require some changes to how 
metadata snapshots are handled which seems a bit excessive for resolving this 
issue, although I'm not familiar with this component.

The "easy-fix" solution would have the create topic path check the metadata 
cache for the topic's existence, where if it doesn't exist but the topic's 
deletion intent does, then a transient error is returned that asks the client 
to backoff+retry. This ensures that all possible state for the previous topic 
has been eliminated before the new one is created. The only downside is that 
there's a window where no partitions for the topic exists (i.e. doesn't appear 
in list topics), but the topic deletion cannot be completed, which should be 
relatively small and likely due to ZK inaccessibility, which would prevent the 
creation from completing anyway.

Does this sound reasonable?

 

[*] There's actually a deletion of the topic's configuration in-between, which 
may be missed in this case and may be Peter's issue: 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1621-L1627]

 

 

> Delete and Re-create topic operation could result in race condition
> ---
>
> Key: KAFKA-6098
> URL: https://issues.apache.org/jira/browse/KAFKA-6098
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: reliability
>
> Here are the steps to reproduce this issue:
> 1. Delete a topic using the delete topic request.
> 2. Confirm the topic is deleted using the list topics request.
> 3. Create the topic using the create topic request.
> In step 3) a race condition can occur where the response returns a 
> {{TOPIC_ALREADY_EXISTS}} error code, indicating the topic already exists.
> The root cause of the above issue is in the {{TopicDeletionManager}} class:
> {code}
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, OfflinePartition)
> controller.partitionStateMachine.handleStateChanges(partitionsForDeletedTopic.toSeq, NonExistentPartition)
> topicsToBeDeleted -= topic
> partitionsToBeDeleted.retain(_.topic != topic)
> kafkaControllerZkUtils.deleteTopicZNode(topic)
> kafkaControllerZkUtils.deleteTopicConfigs(Seq(topic))
> kafkaControllerZkUtils.deleteTopicDeletions(Seq(topic))
> controllerContext.removeTopic(topic)
> {code}
> I.e. it first updates the broker's metadata cache through the ISR and metadata 
> update requests, then deletes the topic ZK path, and then deletes the 
> topic-deletion ZK path. However, upon handling the create topic request, the 
> broker will simply try to write to the topic ZK path directly. Hence there is 
> a race condition between the brokers updating their metadata cache (so the list 
> topics request no longer returns this topic) and the ZK path for the topic 
> being deleted (so the create topic request succeeds).
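
For reference, the three reproduction steps map roughly onto the following 
AdminClient calls; the bootstrap address and topic name are placeholders, and 
{{TOPIC_ALREADY_EXISTS}} surfaces on the client as a TopicExistsException.

{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.TopicExistsException;

public class DeleteRecreateRace {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        String topic = "test-topic"; // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // 1. Delete the topic.
            admin.deleteTopics(Collections.singleton(topic)).all().get();
            // 2. Confirm the topic no longer appears in the listing.
            while (admin.listTopics().names().get().contains(topic)) {
                Thread.sleep(100);
            }
            // 3. Recreate it; the race can still surface as TOPIC_ALREADY_EXISTS here.
            try {
                admin.createTopics(Collections.singleton(new NewTopic(topic, 1, (short) 1)))
                     .all().get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof TopicExistsException) {
                    System.out.println("Hit the race: topic still exists on the broker side");
                }
            }
        }
    }
}
{code}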