[jira] [Resolved] (KAFKA-10300) fix flaky core/group_mode_transactions_test.py

2020-07-23 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10300.
-
Fix Version/s: 2.7.0
   Resolution: Fixed

merged to trunk

> fix flaky core/group_mode_transactions_test.py
> --
>
> Key: KAFKA-10300
> URL: https://issues.apache.org/jira/browse/KAFKA-10300
> Project: Kafka
>  Issue Type: Bug
>  Components: core, system tests
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 2.7.0
>
>
> {quote}
> test_id:    
> kafkatest.tests.core.group_mode_transactions_test.GroupModeTransactionsTest.test_transactions.failure_mode=hard_bounce.bounce_target=brokers
> status:     FAIL
> run time:   9 minutes 47.698 seconds
>  
>  
>     copier-0 - Failed to copy all messages in 240s.
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 134, in run
>     data = self.run_test()
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 192, in run_test
>     return self.test_context.function(self.test)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
> 429, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/core/group_mode_transactions_test.py", 
> line 271, in test_transactions
>     num_messages_to_copy=self.num_seed_messages)
>   File 
> "/opt/kafka-dev/tests/kafkatest/tests/core/group_mode_transactions_test.py", 
> line 230, in copy_messages_transactionally
>     (copier.transactional_id, copier_timeout_sec))
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
> 41, in wait_until
>     raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
> TimeoutError: copier-0 - Failed to copy all messages in 240s.
> {quote}
>  
> this issue is the same as KAFKA-10274, so we can apply the same approach
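For reference, the shared approach amounts to deriving the copier timeout from the configured transaction timeout instead of hardcoding it. A minimal Python sketch; the 40s transaction timeout and the helper name are assumptions, not the actual patch:

{code:python}
# Sketch only: a hard-bounced copier can stall for up to one transaction
# timeout per bounce before its old transaction is aborted, so the test
# timeout should be derived from that rather than fixed at 240s.
TRANSACTION_TIMEOUT_MS = 40000  # assumed producer transaction.timeout.ms

def copier_timeout_sec(num_bounces, slack_sec=60):
    return num_bounces * TRANSACTION_TIMEOUT_MS // 1000 + slack_sec

print(copier_timeout_sec(num_bounces=3))  # 180s of worst-case stall + slack
{code}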



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10274) Transaction system test uses inconsistent timeouts

2020-07-22 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10274.
-
Fix Version/s: 2.6.0
   Resolution: Fixed

Merged this to trunk and 2.6 branch.

> Transaction system test uses inconsistent timeouts
> --
>
> Key: KAFKA-10274
> URL: https://issues.apache.org/jira/browse/KAFKA-10274
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.6.0
>
>
> We've seen some failures in the transaction system test with errors like the 
> following:
> {code}
> copier-1 : Message copier didn't make enough progress in 30s. Current 
> progress: 0
> {code}
> Looking at the consumer logs, we see the following messages repeating over 
> and over:
> {code}
> [2020-07-14 06:50:21,466] DEBUG [Consumer 
> clientId=consumer-transactions-test-consumer-group-1, 
> groupId=transactions-test-consumer-group] Fetching committed offsets for 
> partitions: [input-topic-1] 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-07-14 06:50:21,468] DEBUG [Consumer 
> clientId=consumer-transactions-test-consumer-group-1, 
> groupId=transactions-test-consumer-group] Failed to fetch offset for 
> partition input-topic-1: There are unstable offsets that need to be cleared. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> {code}
> I think the problem is that the test implicitly depends on the transaction
> timeout, which has been configured to 40s, even though the test expects
> progress after 30s.
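To make the mismatch concrete, here is a minimal Python sketch using the values from the description (the check itself is hypothetical):

{code:python}
# Sketch only: a bounced copier may legitimately make zero progress until its
# old transaction times out, so the progress check must wait at least as long.
transaction_timeout_ms = 40000  # producer transaction.timeout.ms in the test
progress_timeout_sec = 30       # wait_until timeout used by the test

if progress_timeout_sec <= transaction_timeout_ms / 1000:
    print("inconsistent: progress is only guaranteed after the transaction "
          "timeout (40s), but the test gives up after %ds" % progress_timeout_sec)
{code}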



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10257) system test kafkatest.tests.core.security_rolling_upgrade_test fails

2020-07-15 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10257.
-
Fix Version/s: 2.6.0
   Resolution: Fixed

Merged to 2.6 and trunk.

> system test kafkatest.tests.core.security_rolling_upgrade_test fails
> 
>
> Key: KAFKA-10257
> URL: https://issues.apache.org/jira/browse/KAFKA-10257
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Jun Rao
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 2.6.0
>
>
> The test failure was reported in 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2020-07-08--001.1594266883--chia7712--KAFKA-10235--a76224fff/report.html
> Saw the following error in the log.
> {code:java}
> [2020-07-09 00:56:37,575] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: Could not find a 'KafkaServer' or 
> 'internal.KafkaServer' entry in the JAAS configuration. System property 
> 'java.security.auth.login.config' is not set
> at 
> org.apache.kafka.common.security.JaasContext.defaultContext(JaasContext.java:133)
> at 
> org.apache.kafka.common.security.JaasContext.load(JaasContext.java:98)
> at 
> org.apache.kafka.common.security.JaasContext.loadServerContext(JaasContext.java:70)
> at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:131)
> at 
> org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:97)
> at kafka.network.Processor.(SocketServer.scala:780)
> at kafka.network.SocketServer.newProcessor(SocketServer.scala:406)
> at 
> kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:285)
> at 
> kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:284)
> at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:251)
> at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:248)
> at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
> at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:920)
> at 
> kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:248)
> at kafka.network.SocketServer.startup(SocketServer.scala:122)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:297)
> at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
> at kafka.Kafka$.main(Kafka.scala:82)
> at kafka.Kafka.main(Kafka.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10257) system test kafkatest.tests.core.security_rolling_upgrade_test fails

2020-07-14 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157500#comment-17157500
 ] 

Jun Rao commented on KAFKA-10257:
-

[~chia7712] : Yes. Thanks for helping out.

> system test kafkatest.tests.core.security_rolling_upgrade_test fails
> 
>
> Key: KAFKA-10257
> URL: https://issues.apache.org/jira/browse/KAFKA-10257
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Jun Rao
>Priority: Major
>
> The test failure was reported in 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2020-07-08--001.1594266883--chia7712--KAFKA-10235--a76224fff/report.html
> Saw the following error in the log.
> {code:java}
> [2020-07-09 00:56:37,575] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: Could not find a 'KafkaServer' or 
> 'internal.KafkaServer' entry in the JAAS configuration. System property 
> 'java.security.auth.login.config' is not set
> at 
> org.apache.kafka.common.security.JaasContext.defaultContext(JaasContext.java:133)
> at 
> org.apache.kafka.common.security.JaasContext.load(JaasContext.java:98)
> at 
> org.apache.kafka.common.security.JaasContext.loadServerContext(JaasContext.java:70)
> at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:131)
> at 
> org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:97)
> at kafka.network.Processor.(SocketServer.scala:780)
> at kafka.network.SocketServer.newProcessor(SocketServer.scala:406)
> at 
> kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:285)
> at 
> kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:284)
> at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:251)
> at 
> kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:248)
> at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
> at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:920)
> at 
> kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:248)
> at kafka.network.SocketServer.startup(SocketServer.scala:122)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:297)
> at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
> at kafka.Kafka$.main(Kafka.scala:82)
> at kafka.Kafka.main(Kafka.scala)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10002) Improve performances of StopReplicaRequest with large number of partitions to be deleted

2020-07-13 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10002.
-
Fix Version/s: 2.7.0
   Resolution: Fixed

merged the PR to trunk

> Improve performances of StopReplicaRequest with large number of partitions to 
> be deleted
> 
>
> Key: KAFKA-10002
> URL: https://issues.apache.org/jira/browse/KAFKA-10002
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 2.7.0
>
>
> I have noticed that StopReplicaRequests with partitions to be deleted are
> extremely slow when there are more than 2000 partitions, which leads to hitting
> the request timeout in the controller. A request with 2000 partitions to be
> deleted still works, but performance degrades significantly as the number
> increases. For example, a request with 3000 partitions to be deleted takes
> approx. 60 seconds to be processed.
> A CPU profile shows that most of the time (almost 90%) is spent in
> checkpointing log start offsets and recovery offsets. See attached.
> When a partition is deleted, the replica manager calls
> `ReplicaManager#asyncDelete`, which checkpoints recovery offsets and log start
> offsets. As the checkpoints are per data directory, checkpointing is done
> for all the partitions in the directory of the partition being deleted. In
> our case, where we have only one data directory, deleting 1000 partitions
> ends up checkpointing the same things 1000 times, which is not efficient.
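An illustrative Python sketch of the inefficiency and the natural batching fix (the broker code is Scala; the names here are stand-ins):

{code:python}
def write_checkpoint(directory):
    # Stand-in for rewriting the checkpoint file, whose cost grows with the
    # number of partitions still in the directory.
    _ = sorted(directory)

def delete_naive(partitions, directory):
    for p in partitions:
        directory.discard(p)
        write_checkpoint(directory)   # O(dir size) work, repeated N times

def delete_batched(partitions, directory):
    for p in partitions:
        directory.discard(p)
    write_checkpoint(directory)       # one checkpoint per affected directory

delete_batched(["t-0", "t-1"], {"t-%d" % i for i in range(5)})
{code}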



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10235) Fix flaky transactions_test.py

2020-07-09 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10235.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

merged the PR to trunk.

> Fix flaky transactions_test.py
> --
>
> Key: KAFKA-10235
> URL: https://issues.apache.org/jira/browse/KAFKA-10235
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 3.0.0
>
>
> {code}
> =hard_bounce.bounce_target=clients.check_order=False.use_group_metadata=False:
>  FAIL: copier-1 : Message copier didn't make enough progress in 30s. Current 
> progress: 0
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 134, in run
> data = self.run_test()
>   File 
> "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", 
> line 192, in run_test
> return self.test_context.function(self.test)
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 
> 429, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
> 254, in test_transactions
> num_messages_to_copy=self.num_seed_messages, 
> use_group_metadata=use_group_metadata)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
> 195, in copy_messages_transactionally
> self.bounce_copiers(copiers, clean_shutdown)
>   File "/opt/kafka-dev/tests/kafkatest/tests/core/transactions_test.py", line 
> 120, in bounce_copiers
> % (copier.transactional_id, str(copier.progress_percent(
>   File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 
> 41, in wait_until
> raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
> TimeoutError: copier-1 : Message copier didn't make enough progress in 30s. 
> Current progress: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10257) system test kafkatest.tests.core.security_rolling_upgrade_test fails

2020-07-09 Thread Jun Rao (Jira)
Jun Rao created KAFKA-10257:
---

 Summary: system test 
kafkatest.tests.core.security_rolling_upgrade_test fails
 Key: KAFKA-10257
 URL: https://issues.apache.org/jira/browse/KAFKA-10257
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Jun Rao


The test failure was reported in 
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2020-07-08--001.1594266883--chia7712--KAFKA-10235--a76224fff/report.html

Saw the following error in the log.

{code:java}
[2020-07-09 00:56:37,575] ERROR [KafkaServer id=1] Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.IllegalArgumentException: Could not find a 'KafkaServer' or 
'internal.KafkaServer' entry in the JAAS configuration. System property 
'java.security.auth.login.config' is not set
at 
org.apache.kafka.common.security.JaasContext.defaultContext(JaasContext.java:133)
at 
org.apache.kafka.common.security.JaasContext.load(JaasContext.java:98)
at 
org.apache.kafka.common.security.JaasContext.loadServerContext(JaasContext.java:70)
at 
org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:131)
at 
org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:97)
at kafka.network.Processor.(SocketServer.scala:780)
at kafka.network.SocketServer.newProcessor(SocketServer.scala:406)
at 
kafka.network.SocketServer.$anonfun$addDataPlaneProcessors$1(SocketServer.scala:285)
at 
kafka.network.SocketServer.addDataPlaneProcessors(SocketServer.scala:284)
at 
kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1(SocketServer.scala:251)
at 
kafka.network.SocketServer.$anonfun$createDataPlaneAcceptorsAndProcessors$1$adapted(SocketServer.scala:248)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
at scala.collection.AbstractIterable.foreach(Iterable.scala:920)
at 
kafka.network.SocketServer.createDataPlaneAcceptorsAndProcessors(SocketServer.scala:248)
at kafka.network.SocketServer.startup(SocketServer.scala:122)
at kafka.server.KafkaServer.startup(KafkaServer.scala:297)
at 
kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
at kafka.Kafka$.main(Kafka.scala:82)
at kafka.Kafka.main(Kafka.scala)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10225) Increase default zk session timeout for system tests

2020-07-08 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10225.
-
Fix Version/s: 2.7.0
   Resolution: Fixed

merged the PR to trunk

> Increase default zk session timeout for system tests
> 
>
> Key: KAFKA-10225
> URL: https://issues.apache.org/jira/browse/KAFKA-10225
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.7.0
>
>
> While digging into the flaky system tests, I noticed that many flaky failures
> are caused by the following check.
> {code}
> with node.account.monitor_log(KafkaService.STDOUT_STDERR_CAPTURE) as monitor:
>     node.account.ssh(cmd)
>     # Kafka 1.0.0 and higher don't have a space between "Kafka" and "Server"
>     monitor.wait_until("Kafka\s*Server.*started",
>                        timeout_sec=timeout_sec, backoff_sec=.25,
>                        err_msg="Kafka server didn't finish startup in %d seconds" % timeout_sec)
> {code}
> And the error message in the broker log is shown below.
> {quote}
> kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
> connection while in state: CONNECTING
>   at 
> kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:262)
>   at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:119)
>   at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1880)
>   at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:430)
>   at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:455)
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:227)
>   at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
>   at kafka.Kafka$.main(Kafka.scala:82)
>   at kafka.Kafka.main(Kafka.scala)
> {quote}
> I'm surprised that the default ZK connection timeout in the system tests is
> only 2 seconds, as the default timeout in production was increased to 18s (see
> https://github.com/apache/kafka/commit/4bde9bb3ccaf5571be76cb96ea051dadaeeaf5c7)
> {code}
> config_property.ZOOKEEPER_CONNECTION_TIMEOUT_MS: 2000
> {code}
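The proposed change then amounts to bumping the system-test default to the production value, e.g. (a sketch, not the exact patch):

{code:python}
# Sketch only: raise the system-test default from 2s to the 18s production
# default referenced above (the session timeout is assumed to move in lockstep).
ZOOKEEPER_CONNECTION_TIMEOUT_MS = 18000  # was 2000
ZOOKEEPER_SESSION_TIMEOUT_MS = 18000     # assumption, not part of the quote
{code}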



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9769) ReplicaManager Partition.makeFollower Increases LeaderEpoch when ZooKeeper disconnect occurs

2020-07-06 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9769.

Fix Version/s: 2.7.0
 Assignee: Andrew Choi
   Resolution: Fixed

merged the PR to trunk

> ReplicaManager Partition.makeFollower Increases LeaderEpoch when ZooKeeper 
> disconnect occurs
> 
>
> Key: KAFKA-9769
> URL: https://issues.apache.org/jira/browse/KAFKA-9769
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Reporter: Andrew Choi
>Assignee: Andrew Choi
>Priority: Minor
>  Labels: kafka, replica, replication
> Fix For: 2.7.0
>
>
> The ZooKeeper session expired and got disconnected just as the broker
> received the 1st LeaderAndIsr request. While the broker was processing that
> request, the ZooKeeper session had not yet been reestablished.
> Within the makeFollowers method, _partition.getOrCreateReplica_ is called
> before the fetcher begins. _partition.getOrCreateReplica_ needs to fetch
> information from ZooKeeper, but an exception is thrown when calling the
> ZooKeeper client because the session is invalid, causing the fetcher start to
> be skipped.
>  
> The Partition class's getOrCreateReplica method calls AdminZkClient's
> fetchEntityConfig(..), which throws an exception if the ZooKeeper session is
> invalid.
>  
> {code:java}
> val props = adminZkClient.fetchEntityConfig(ConfigType.Topic, topic){code}
>  
> When this occurs, the leader epoch should not have been incremented while
> ZooKeeper was invalid, because once the second LeaderAndIsr request comes
> in, the leader epoch could be the same between the brokers.
> A few options I can think of for a fix; I think the third route could be
> feasible (see the sketch after the quoted code below):
> 1 - Make the LeaderEpoch update and the fetch update atomic.
> 2 - Wait until all individual partitions are successful without problems, then
> process the fetch.
> 3 - Catch the ZooKeeper exception in the caller code block
> (ReplicaManager.makeFollowers) and simply do not touch the remaining
> partitions, to ensure that the batch of successful partitions up to that point
> is updated and processed (fetch).
> 4 - Or make the LeaderAndIsr request never arrive at the broker in case of a
> ZooKeeper disconnect; that would be safe because it is already possible for
> some replicas to receive the LeaderAndIsr later than the others. However, in
> that case, the code needs to make sure the controller will retry.
>  
> {code:java}
> else if (requestLeaderEpoch > currentLeaderEpoch) {
>   // If the leader epoch is valid record the epoch of the controller that made
>   // the leadership decision. This is useful while updating the isr to maintain
>   // the decision maker controller's epoch in the zookeeper path
>   if (stateInfo.basePartitionState.replicas.contains(localBrokerId))
>     partitionState.put(partition, stateInfo)
>   else
>
> def getOrCreateReplica(replicaId: Int, isNew: Boolean = false): Replica = {
>   allReplicasMap.getAndMaybePut(replicaId, {
>     if (isReplicaLocal(replicaId)) {
>       val adminZkClient = new AdminZkClient(zkClient)
>       val props = adminZkClient.fetchEntityConfig(ConfigType.Topic, topic)
> {code}
>  
>  
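As an illustration of option 3 (the real code path is Scala in ReplicaManager.makeFollowers; the names and dict-based state here are purely illustrative):

{code:python}
class ZooKeeperUnavailable(Exception):
    pass

def make_followers(partitions, fetch_topic_config):
    # Sketch: stop at the first ZK failure so the batch processed so far stays
    # consistent, and never bump the epoch of an unprocessed partition.
    completed = []
    for p in partitions:
        try:
            fetch_topic_config(p)   # may raise while the ZK session is invalid
        except ZooKeeperUnavailable:
            break                   # leave the remaining partitions untouched
        p["leader_epoch"] += 1      # epoch moves only after ZK access succeeds
        completed.append(p)
    return completed                # start fetchers only for these partitions
{code}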



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2020-06-26 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146527#comment-17146527
 ] 

Jun Rao commented on KAFKA-9693:


[~joliveirinha]: Thanks for the analysis. To trigger the flush during log 
append, did you set log.flush.interval.messages to 1? The high remote time in 
> produce could be caused by (1) the processing of the fetch follower request on
> the leader being long, or (2) the follower not issuing fetch requests quickly enough.
Since fetchFollower time is not high, it doesn't seem (1) is the cause. (2) can 
be caused by follower log append delay. How many partitions did you use in the 
test? A fetch follower request can include multiple partitions. So, it can add 
up. (2) can also be caused by replication throttling. The non-zero follower 
throttling time typically indicates that replication throttling is engaged.

> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, image-2020-06-23-12-24-46-548.png, 
> image-2020-06-23-12-24-58-788.png, image-2020-06-26-13-43-21-723.png, 
> image-2020-06-26-13-46-52-861.png, image-2020-06-26-14-06-01-505.png, 
> latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and
> forces the flush of the old segment to disk. When this happens, log segment
> _append_ duration increases, causing significant latency spikes on producer(s)
> and replica(s). This ticket aims to highlight the problem and propose a
> simple mitigation: add a new configuration to enable/disable the rolled
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between log segment flush count/rate
> and the latency spikes+. This indicates that the spikes in max latency are
> related to Kafka's behavior on rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component
> of the cluster, e.g. CPU, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules log flushing (this includes flushing the file record
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
> * the max time has elapsed since the timestamp of the first message in the
> segment (or, in absence of it, since the create time)

[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-06-25 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145752#comment-17145752
 ] 

Jun Rao commented on KAFKA-4084:


[~hai_lin] : (1) If we just need a tool to disallow a broker from taking over 
as the leader for some partitions, it seems the existing partitionReassignment 
tool can do that already. One just needs to move that broker away from the 
first replica in the assigned replica list. So, I am not sure if we need a 
separate tool proposed by KIP-491 for this. (2) For the common case when a 
broker is restarted, currently, we move the leader when some, but not all 
replicas are caught up. It could be a better behavior to wait until all 
replicas are caught up before changing the leaders. If this is the more desired 
behavior, maybe this can just be added in the auto leader balancing logic, 
instead of a separate tool.
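To make (1) concrete: the preferred leader is simply the first replica in the assignment, so "demoting" a broker only requires reordering the replica lists in a reassignment plan. An illustrative Python sketch with hypothetical topic data:

{code:python}
import json

def demote_broker(plan, broker):
    # Move the broker off the front of each replica list so it is never the
    # preferred leader; membership and replication factor are unchanged.
    for part in plan["partitions"]:
        replicas = part["replicas"]
        if replicas and replicas[0] == broker:
            replicas.append(replicas.pop(0))
    return plan

plan = {"version": 1, "partitions": [
    {"topic": "foo", "partition": 0, "replicas": [1, 2, 3]},
    {"topic": "foo", "partition": 1, "replicas": [2, 1, 3]},
]}
print(json.dumps(demote_broker(plan, 1)))
{code}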

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.
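One way such a cap could look, as an illustrative Python sketch (`elect` is a hypothetical stand-in for triggering a batch of preferred leader elections and waiting for completion):

{code:python}
def rebalance_in_batches(imbalanced_partitions, elect, max_in_flight=100):
    # Cap concurrent leader moves so replica fetchers are not all restarted
    # at once; wait for each batch to complete before starting the next.
    for i in range(0, len(imbalanced_partitions), max_in_flight):
        elect(imbalanced_partitions[i:i + max_in_flight])
{code}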



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10197) Elect preferred leader when completing a partition reassignment

2020-06-24 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144190#comment-17144190
 ] 

Jun Rao commented on KAFKA-10197:
-

[~bob-barrett]: Thanks for filing the issue. In
reassignPartitionLeaderElection(), we do consider the preferred replica first
when selecting the leader.

{code:java}
  def reassignPartitionLeaderElection(reassignment: Seq[Int], isr: Seq[Int], 
liveReplicas: Set[Int]): Option[Int] = {
reassignment.find(id => liveReplicas.contains(id) && isr.contains(id))
  }
{code}

The issue is that in KafkaController.moveReassignedPartitionLeaderIfRequired(), 
we won't trigger a leader election if the leader is in the reassigned replicas, 
but not the preferred one. 


{code:java}
private def moveReassignedPartitionLeaderIfRequired(topicPartition: TopicPartition,
                                                    newAssignment: ReplicaAssignment): Unit = {
  val reassignedReplicas = newAssignment.replicas
  val currentLeader = controllerContext.partitionLeadershipInfo(topicPartition).get.leaderAndIsr.leader

  if (!reassignedReplicas.contains(currentLeader)) {
    info(s"Leader $currentLeader for partition $topicPartition being reassigned, " +
      s"is not in the new list of replicas ${reassignedReplicas.mkString(",")}. Re-electing leader")
    // move the leader to one of the alive and caught up new replicas
    partitionStateMachine.handleStateChanges(Seq(topicPartition), OnlinePartition, Some(ReassignPartitionLeaderElectionStrategy))
  } else if (controllerContext.isReplicaOnline(currentLeader, topicPartition)) {
    info(s"Leader $currentLeader for partition $topicPartition being reassigned, " +
      s"is already in the new list of replicas ${reassignedReplicas.mkString(",")} and is alive")
    // shrink replication factor and update the leader epoch in zookeeper to use on the next LeaderAndIsrRequest
    updateLeaderEpochAndSendRequest(topicPartition, newAssignment)
  } else {
    info(s"Leader $currentLeader for partition $topicPartition being reassigned, " +
      s"is already in the new list of replicas ${reassignedReplicas.mkString(",")} but is dead")
    partitionStateMachine.handleStateChanges(Seq(topicPartition), OnlinePartition, Some(ReassignPartitionLeaderElectionStrategy))
  }
}
{code}
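In Python pseudocode, the gap is the missing preferred-leader check in the middle branch (illustrative names, not the actual controller API):

{code:python}
def complete_reassignment(current_leader, new_replicas, leader_online,
                          elect_leader, update_epoch_only):
    preferred = new_replicas[0]
    if current_leader not in new_replicas:
        elect_leader()                    # existing: leader left the assignment
    elif leader_online:
        if current_leader != preferred:   # missing today: leader is in the
            elect_leader()                # assignment but is not the preferred one
        else:
            update_epoch_only()
    else:
        elect_leader()                    # existing: leader is dead
{code}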





> Elect preferred leader when completing a partition reassignment
> ---
>
> Key: KAFKA-10197
> URL: https://issues.apache.org/jira/browse/KAFKA-10197
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Bob Barrett
>Priority: Major
>
> Currently, when completing a partition reassignment, we elect a leader from 
> the new replica assignment without requiring that the leader be the new 
> preferred leader. Instead, we just choose any in-sync replica:
> {code:java}
> private def leaderForReassign(partition: TopicPartition,
>                               leaderAndIsr: LeaderAndIsr,
>                               controllerContext: ControllerContext): ElectionResult = {
>   val targetReplicas = controllerContext.partitionFullReplicaAssignment(partition).targetReplicaAssignment.replicas
>   val liveReplicas = targetReplicas.filter(replica => controllerContext.isReplicaOnline(replica, partition))
>   val isr = leaderAndIsr.isr
>   val leaderOpt = PartitionLeaderElectionAlgorithms.reassignPartitionLeaderElection(targetReplicas, isr, liveReplicas.toSet)
>   val newLeaderAndIsrOpt = leaderOpt.map(leader => leaderAndIsr.newLeader(leader, isUnclean = false))
>   ElectionResult(partition, newLeaderAndIsrOpt, targetReplicas)
> }
> {code}
> If auto preferred leader election is enabled, the preferred leader will 
> eventually be elected. However, it would make sense to choose the preferred 
> leader after the reassignment without waiting for the automatic trigger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2020-06-23 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143037#comment-17143037
 ] 

Jun Rao commented on KAFKA-9693:


[~joliveirinha]: Thanks for the info. I am not sure that we fully understand 
this yet. Earlier, [~paolomoriello] confirmed the latency increase is due to 
> log.append. It would be useful for you to confirm that too (the producer local
> time metric on the broker approximates log append time). If that's
confirmed, I am wondering if we could use tools like co-pilot 
(https://www.redhat.com/en/blog/how-solve-performance-mysteries-performance-co-pilot-red-hat-enterprise-linux)
 to understand the behavior better.




> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, image-2020-06-23-12-24-46-548.png, 
> image-2020-06-23-12-24-58-788.png, latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and
> forces the flush of the old segment to disk. When this happens, log segment
> _append_ duration increases, causing significant latency spikes on producer(s)
> and replica(s). This ticket aims to highlight the problem and propose a
> simple mitigation: add a new configuration to enable/disable the rolled
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between log segment flush count/rate
> and the latency spikes+. This indicates that the spikes in max latency are
> related to Kafka's behavior on rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component
> of the cluster, e.g. CPU, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules log flushing (this includes flushing the file record
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
> * the max time has elapsed since the timestamp of the first message in the
> segment (or, in absence of it, since the create time)
>  * the index is full
> In this case, the increase in latency happens on _append_ of a new message 
> set to the active segment of the log. This is a synchronous operation which 
> therefore blocks producer requests, causing the latency increase.
> To confirm this, I instrumented Kaf

[jira] [Commented] (KAFKA-10191) fix flaky StreamsOptimizedTest

2020-06-22 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142301#comment-17142301
 ] 

Jun Rao commented on KAFKA-10191:
-

This test failure occurred 
[http://testing.confluent.io/confluent-kafka-branch-builder-system-test-results/?prefix=2020-06-19--001.1592614513--chia7712--fix_8334_avoid_deadlock--142a6c4c0/]
 when testing [https://github.com/apache/kafka/pull/8657]. But the failure 
doesn't seem to be related to that PR since it fails locally too.

> fix flaky StreamsOptimizedTest
> --
>
> Key: KAFKA-10191
> URL: https://issues.apache.org/jira/browse/KAFKA-10191
> Project: Kafka
>  Issue Type: Test
>  Components: streams, unit tests
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
>
> {quote}Exception in thread 
> "StreamsOptimizedTest-53c7d3b1-12b2-4d02-90b1-15757dfd2735-StreamThread-1" 
> java.lang.IllegalStateException: Tried to lookup lag for unknown task 
> 2_0Exception in thread 
> "StreamsOptimizedTest-53c7d3b1-12b2-4d02-90b1-15757dfd2735-StreamThread-1" 
> java.lang.IllegalStateException: Tried to lookup lag for unknown task 2_0 at 
> org.apache.kafka.streams.processor.internals.assignment.ClientState.lagFor(ClientState.java:306)
>  at java.util.Comparator.lambda$comparingLong$6043328a$1(Comparator.java:511) 
> at java.util.Comparator.lambda$thenComparing$36697e65$1(Comparator.java:216) 
> at java.util.TreeMap.put(TreeMap.java:552) at 
> java.util.TreeSet.add(TreeSet.java:255) at 
> java.util.AbstractCollection.addAll(AbstractCollection.java:344) at 
> java.util.TreeSet.addAll(TreeSet.java:312) at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.getPreviousTasksByLag(StreamsPartitionAssignor.java:1250)
>  at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assignTasksToThreads(StreamsPartitionAssignor.java:1164)
>  at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.computeNewAssignment(StreamsPartitionAssignor.java:920)
>  at 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.assign(StreamsPartitionAssignor.java:391)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:583)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:689)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1400(AbstractCoordinator.java:111)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:602)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:575)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1132)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1107)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
>  at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:419)
>  at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:359)
>  at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:506)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1263)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1229) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1204) 
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:762)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:622)

[jira] [Resolved] (KAFKA-10027) Implement read path for feature versioning scheme

2020-06-11 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10027.
-
Fix Version/s: 2.7.0
 Assignee: Kowshik Prakasam
   Resolution: Fixed

merged the PR to trunk

> Implement read path for feature versioning scheme
> -
>
> Key: KAFKA-10027
> URL: https://issues.apache.org/jira/browse/KAFKA-10027
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Kowshik Prakasam
>Assignee: Kowshik Prakasam
>Priority: Major
> Fix For: 2.7.0
>
>
> The goal is to implement various classes and integration for the read path of the
> feature versioning system 
> ([KIP-584|https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features]).
>  The ultimate plan is that the cluster-wide *finalized* features information 
> is going to be stored in ZK under the node {{/feature}}. The read path 
> implemented in this PR is centered around reading this *finalized* features 
> information from ZK, and, processing it inside the Broker.
>  
> Here is a summary of what's needed for this Jira (a lot of it is *new* 
> classes):
> * A facility is provided in the broker to declare its supported features,
> and advertise its supported features via its own {{BrokerIdZNode}} under a
> {{features}} key.
>  * A facility is provided in the broker to listen to and propagate 
> cluster-wide *finalized* feature changes from ZK.
>  * When new *finalized* features are read from ZK, feature incompatibilities 
> are detected by comparing against the broker's own supported features.
>  * {{ApiVersionsResponse}} is now served containing supported and finalized 
> feature information (using the newly added tagged fields).
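For the incompatibility check in particular, the comparison is between each finalized version range read from ZK and the broker's own supported range. An illustrative Python sketch (field names are assumptions, not KIP-584's exact schema):

{code:python}
def incompatibilities(supported, finalized):
    # A finalized feature is incompatible if the broker does not support it
    # or its finalized [min, max] range falls outside the supported range.
    bad = {}
    for name, (fin_min, fin_max) in finalized.items():
        rng = supported.get(name)
        if rng is None or fin_min < rng[0] or fin_max > rng[1]:
            bad[name] = (fin_min, fin_max)
    return bad

supported = {"group_coordinator": (1, 3)}
finalized = {"group_coordinator": (2, 4)}  # max 4 exceeds supported max 3
print(incompatibilities(supported, finalized))
{code}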



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10106) log time taken to handle LeaderAndIsr request

2020-06-08 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10106.
-
Fix Version/s: 2.7.0
 Assignee: NIKHIL
   Resolution: Fixed

Merged the PR to trunk.

> log time taken to handle LeaderAndIsr request 
> --
>
> Key: KAFKA-10106
> URL: https://issues.apache.org/jira/browse/KAFKA-10106
> Project: Kafka
>  Issue Type: Improvement
>Reporter: NIKHIL
>Assignee: NIKHIL
>Priority: Minor
> Fix For: 2.7.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ReplicaManager!becomeLeaderOrFollower handles the LeaderAndIsr request, and
> StateChangeLogger logs when this request is handled; however, it can be useful
> to log when this call ends and record the time taken, which can help
> operationally.
> The proposal is for ReplicaManager!becomeLeaderOrFollower to start measuring
> the time before the `replicaStateChangeLock` is acquired and to log the
> elapsed time before the response is returned.
>  
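A sketch of the proposed measurement (illustrative Python; the lock and logger stand in for replicaStateChangeLock and StateChangeLogger):

{code:python}
import threading
import time

def become_leader_or_follower(lock, correlation_id, process, log):
    # Start the clock *before* acquiring the state-change lock, so queueing
    # time behind other LeaderAndIsr requests is included in the measurement.
    start = time.monotonic()
    with lock:
        response = process()
    log("Handled LeaderAndIsr request correlationId=%s in %.0f ms"
        % (correlation_id, (time.monotonic() - start) * 1000))
    return response

# Example usage with trivial stand-ins:
become_leader_or_follower(threading.Lock(), 7, lambda: "ok", print)
{code}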



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10101) recovery point is advanced without flushing the data after recovery

2020-06-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126036#comment-17126036
 ] 

Jun Rao commented on KAFKA-10101:
-

If this is an issue, one way to fix that is to simply not advance recoveryPoint 
after recovery.

> recovery point is advanced without flushing the data after recovery
> ---
>
> Key: KAFKA-10101
> URL: https://issues.apache.org/jira/browse/KAFKA-10101
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.5.0
>Reporter: Jun Rao
>Priority: Major
>
> Currently, in Log.recoverLog(), we set recoveryPoint to logEndOffset after 
> recovering the log segment. However, we don't flush the log segments after 
> recovery. The potential issue is that if the broker has another hard failure, 
> segments may be corrupted on disk but won't be going through recovery on 
> another restart.
> This logic was introduced in KAFKA-5829 since 1.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10101) recovery point is advanced without flushing the data after recovery

2020-06-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126023#comment-17126023
 ] 

Jun Rao commented on KAFKA-10101:
-

[~ijuma] : Could you double check if this is an issue? Thanks.

> recovery point is advanced without flushing the data after recovery
> ---
>
> Key: KAFKA-10101
> URL: https://issues.apache.org/jira/browse/KAFKA-10101
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.5.0
>Reporter: Jun Rao
>Priority: Major
>
> Currently, in Log.recoverLog(), we set recoveryPoint to logEndOffset after 
> recovering the log segment. However, we don't flush the log segments after 
> recovery. The potential issue is that if the broker has another hard failure, 
> segments may be corrupted on disk but won't be going through recovery on 
> another restart.
> This logic was introduced in KAFKA-5829 since 1.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10101) recovery point is advanced without flushing the data after recovery

2020-06-04 Thread Jun Rao (Jira)
Jun Rao created KAFKA-10101:
---

 Summary: recovery point is advanced without flushing the data 
after recovery
 Key: KAFKA-10101
 URL: https://issues.apache.org/jira/browse/KAFKA-10101
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.5.0
Reporter: Jun Rao


Currently, in Log.recoverLog(), we set recoveryPoint to logEndOffset after 
recovering the log segment. However, we don't flush the log segments after 
recovery. The potential issue is that if the broker has another hard failure, 
segments may be corrupted on disk but won't be going through recovery on 
another restart.

This logic was introduced in KAFKA-5829 since 1.0.0.
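As a sketch of the two possible fixes (illustrative Python; the real logic lives in Scala's Log.recoverLog()):

{code:python}
def recover_log(segments, flush, log_end_offset, old_recovery_point):
    for seg in segments:
        seg.recover()
    # Fix 1: flush what recovery rebuilt, then it is safe to advance the
    # recovery point, since it never points past flushed data.
    flush(up_to=log_end_offset)
    return log_end_offset
    # Fix 2 (alternative): skip the flush and return old_recovery_point
    # unchanged, so another hard failure re-runs recovery.
{code}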



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-806) Index may not always observe log.index.interval.bytes

2020-05-28 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118904#comment-17118904
 ] 

Jun Rao commented on KAFKA-806:
---

[~mjsax]: I think this is still an issue.

> Index may not always observe log.index.interval.bytes
> -
>
> Key: KAFKA-806
> URL: https://issues.apache.org/jira/browse/KAFKA-806
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Jun Rao
>Priority: Major
>
> Currently, each log.append() will add at most 1 index entry, even when the 
> appended data is larger than log.index.interval.bytes. One potential issue is 
> that if a follower restarts after being down for a long time, it may fetch 
> data much bigger than log.index.interval.bytes at a time. This means that 
> fewer index entries are created, which can increase the fetch time from the 
> consumers.
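A sketch of the fix direction, in illustrative Python: emit one index entry per log.index.interval.bytes within a single large append, instead of at most one entry per append:

{code:python}
def index_entries_for_append(batches, interval_bytes):
    # batches: list of (first_offset, size_in_bytes) making up one append.
    entries, acc = [], 0
    for offset, size in batches:
        acc += size
        if acc >= interval_bytes:
            entries.append(offset)  # index roughly every interval_bytes
            acc = 0
    return entries

# One 40 KB append with a 4 KB interval yields ~10 entries rather than 1.
print(index_entries_for_append([(i * 100, 4096) for i in range(10)], 4096))
{code}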



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9826) Log cleaning repeatedly picks same segment with no effect when first dirty offset is past start of active segment

2020-05-07 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101751#comment-17101751
 ] 

Jun Rao commented on KAFKA-9826:


[~fekelund]: Yes. Updated the fix versions.

> Log cleaning repeatedly picks same segment with no effect when first dirty 
> offset is past start of active segment
> -
>
> Key: KAFKA-9826
> URL: https://issues.apache.org/jira/browse/KAFKA-9826
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.4.1
>Reporter: Steve Rodrigues
>Assignee: Steve Rodrigues
>Priority: Major
> Fix For: 2.6.0, 2.4.2, 2.5.1
>
>
> Seen on a system where a given partition had a single segment, and for
> whatever reason (deleteRecords?), the logStartOffset was greater than the
> base offset of the log's segment, there was a continuous series of
> ```
> [2020-03-03 16:56:31,374] WARN Resetting first dirty offset of  FOO-3 to log 
> start offset 55649 since the checkpointed offset 0 is invalid. 
> (kafka.log.LogCleanerManager$)
> ```
> messages (partition name changed, it wasn't really FOO). This was expected to 
> be resolved by KAFKA-6266 but clearly wasn't. 
> Further investigation revealed that a few segments were continuously being
> cleaned, generating messages in the `log-cleaner.log` of the form:
> ```
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Beginning cleaning of log FOO-3 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Building offset map for FOO-3... 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Building offset map for log FOO-3 
> for 0 segments in offset range [55287, 54237). (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Offset map for log FOO-3 complete. 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Cleaning log FOO-3 (cleaning prior 
> to Wed Dec 31 19:00:00 EST 1969, discarding tombstones prior to Tue Dec 10 
> 13:39:08 EST 2019)... (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO [kafka-log-cleaner-thread-1]: Log cleaner 
> thread 1 cleaned log FOO-3 (dirty section = [55287, 55287])
> 0.0 MB of log processed in 0.0 seconds (0.0 MB/sec).
> Indexed 0.0 MB in 0.0 seconds (0.0 Mb/sec, 100.0% of total time)
> Buffer utilization: 0.0%
> Cleaned 0.0 MB in 0.0 seconds (NaN Mb/sec, 0.0% of total time)
> Start size: 0.0 MB (0 messages)
> End size: 0.0 MB (0 messages) NaN% size reduction (NaN% fewer messages) 
> (kafka.log.LogCleaner)
> ```
> What seems to have happened here (data determined for a different partition) 
> is:
> There exist a number of partitions here which get relatively low traffic, 
> including our friend FOO-5. For whatever reason, LogStartOffset of this 
> partition has moved beyond the baseOffset of the active segment. (Notes in 
> other issues indicate that this is a reasonable scenario.) So there’s one 
> segment, starting at 166266, and a log start of 166301.
> grabFilthiestCompactedLog runs and reads the checkpoint file. We see that 
> this topicpartition needs to be cleaned, and call cleanableOffsets on it 
> which returns an OffsetsToClean with firstDirtyOffset == logStartOffset == 
> 166301 and firstUncleanableOffset = max(logStart, activeSegment.baseOffset) = 
> 166301, and forceCheckpoint = true.
> The checkpoint file is updated in grabFilthiestCompactedLog (this is the fix 
> for KAFKA-6266). We then create a LogToClean object based on the 
> firstDirtyOffset and firstUncleanableOffset of 166301 (past the active 
> segment’s base offset).
> The LogToClean object has cleanBytes = logSegments(-1, 
> firstDirtyOffset).map(_.size).sum → the size of this segment. It has 
> firstUncleanableOffset and cleanableBytes determined by 
> calculateCleanableBytes. calculateCleanableBytes returns:
> {code:java}
> val firstUncleanableSegment = log.nonActiveLogSegmentsFrom(uncleanableOffset).headOption.getOrElse(log.activeSegment)
> val firstUncleanableOffset = firstUncleanableSegment.baseOffset
> val cleanableBytes = log.logSegments(firstDirtyOffset, math.max(firstDirtyOffset, firstUncleanableOffset)).map(_.size.toLong).sum
> (firstUncleanableOffset, cleanableBytes)
> {code}
> firstUncleanableSegment is activeSegment. firstUncleanableOffset is the base 
> offset, 166266. cleanableBytes is looking for logSegments(166301, max(166301, 
> 166266)) → which _is the active segment_
> So there are “cleanableBytes” > 0.
> We then filter for segments with totalBytes (clean + cleanable) > 0. This
> segment has totalBytes > 0, and it has cleanableBytes, so great! It's
> filthiest.
> The cleaner picks it, calls cleanLog on it, which then does cleaner.clean, 
> which returns nextDirtyOffset and cleaner stats. cleaner.clean calls
> doClean, w

[jira] [Updated] (KAFKA-9826) Log cleaning repeatedly picks same segment with no effect when first dirty offset is past start of active segment

2020-05-07 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-9826:
---
Fix Version/s: 2.5.1
   2.4.2

> Log cleaning repeatedly picks same segment with no effect when first dirty 
> offset is past start of active segment
> -
>
> Key: KAFKA-9826
> URL: https://issues.apache.org/jira/browse/KAFKA-9826
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.4.1
>Reporter: Steve Rodrigues
>Assignee: Steve Rodrigues
>Priority: Major
> Fix For: 2.6.0, 2.4.2, 2.5.1
>
>
> Seen on a system where a given partition had a single segment, and for
> whatever reason (deleteRecords?), the logStartOffset was greater than the
> base offset of the log's segment, there was a continuous series of
> ```
> [2020-03-03 16:56:31,374] WARN Resetting first dirty offset of  FOO-3 to log 
> start offset 55649 since the checkpointed offset 0 is invalid. 
> (kafka.log.LogCleanerManager$)
> ```
> messages (partition name changed, it wasn't really FOO). This was expected to 
> be resolved by KAFKA-6266 but clearly wasn't. 
> Further investigation revealed that a few segments were continuously being 
> cleaned, generating messages in the `log-cleaner.log` of the form:
> ```
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Beginning cleaning of log FOO-3 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Building offset map for FOO-3... 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Building offset map for log FOO-3 
> for 0 segments in offset range [55287, 54237). (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Offset map for log FOO-3 complete. 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Cleaning log FOO-3 (cleaning prior 
> to Wed Dec 31 19:00:00 EST 1969, discarding tombstones prior to Tue Dec 10 
> 13:39:08 EST 2019)... (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO [kafka-log-cleaner-thread-1]: Log cleaner 
> thread 1 cleaned log FOO-3 (dirty section = [55287, 55287])
> 0.0 MB of log processed in 0.0 seconds (0.0 MB/sec).
> Indexed 0.0 MB in 0.0 seconds (0.0 Mb/sec, 100.0% of total time)
> Buffer utilization: 0.0%
> Cleaned 0.0 MB in 0.0 seconds (NaN Mb/sec, 0.0% of total time)
> Start size: 0.0 MB (0 messages)
> End size: 0.0 MB (0 messages) NaN% size reduction (NaN% fewer messages) 
> (kafka.log.LogCleaner)
> ```
> What seems to have happened here (data determined for a different partition) 
> is:
> There exist a number of partitions here which get relatively low traffic, 
> including our friend FOO-5. For whatever reason, the LogStartOffset of this 
> partition has moved beyond the baseOffset of the active segment. (Notes in 
> other issues indicate that this is a reasonable scenario.) So there’s one 
> segment, starting at 166266, and a log start of 166301.
> grabFilthiestCompactedLog runs and reads the checkpoint file. We see that 
> this topic-partition needs to be cleaned, and call cleanableOffsets on it, 
> which returns an OffsetsToClean with firstDirtyOffset == logStartOffset == 
> 166301 and firstUncleanableOffset = max(logStart, activeSegment.baseOffset) = 
> 166301, and forceCheckpoint = true.
> The checkpoint file is updated in grabFilthiestCompactedLog (this is the fix 
> for KAFKA-6266). We then create a LogToClean object based on the 
> firstDirtyOffset and firstUncleanableOffset of 166301 (past the active 
> segment’s base offset).
> The LogToClean object has cleanBytes = logSegments(-1, 
> firstDirtyOffset).map(_.size).sum → the size of this segment. Its 
> firstUncleanableOffset and cleanableBytes are determined by 
> calculateCleanableBytes, which returns:
> ```
> val firstUncleanableSegment = 
>   log.nonActiveLogSegmentsFrom(uncleanableOffset).headOption.getOrElse(log.activeSegment)
> val firstUncleanableOffset = firstUncleanableSegment.baseOffset
> val cleanableBytes = log.logSegments(firstDirtyOffset, 
>   math.max(firstDirtyOffset, firstUncleanableOffset)).map(_.size.toLong).sum
> (firstUncleanableOffset, cleanableBytes)
> ```
> firstUncleanableSegment is the activeSegment. firstUncleanableOffset is its 
> base offset, 166266. cleanableBytes looks at logSegments(166301, max(166301, 
> 166266)) → which _is the active segment_.
> So cleanableBytes > 0.
> We then filter, keeping only logs with totalBytes (clean + cleanable) > 0. 
> This segment has totalBytes > 0, and it has cleanableBytes, so great! It’s 
> the filthiest.
> The cleaner picks it and calls cleanLog on it, which then does cleaner.clean, 
> which returns nextDirtyOffset and cleaner stats. cleaner.clean calls 
> doClean, which builds an offsetMap. The offsetMap looks at non
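To make the arithmetic above easier to follow, here is a minimal, self-contained model of it (illustrative only: Segment and logSegments are simplified stand-ins for kafka.log.Log, not the real API):

{code:scala}
// Simplified model of the KAFKA-9826 selection arithmetic.
case class Segment(baseOffset: Long, sizeBytes: Long)

object Kafka9826Sketch extends App {
  val segments       = Seq(Segment(166266L, 1024L)) // the single (active) segment
  val logStartOffset = 166301L                      // moved past the segment's base offset

  // Like Log.logSegments(from, to): start from the segment that contains
  // `from` (its floor segment), so the active segment is returned even when
  // from == to.
  def logSegments(from: Long, to: Long): Seq[Segment] = {
    val floor = segments.filter(_.baseOffset <= from).lastOption
    floor.toSeq ++ segments.filter(s => s.baseOffset > from && s.baseOffset < to)
  }

  val firstDirtyOffset       = logStartOffset           // cleanableOffsets: logStartOffset wins
  val firstUncleanableOffset = segments.last.baseOffset // falls back to the active segment

  val cleanableBytes = logSegments(firstDirtyOffset,
    math.max(firstDirtyOffset, firstUncleanableOffset)).map(_.sizeBytes).sum

  // Prints "cleanableBytes = 1024": the log always looks cleanable, yet the
  // dirty section [166301, 166301) is empty, so each cleaning pass is a no-op.
  println(s"cleanableBytes = $cleanableBytes")
}
{code}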

[jira] [Resolved] (KAFKA-9866) Do not attempt to elect preferred leader replicas which are outside ISR

2020-04-27 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9866.

Fix Version/s: 2.6.0
   Resolution: Fixed

Merged the PR to trunk.

> Do not attempt to elect preferred leader replicas which are outside ISR
> ---
>
> Key: KAFKA-9866
> URL: https://issues.apache.org/jira/browse/KAFKA-9866
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Wang Ge
>Priority: Minor
> Fix For: 2.6.0
>
>
> The controller automatically triggers a preferred leader election every N 
> minutes. It tries to elect the preferred leader of every partition without 
> doing basic checks, such as whether that replica is in sync.
> This leads to a multitude of errors which cause confusion:
> {code:java}
> April 14th 2020, 17:01:11.015 [Controller id=0] Partition TOPIC-9 failed to 
> complete preferred replica leader election to 1. Leader is still 0{code}
> {code:java}
> April 14th 2020, 17:01:11.002 [Controller id=0] Error completing replica 
> leader election (PREFERRED) for partition TOPIC-9
> kafka.common.StateChangeFailedException: Failed to elect leader for partition 
> TOPIC-9 under strategy PreferredReplicaPartitionLeaderElectionStrategy {code}
> It would be better if the Controller filtered out such elections, did not 
> attempt them at all, and perhaps logged an aggregate INFO-level message 
> instead (as sketched below).
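A minimal sketch of the suggested filtering (illustrative only; PartitionState is a stand-in for the controller's real data structures):

{code:scala}
object PreferredElectionFilter {
  case class PartitionState(preferredReplica: Int, isr: Set[Int])

  // Keep only partitions whose preferred replica is in the ISR, so the
  // controller never attempts an election that is bound to fail.
  def electable(partitions: Map[String, PartitionState]): Map[String, PartitionState] =
    partitions.filter { case (_, state) => state.isr.contains(state.preferredReplica) }
}
{code}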



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-2419) Allow certain Sensors to be garbage collected after inactivity

2020-04-23 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090711#comment-17090711
 ] 

Jun Rao commented on KAFKA-2419:


Sensor expiration is only used on the broker. For the clients, a metric is 
typically never garbage collected once created. Could you categorize what those 
metrics are?
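For reference, broker-side sensor expiration (the mechanism this ticket added) is driven by an inactivity timeout passed at sensor creation; a minimal sketch against the org.apache.kafka.common.metrics API:

{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.kafka.common.metrics.{MetricConfig, Metrics, Sensor}
import org.apache.kafka.common.metrics.stats.Rate

object ExpiringSensorSketch extends App {
  val metrics = new Metrics() // default config, no reporters

  // The long argument is the inactivity window in seconds after which the
  // metrics' expiration task may remove the sensor (broker-side usage).
  val sensor: Sensor = metrics.sensor("quota-bytes-rate",
    null.asInstanceOf[MetricConfig], TimeUnit.HOURS.toSeconds(1))
  sensor.add(metrics.metricName("byte-rate", "quota-metrics"), new Rate())

  sensor.record(1024.0) // recording keeps the sensor active
  metrics.close()
}
{code}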

> Allow certain Sensors to be garbage collected after inactivity
> --
>
> Key: KAFKA-2419
> URL: https://issues.apache.org/jira/browse/KAFKA-2419
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
>Priority: Blocker
>  Labels: quotas
> Fix For: 0.9.0.0
>
> Attachments: Leak1, Leak1.jpg, leak2.jpg
>
>
> Currently, metrics cannot be removed once registered. 
> Implement a feature to remove certain sensors after a certain period of 
> inactivity (perhaps configurable).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9892) Producer state snapshot needs to be forced to disk

2020-04-20 Thread Jun Rao (Jira)
Jun Rao created KAFKA-9892:
--

 Summary: Producer state snapshot needs to be forced to disk
 Key: KAFKA-9892
 URL: https://issues.apache.org/jira/browse/KAFKA-9892
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.6.0
Reporter: Jun Rao


Currently, ProducerStateManager.writeSnapshot() only calls fileChannel.close(), 
but does not explicitly call fileChannel.force(). It seems force() is not 
guaranteed to be called on close(). 
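A minimal sketch of the durable pattern (illustrative only, not the actual ProducerStateManager code):

{code:scala}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

object SnapshotSketch {
  // Write a snapshot buffer and fsync it before close; close() alone does not
  // guarantee the bytes reach the storage device.
  def writeSnapshotDurably(file: Path, buffer: ByteBuffer): Unit = {
    val channel = FileChannel.open(file,
      StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    try {
      channel.write(buffer)
      channel.force(true) // flush data and file metadata
    } finally channel.close()
  }
}
{code}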



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9543) Consumer offset reset after new segment rolling

2020-04-17 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085916#comment-17085916
 ] 

Jun Rao commented on KAFKA-9543:


[~hachikuji]: Thanks for the analysis. It does seem this issue could be caused 
by KAFKA-9838.

Also, in Log.read(), we have code like the following. If we get to the _else_ 
part, the assumption is that maxOffsetMetadata.segmentBaseOffset > 
segment.baseOffset. Perhaps it's useful to assert that. That may help uncover 
issues that we don't yet know about.

 

 
{code:java}
val maxPosition = {
  // Use the max offset position if it is on this segment; otherwise, the
  // segment size is the limit.
  if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) {
    maxOffsetMetadata.relativePositionInSegment
  } else {
    segment.size
  }
}{code}
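A sketch of the same block with the suggested assertion added (illustrative, not a committed change):

{code:scala}
val maxPosition = {
  if (maxOffsetMetadata.segmentBaseOffset == segment.baseOffset) {
    maxOffsetMetadata.relativePositionInSegment
  } else {
    // The else branch relies on the max offset living on a later segment;
    // assert it so silent violations surface early instead of corrupting reads.
    assert(maxOffsetMetadata.segmentBaseOffset > segment.baseOffset,
      s"maxOffsetMetadata $maxOffsetMetadata is not beyond base offset ${segment.baseOffset}")
    segment.size
  }
}
{code}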
 

> Consumer offset reset after new segment rolling
> ---
>
> Key: KAFKA-9543
> URL: https://issues.apache.org/jira/browse/KAFKA-9543
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Rafał Boniecki
>Priority: Major
> Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
>
> After upgrade from kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer 
> offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with rolling new segments at the broker: 
> the segment is rolled first, then, a couple of ms later, the reset occurs on 
> the consumer. Attached is a Grafana graph with consumer lag per partition. 
> All sudden spikes in lag are offset resets due to this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9826) Log cleaning repeatedly picks same segment with no effect when first dirty offset is past start of active segment

2020-04-14 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9826.

Fix Version/s: 2.6.0
   Resolution: Fixed

merged the PR to trunk

> Log cleaning repeatedly picks same segment with no effect when first dirty 
> offset is past start of active segment
> -
>
> Key: KAFKA-9826
> URL: https://issues.apache.org/jira/browse/KAFKA-9826
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.4.1
>Reporter: Steve Rodrigues
>Assignee: Steve Rodrigues
>Priority: Major
> Fix For: 2.6.0
>
>
> Seen on a system where a given partition had a single segment and, for 
> whatever reason (deleteRecords?), the logStartOffset was greater than the 
> base offset of that segment. There was a continuous series of 
> ```
> [2020-03-03 16:56:31,374] WARN Resetting first dirty offset of  FOO-3 to log 
> start offset 55649 since the checkpointed offset 0 is invalid. 
> (kafka.log.LogCleanerManager$)
> ```
> messages (partition name changed, it wasn't really FOO). This was expected to 
> be resolved by KAFKA-6266 but clearly wasn't. 
> Further investigation revealed that a few segments were continuously being 
> cleaned, generating messages in the `log-cleaner.log` of the form:
> ```
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Beginning cleaning of log FOO-3 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,924] INFO Cleaner 1: Building offset map for FOO-3... 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Building offset map for log FOO-3 
> for 0 segments in offset range [55287, 54237). (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Offset map for log FOO-3 complete. 
> (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO Cleaner 1: Cleaning log FOO-3 (cleaning prior 
> to Wed Dec 31 19:00:00 EST 1969, discarding tombstones prior to Tue Dec 10 
> 13:39:08 EST 2019)... (kafka.log.LogCleaner)
> [2020-03-31 13:34:50,927] INFO [kafka-log-cleaner-thread-1]: Log cleaner 
> thread 1 cleaned log FOO-3 (dirty section = [55287, 55287])
> 0.0 MB of log processed in 0.0 seconds (0.0 MB/sec).
> Indexed 0.0 MB in 0.0 seconds (0.0 Mb/sec, 100.0% of total time)
> Buffer utilization: 0.0%
> Cleaned 0.0 MB in 0.0 seconds (NaN Mb/sec, 0.0% of total time)
> Start size: 0.0 MB (0 messages)
> End size: 0.0 MB (0 messages) NaN% size reduction (NaN% fewer messages) 
> (kafka.log.LogCleaner)
> ```
> What seems to have happened here (data determined for a different partition) 
> is:
> There exist a number of partitions here which get relatively low traffic, 
> including our friend FOO-5. For whatever reason, the LogStartOffset of this 
> partition has moved beyond the baseOffset of the active segment. (Notes in 
> other issues indicate that this is a reasonable scenario.) So there’s one 
> segment, starting at 166266, and a log start of 166301.
> grabFilthiestCompactedLog runs and reads the checkpoint file. We see that 
> this topic-partition needs to be cleaned, and call cleanableOffsets on it, 
> which returns an OffsetsToClean with firstDirtyOffset == logStartOffset == 
> 166301 and firstUncleanableOffset = max(logStart, activeSegment.baseOffset) = 
> 166301, and forceCheckpoint = true.
> The checkpoint file is updated in grabFilthiestCompactedLog (this is the fix 
> for KAFKA-6266). We then create a LogToClean object based on the 
> firstDirtyOffset and firstUncleanableOffset of 166301 (past the active 
> segment’s base offset).
> The LogToClean object has cleanBytes = logSegments(-1, 
> firstDirtyOffset).map(_.size).sum → the size of this segment. Its 
> firstUncleanableOffset and cleanableBytes are determined by 
> calculateCleanableBytes, which returns:
> ```
> val firstUncleanableSegment = 
>   log.nonActiveLogSegmentsFrom(uncleanableOffset).headOption.getOrElse(log.activeSegment)
> val firstUncleanableOffset = firstUncleanableSegment.baseOffset
> val cleanableBytes = log.logSegments(firstDirtyOffset, 
>   math.max(firstDirtyOffset, firstUncleanableOffset)).map(_.size.toLong).sum
> (firstUncleanableOffset, cleanableBytes)
> ```
> firstUncleanableSegment is the activeSegment. firstUncleanableOffset is its 
> base offset, 166266. cleanableBytes looks at logSegments(166301, max(166301, 
> 166266)) → which _is the active segment_.
> So cleanableBytes > 0.
> We then filter, keeping only logs with totalBytes (clean + cleanable) > 0. 
> This segment has totalBytes > 0, and it has cleanableBytes, so great! It’s 
> the filthiest.
> The cleaner picks it and calls cleanLog on it, which then does cleaner.clean, 
> which returns nextDirtyOffset and cleaner stats. cleaner.clean calls 
> doClean, which builds an offsetMap. The offsetMap 

[jira] [Commented] (KAFKA-9839) IllegalStateException on metadata update when broker learns about its new epoch after the controller

2020-04-09 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079544#comment-17079544
 ] 

Jun Rao commented on KAFKA-9839:


Thanks for reporting this, Anna. Accepting UPDATE_METADATA (and other similar 
requests from the controller) with a larger broker epoch is probably the easier 
solution.

> IllegalStateException on metadata update when broker learns about its new 
> epoch after the controller
> 
>
> Key: KAFKA-9839
> URL: https://issues.apache.org/jira/browse/KAFKA-9839
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 2.3.1
>Reporter: Anna Povzner
>Priority: Critical
>
> Broker throws "java.lang.IllegalStateException: Epoch XXX larger than current 
> broker epoch YYY" on UPDATE_METADATA when the controller learns about the 
> broker epoch and sends UPDATE_METADATA before KafkaZkClient.registerBroker 
> completes (i.e., before the broker learns about its new epoch).
> Here is the scenario we observed in more detail:
> 1. ZK session expires on broker 1
> 2. Broker 1 establishes new session to ZK and creates znode
> 3. Controller learns about broker 1 and assigns epoch
> 4. Broker 1 receives UPDATE_METADATA from controller, but it does not know 
> about its new epoch yet, so we get an exception:
> ERROR [KafkaApi-3] Error when handling request: clientId=1, correlationId=0, 
> api=UPDATE_METADATA, body={
> .
> java.lang.IllegalStateException: Epoch XXX larger than current broker epoch 
> YYY at kafka.server.KafkaApis.isBrokerEpochStale(KafkaApis.scala:2725) at 
> kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:320) at 
> kafka.server.KafkaApis.handle(KafkaApis.scala:139) at 
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at 
> java.lang.Thread.run(Thread.java:748)
> 5. KafkaZkClient.registerBroker completes on broker 1: "INFO Stat of the 
> created znode at /brokers/ids/1"
> The result is that the broker has stale metadata for some time.
> Possible solutions:
> 1. Broker returns a more specific error and the controller retries 
> UPDATE_METADATA.
> 2. Broker accepts UPDATE_METADATA with a larger broker epoch.
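A sketch of solution 2 (illustrative only; the method shape mirrors KafkaApis.isBrokerEpochStale, but this is not the actual patch):

{code:scala}
object EpochCheckSketch {
  // Only a *smaller* epoch than the broker's current one is stale; a larger
  // one just means the controller learned the new epoch before the broker did.
  val UnknownBrokerEpoch = -1L // sentinel for requests that carry no epoch

  def isBrokerEpochStale(brokerEpochInRequest: Long, curBrokerEpoch: Long): Boolean =
    brokerEpochInRequest != UnknownBrokerEpoch && brokerEpochInRequest < curBrokerEpoch
}
{code}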



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8470) State change logs should not be in TRACE level

2020-03-26 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-8470.

Fix Version/s: 2.6.0
   Resolution: Fixed

Merged the PR to trunk.

    1. Defaults the state-change log level to INFO.

    2. The INFO-level state-change log includes (a) request-level logging with 
just partition counts; (b) the leader/ISR changes per partition in the 
controller and in the broker (reduced to mostly just one log entry per 
partition).
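For reference, the resulting logger configuration looks roughly like this (a sketch assuming the stock config/log4j.properties names):

{code}
# State-change log now defaults to INFO; TRACE remains available for debugging.
log4j.logger.state.change.logger=INFO, stateChangeAppender
log4j.additivity.state.change.logger=false
{code}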

> State change logs should not be in TRACE level
> --
>
> Key: KAFKA-8470
> URL: https://issues.apache.org/jira/browse/KAFKA-8470
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Stanislav Kozlovski
>Assignee: Stanislav Kozlovski
>Priority: Minor
> Fix For: 2.6.0
>
>
> The StateChange logger in Kafka should not be logging its state changes at 
> TRACE level.
> We consider these changes very useful in debugging, and we additionally 
> configure that logger to log at TRACE level by default.
> Since we consider it important enough to configure its own logger at a 
> separate log level, why don't we change those logs to INFO and have the 
> logger use the defaults?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9711) The authentication failure caused by SSLEngine#beginHandshake is not properly caught and handled

2020-03-24 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9711.

Fix Version/s: 2.6.0
   Resolution: Fixed

Merged the PR to trunk.

> The authentication failure caused by SSLEngine#beginHandshake is not properly 
> caught and handled
> 
>
> Key: KAFKA-9711
> URL: https://issues.apache.org/jira/browse/KAFKA-9711
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 2.6.0
>
>
> {code:java}
> @Override
> public void handshake() throws IOException {
> if (state == State.NOT_INITALIZED)
> startHandshake(); // this line
> if (ready())
> throw renegotiationException();
> if (state == State.CLOSING)
> throw closingException();
> {code}
> SSLEngine#beginHandshake can throw authentication failures (for 
> example, no suitable cipher suites), so we ought to catch SSLException and 
> convert it to SslAuthenticationException in order to process authentication 
> failures correctly.
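A sketch of the suggested handling (illustrative, not the merged patch):

{code:scala}
import javax.net.ssl.{SSLEngine, SSLException}
import org.apache.kafka.common.errors.SslAuthenticationException

object HandshakeSketch {
  // Convert handshake-time SSL failures into an authentication exception so
  // the caller reports an authentication failure instead of a generic IO error.
  def startHandshakeSafely(sslEngine: SSLEngine): Unit =
    try sslEngine.beginHandshake()
    catch {
      case e: SSLException =>
        throw new SslAuthenticationException("Failed to begin SSL handshake", e)
    }
}
{code}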



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2020-03-17 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061243#comment-17061243
 ] 

Jun Rao commented on KAFKA-3919:


[~zhangchenghui]: Appreciate the detailed analysis. I noticed that the magic in 
the message format is 1, which is quite old. The latest message format, with 
magic 2, was introduced in 0.11 and is probably what most people are using now. 
So, I'd recommend that you try the new message format. If you still see an 
issue, let us know (so far, we haven't seen a similar issue reported on the new 
message format).

> Broker fails to start after ungraceful shutdown due to non-monotonically 
> incrementing offsets in logs
> --
>
> Key: KAFKA-3919
> URL: https://issues.apache.org/jira/browse/KAFKA-3919
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Andy Coates
>Assignee: Ben Stopford
>Priority: Major
>  Labels: reliability
> Fix For: 0.11.0.0
>
>
> Hi All,
> I encountered an issue with Kafka following a power outage that saw a 
> proportion of our cluster disappear. When the power came back on several 
> brokers halted on start up with the error:
> {noformat}
>   Fatal error during KafkaServerStartable startup. Prepare to shutdown”
>   kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1239742691) to position 35728 no larger than the last offset appended 
> (1239742822) to 
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
>   at kafka.log.LogSegment.recover(LogSegment.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at kafka.log.Log.loadSegments(Log.scala:160)
>   at kafka.log.Log.(Log.scala:90)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only way to recover the brokers was to delete the log files that 
> contained non monotonically incrementing offsets.
> We've spent some time digging through the logs and I feel I may have worked 
> out the sequence of events leading to this issue, (though this is based on 
> some assumptions I've made about the way Kafka is working, which may be 
> wrong).
> First off, we have unclean leadership elections disabled. (We did later 
> enable them to help get around some other issues we were having, but this was 
> several hours after this issue manifested.) We're producing to the topic with 
> gzip compression and acks=1.
> We looked through the data logs that were causing the brokers to not start. 
> What we found was that the initial part of the log has monotonically 
> increasing offsets, where each compressed batch normally contained one or two 
> records. Then there is a batch that contains many records, whose first record 
> has an offset below the previous batch and whose last record has an offset 
> above the previous batch. Following on from this there continues a period of 
> large batches, with monotonically increasing offsets, and then the log 
> returns to batches with one or two records.
> Our working assumption here is that the period before the offset dip, with 
> the small batches, is pre-outage normal operation. The period of larger 
> batches is from just after the outage, where producers have a backlog to 
> process when the partition becomes available, and then things return to 
> normal batch

[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2020-03-11 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057325#comment-17057325
 ] 

Jun Rao commented on KAFKA-9693:


[~paolomoriello]: By metadata, I was referring to the metadata in the file 
system (e.g., file length, timestamp, etc). Basically, I am just wondering what 
contention is causing the producer spike. The producer is writing to a 
different file from the one being flushed. So, there is no obvious data 
contention.

> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and 
> forces the flush of the old segment to disk. When this happens, log segment 
> _append_ duration increases, causing significant latency spikes on 
> producer(s) and replica(s). This ticket aims to highlight the problem and 
> propose a simple mitigation: add a new configuration to enable/disable rolled 
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> h1. !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between log segment flush count/rate 
> and the latency spikes+. This indicates that the spikes in max latency are 
> related to Kafka's behavior when rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component 
> of the cluster, e.g. CPU, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules logs flushing (this includes flushing the file record 
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
>  * the max time has elapsed since the timestamp of the first message in the 
> segment (or, in its absence, since the create time)
>  * the index is full
> In this case, the increase in latency happens on _append_ of a new message 
> set to the active segment of the log. This is a synchronous operation which 
> therefore blocks producer requests, causing the latency increase.
> To confirm this, I instrumented Kafka to measure the duration of 
> FileRecords.append(MemoryRecords) method, which is responsible of writing 
> memory records to file. As a result, I observed the same spiky pattern as in 
> the producer latency, with a one-to-one correspondence with the append 
> duration.
>  !image-2020-0
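A sketch of the proposed mitigation, combining the ticket's enable/disable switch with the delayed-flush idea from the comments (illustrative only: Segment is a stand-in for the real kafka.log classes, and the switch is the hypothetical config this ticket proposes):

{code:scala}
import java.util.concurrent.{Executors, TimeUnit}

object RollFlushSketch {
  trait Segment { def flush(): Unit }

  private val flushExecutor = Executors.newSingleThreadScheduledExecutor()

  // Keep the flush of the rolled segment off the hot append path; when the
  // switch is off, the segment is left to the periodic recovery-point flush.
  def onRoll(rolled: Segment, flushOnRoll: Boolean, delayMs: Long): Unit =
    if (flushOnRoll)
      flushExecutor.schedule(new Runnable { def run(): Unit = rolled.flush() },
        delayMs, TimeUnit.MILLISECONDS)
}
{code}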

[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2020-03-11 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057256#comment-17057256
 ] 

Jun Rao commented on KAFKA-9693:


[~paolomoriello]: Thanks for the reply. Do you know if the increased latency is 
due to contention on the data or the metadata in the file system? If it's the 
data, it seems that by delaying the flush long enough, the producer would have 
moved to a different block from the one being flushed. If it's the metadata, we 
flush other checkpoint files periodically, and I am wondering if they cause 
producer latency spikes too.

> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and 
> forces the flush of the old segment to disk. When this happens, log segment 
> _append_ duration increases, causing significant latency spikes on 
> producer(s) and replica(s). This ticket aims to highlight the problem and 
> propose a simple mitigation: add a new configuration to enable/disable rolled 
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> h1. !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between log segment flush count/rate 
> and the latency spikes+. This indicates that the spikes in max latency are 
> related to Kafka's behavior when rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component 
> of the cluster, e.g. CPU, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules logs flushing (this includes flushing the file record 
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
>  * the max time has elapsed since the timestamp of the first message in the 
> segment (or, in its absence, since the create time)
>  * the index is full
> In this case, the increase in latency happens on _append_ of a new message 
> set to the active segment of the log. This is a synchronous operation which 
> therefore blocks producer requests, causing the latency increase.
> To confirm this, I instrumented Kafka to measure the duration of 
> FileRecords.append(MemoryRecords) method, which is responsible of writing 
> memory records to file. As a result, I observed the same spiky pattern as 

[jira] [Commented] (KAFKA-9693) Kafka latency spikes caused by log segment flush on roll

2020-03-10 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056188#comment-17056188
 ] 

Jun Rao commented on KAFKA-9693:


[~paolomoriello]: Thanks for the detailed analysis. Very interesting. (1) 
What's the file system used in the test? (2) Being able to advance the recovery 
point reduces the broker restart time on hard failure. I am wondering if just 
delaying the flush of the rolled segment achieves the same benefit?

> Kafka latency spikes caused by log segment flush on roll
> 
>
> Key: KAFKA-9693
> URL: https://issues.apache.org/jira/browse/KAFKA-9693
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
> Environment: OS: Amazon Linux 2
> Kafka version: 2.2.1
>Reporter: Paolo Moriello
>Assignee: Paolo Moriello
>Priority: Major
>  Labels: Performance, latency, performance
> Attachments: image-2020-03-10-13-17-34-618.png, 
> image-2020-03-10-14-36-21-807.png, image-2020-03-10-15-00-23-020.png, 
> image-2020-03-10-15-00-54-204.png, latency_plot2.png
>
>
> h1. Summary
> When a log segment fills up, Kafka rolls over onto a new active segment and 
> forces the flush of the old segment to disk. When this happens, log segment 
> _append_ duration increases, causing significant latency spikes on 
> producer(s) and replica(s). This ticket aims to highlight the problem and 
> propose a simple mitigation: add a new configuration to enable/disable rolled 
> segment flush.
> h1. 1. Phenomenon
> Response time of produce request (99th ~ 99.9th %ile) repeatedly spikes to 
> ~50x-200x more than usual. For instance, normally 99th %ile is lower than 
> 5ms, but when this issue occurs, it marks 100ms to 200ms. 99.9th and 99.99th 
> %iles even jump to 500-700ms.
> Latency spikes happen at constant frequency (depending on the input 
> throughput), for small amounts of time. All the producers experience a 
> latency increase at the same time.
> h1. !image-2020-03-10-13-17-34-618.png|width=942,height=314!
> {{Example of response time plot observed on a single producer.}}
> URPs rarely appear in correspondence of the latency spikes too. This is 
> harder to reproduce, but from time to time it is possible to see a few 
> partitions going out of sync in correspondence of a spike.
> h1. 2. Experiment
> h2. 2.1 Setup
> Kafka cluster hosted on AWS EC2 instances.
> h4. Cluster
>  * 15 Kafka brokers: (EC2 m5.4xlarge)
>  ** Disk: 1100Gb EBS volumes (4750Mbps)
>  ** Network: 10 Gbps
>  ** CPU: 16 Intel Xeon Platinum 8000
>  ** Memory: 64Gb
>  * 3 Zookeeper nodes: m5.large
>  * 6 producers on 6 EC2 instances in the same region
>  * 1 topic, 90 partitions - replication factor=3
> h4. Broker config
> Relevant configurations:
> {quote}num.io.threads=8
>  num.replica.fetchers=2
>  offsets.topic.replication.factor=3
>  num.network.threads=5
>  num.recovery.threads.per.data.dir=2
>  min.insync.replicas=2
>  num.partitions=1
> {quote}
> h4. Perf Test
>  * Throughput ~6000-8000 (~40-70Mb/s input + replication = ~120-210Mb/s per 
> broker)
>  * record size = 2
>  * Acks = 1, linger.ms = 1, compression.type = none
>  * Test duration: ~20/30min
> h2. 2.2 Analysis
> Our analysis showed a high +correlation between log segment flush count/rate 
> and the latency spikes+. This indicates that the spikes in max latency are 
> related to Kafka's behavior when rolling over new segments.
> The other metrics did not show any relevant impact on any hardware component 
> of the cluster, e.g. CPU, memory, network traffic, disk throughput...
>  
>  !latency_plot2.png|width=924,height=308!
>  {{Correlation between latency spikes and log segment flush count. p50, p95, 
> p99, p999 and p latencies (left axis, ns) and the flush #count (right 
> axis, stepping blue line in plot).}}
> Kafka schedules logs flushing (this includes flushing the file record 
> containing log entries, the offset index, the timestamp index and the 
> transaction index) during _roll_ operations. A log is rolled over onto a new 
> empty log when:
>  * the log segment is full
>  * the max time has elapsed since the timestamp of the first message in the 
> segment (or, in its absence, since the create time)
>  * the index is full
> In this case, the increase in latency happens on _append_ of a new message 
> set to the active segment of the log. This is a synchronous operation which 
> therefore blocks producer requests, causing the latency increase.
> To confirm this, I instrumented Kafka to measure the duration of 
> FileRecords.append(MemoryRecords) method, which is responsible of writing 
> memory records to file. As a result, I observed the same spiky pattern as in 
> the producer latency, with a one-to-one correspondence with the append 
> duration.
>  !image-2020-03-10-14-36-2

[jira] [Commented] (KAFKA-9677) Low consume bandwidth quota may cause consumer not being able to fetch data

2020-03-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053839#comment-17053839
 ] 

Jun Rao commented on KAFKA-9677:


[~apovzner] : Thanks for finding this issue! I agree with your suggested fix of 
capping fetchMaxBytes.

> Low consume bandwidth quota may cause consumer not being able to fetch data
> ---
>
> Key: KAFKA-9677
> URL: https://issues.apache.org/jira/browse/KAFKA-9677
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>Reporter: Anna Povzner
>Assignee: Anna Povzner
>Priority: Major
>
> When we changed quota communication with KIP-219, fetch requests get 
> throttled by returning an empty response with the delay in 
> `throttle_time_ms`, the Kafka consumer retrying after the delay. 
> With default configs, the maximum fetch size could be as big as 50MB (or 10MB 
> per partition). The default broker config (1-second window, 10 full windows 
> of tracked bandwidth/thread utilization usage) means that a < 5MB/s consumer 
> quota (per broker) may stop a fetch request from ever being successful.
> Or the other way around: a 1 MB/s consumer quota (per broker) means that any 
> fetch request that gets >= 10MB of data (10 seconds * 1MB/second) in the 
> response will never get through. From the consumer's point of view, the 
> behavior will be: the consumer gets an empty response with throttle_time_ms > 
> 0, waits for the throttle delay, and then sends the fetch request again; the 
> fetch response is still too big, so the broker sends another empty response 
> with a throttle time, and so on in a never-ending loop.
> h3. Proposed fix
> Return less data in the fetch response in this case: cap `fetchMaxBytes` 
> passed to replicaManager.fetchMessages() from KafkaApis.handleFetchRequest() 
> to <bandwidth quota> * <total quota window time>. In the example of default 
> configs and a 1MB/s consumer bandwidth quota, fetchMaxBytes will be 10MB.
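A sketch of the capping arithmetic (illustrative only; the real change lives in KafkaApis.handleFetchRequest):

{code:scala}
object FetchCapSketch {
  // With quota = 1 MB/s and ten 1-second windows, the cap is 10 MB, so a
  // throttled fetch can eventually succeed instead of looping forever.
  def capFetchMaxBytes(fetchMaxBytes: Int,
                       quotaBytesPerSec: Double,
                       numWindows: Int,
                       windowSizeSeconds: Int): Int = {
    val cap = (quotaBytesPerSec * numWindows * windowSizeSeconds).toLong
    math.min(fetchMaxBytes.toLong, cap).toInt
  }

  // capFetchMaxBytes(50 * 1024 * 1024, 1024 * 1024, 10, 1) == 10 * 1024 * 1024
}
{code}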



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-05 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052419#comment-17052419
 ] 

Jun Rao commented on KAFKA-9648:


[~flashmouse] : Thanks for the reply. Just to clarify: TCP window scaling 
allows receive window sizes larger than 64KB. But the 1412 bytes used by the 
producer is much smaller than 64KB. So, even with window scaling turned off, it 
seems that the producer should be able to send more bytes before being acked?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have described a mysterious problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). In that issue I found 
> that the Kafka server will trigger TCP congestion control under some 
> conditions. We finally found the root cause.
> When a Kafka server restarts for any reason and then executes a preferred 
> replica leader election, lots of replica leaderships are given back to it, 
> triggering a cluster metadata update. Then all clients establish connections 
> to this server. At that moment, many TCP connection requests are waiting in 
> the TCP SYN queue, and then in the accept queue. 
> Kafka creates the server socket in SocketServer.scala:
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> This method has a second parameter, "backlog"; min(backlog, 
> tcp_max_syn_backlog) decides the queue length. Because Kafka doesn't set it, 
> it is the default value, 50.
> If this queue is full and tcp_syncookies = 0, new connection requests will be 
> rejected. If tcp_syncookies = 1, the TCP syncookie mechanism is triggered. 
> This mechanism lets Linux handle more TCP SYN requests, but it loses many TCP 
> options, including "wscale", the one that allows a TCP connection to send 
> many more bytes per TCP packet. Because syncookies were triggered, wscale is 
> lost, and that TCP connection will handle network traffic very slowly, 
> forever, until the connection is closed and another TCP connection is 
> established.
> So after a preferred replica election is executed, lots of new TCP 
> connections are established without wscale set, and much of the network 
> traffic to this server moves at a very slow speed.
> I'm not sure whether newer Linux versions have resolved this problem, but 
> Kafka should also set backlog to a larger value. We have now changed this to 
> 512 and everything seems OK.
>  
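A sketch of the suggested change (illustrative; 512 follows the reporter's value, and whether it should be configurable is left open):

{code:scala}
import java.net.InetSocketAddress
import java.nio.channels.ServerSocketChannel

object BacklogSketch extends App {
  // Pass an explicit backlog instead of the JDK default of 50, so a
  // post-election connection storm doesn't overflow the SYN/accept queues
  // and force syncookies (which drop the wscale option).
  val serverChannel = ServerSocketChannel.open()
  serverChannel.socket.bind(new InetSocketAddress(9092), 512)
}
{code}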



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051699#comment-17051699
 ] 

Jun Rao commented on KAFKA-9648:


[~flashmouse]: Thanks for the analysis. Great finding! The impact seems to be 
on the producer. Did you set a large TCP socket buffer for the producer? Was 
the network latency between the producer and the broker high?

> kafka server should resize backlog when create serversocket
> ---
>
> Key: KAFKA-9648
> URL: https://issues.apache.org/jira/browse/KAFKA-9648
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.0.1
>Reporter: li xiangyuan
>Priority: Minor
>
> I have described a mysterious problem 
> (https://issues.apache.org/jira/browse/KAFKA-9211). In that issue I found 
> that the Kafka server will trigger TCP congestion control under some 
> conditions. We finally found the root cause.
> When a Kafka server restarts for any reason and then executes a preferred 
> replica leader election, lots of replica leaderships are given back to it, 
> triggering a cluster metadata update. Then all clients establish connections 
> to this server. At that moment, many TCP connection requests are waiting in 
> the TCP SYN queue, and then in the accept queue. 
> Kafka creates the server socket in SocketServer.scala:
>  
> {code:java}
> serverChannel.socket.bind(socketAddress);{code}
> This method has a second parameter, "backlog"; min(backlog, 
> tcp_max_syn_backlog) decides the queue length. Because Kafka doesn't set it, 
> it is the default value, 50.
> If this queue is full and tcp_syncookies = 0, new connection requests will be 
> rejected. If tcp_syncookies = 1, the TCP syncookie mechanism is triggered. 
> This mechanism lets Linux handle more TCP SYN requests, but it loses many TCP 
> options, including "wscale", the one that allows a TCP connection to send 
> many more bytes per TCP packet. Because syncookies were triggered, wscale is 
> lost, and that TCP connection will handle network traffic very slowly, 
> forever, until the connection is closed and another TCP connection is 
> established.
> So after a preferred replica election is executed, lots of new TCP 
> connections are established without wscale set, and much of the network 
> traffic to this server moves at a very slow speed.
> I'm not sure whether newer Linux versions have resolved this problem, but 
> Kafka should also set backlog to a larger value. We have now changed this to 
> 512 and everything seems OK.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9498) Topic validation during the creation trigger unnecessary TopicChange events

2020-02-25 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-9498.

Fix Version/s: 2.6.0
   Resolution: Fixed

Merged the PR to trunk.

> Topic validation during the creation trigger unnecessary TopicChange events 
> 
>
> Key: KAFKA-9498
> URL: https://issues.apache.org/jira/browse/KAFKA-9498
> Project: Kafka
>  Issue Type: Bug
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 2.6.0
>
>
> I have found that the topic validation logic, which is executed when a 
> CreateTopicPolicy is configured or when validateOnly is set, triggers 
> unnecessary TopicChange events in the controller. In the worst case, it can 
> trigger up to one event per created topic, which leads to overloading the 
> controller.
> This happens because the validation logic reads all the topics from ZK using 
> the method getAllTopicsInCluster provided by the KafkaZkClient. This method 
> registers a watch every time the topics are read from Zookeeper.
> I think that we should make the watch registration optional for this call in 
> order to avoid this unwanted behaviour.
>  
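A sketch of the suggested API change (illustrative only; the registerWatch parameter name and the stand-in helper are assumptions, not the merged code):

{code:scala}
object TopicReadSketch {
  // Watch registration becomes opt-in, so a validation-only read of the topic
  // list no longer installs a ZK watch and thus no longer fires a TopicChange
  // event per created topic.
  def getAllTopicsInCluster(registerWatch: Boolean = false): Set[String] =
    fetchTopicsFromZk("/brokers/topics", registerWatch)

  // Stand-in for the actual ZK read in KafkaZkClient.
  def fetchTopicsFromZk(path: String, registerWatch: Boolean): Set[String] = Set.empty
}
{code}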



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-21 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042307#comment-17042307
 ] 

Jun Rao commented on KAFKA-4084:


https://issues.apache.org/jira/browse/KAFKA-9594 identifies an issue that can 
delay the processing of LeaderAndIsrRequest on the follower.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, *stopping* once 
> that number of leader rebalances is in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.
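A sketch of the batched approach (illustrative only; maxConcurrent is the proposed, hypothetical configuration):

{code:scala}
object RebalanceBatchSketch {
  // Process imbalanced partitions in bounded batches instead of all at once,
  // waiting for each batch of elections to finish before starting the next.
  def rebalanceInBatches[P](imbalanced: Seq[P], maxConcurrent: Int)
                           (electAndAwait: Seq[P] => Unit): Unit =
    imbalanced.grouped(maxConcurrent).foreach(electAndAwait)
}
{code}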



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invali

2020-02-20 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-6266.

Resolution: Fixed

Merged into 2.4 and trunk.

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Assignee: David Mao
>Priority: Major
> Fix For: 2.5.0, 2.4.1
>
>
> I upgraded Kafka from version 0.10.2.1 to 1.0.0. Since then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want them to be fixed so that 
> they won't repeat. Can someone please help me fix the below warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is inval

2020-02-18 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039522#comment-17039522
 ] 

Jun Rao commented on KAFKA-6266:


[~david.mao]: I merged the PR to trunk. However, it doesn't apply to 2.4. Could 
you submit a separate PR for the 2.4 branch? Thanks.

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Assignee: David Mao
>Priority: Major
> Fix For: 2.5.0, 2.4.1
>
>
> I upgraded Kafka from version 0.10.2.1 to 1.0.0. Since then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log, and want them to be fixed so that 
> they won't repeat. Can someone please help me fix the below warnings.
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-7061) Enhanced log compaction

2020-02-11 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034592#comment-17034592
 ] 

Jun Rao commented on KAFKA-7061:


[~senthilm-ms] : Sorry for the delay. I will review the PR this week.

> Enhanced log compaction
> ---
>
> Key: KAFKA-7061
> URL: https://issues.apache.org/jira/browse/KAFKA-7061
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.5.0
>Reporter: Luis Cabral
>Assignee: Senthilnathan Muthusamy
>Priority: Major
>  Labels: kip
>
> Enhance log compaction to support more than just offset comparison, so that 
> insertion order doesn't dictate which records to keep.
> Default behavior is kept as it was, with the enhanced approach having to be 
> purposely activated.
>  The enhanced compaction is done either via the record timestamp, by setting 
> the new configuration to "timestamp", or via the record headers, by setting 
> this configuration to anything other than the default "offset" or the 
> reserved "timestamp".
> See 
> [KIP-280|https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction]
>  for more details.
> +From Guozhang:+ We should emphasize on the WIKI that the newly introduced 
> config yields to the existing "log.cleanup.policy", i.e. if the latter's 
> value is `delete` not `compact`, then the previous config would be ignored.
> +From Jun Rao:+ With the timestamp/header strategy, the behavior of the 
> application may need to change. In particular, the application can't just 
> blindly take the record with a larger offset and assuming that it's the value 
> to keep. It needs to check the timestamp or the header now. So, it would be 
> useful to at least document this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8374) KafkaApis.handleLeaderAndIsrRequest not robust to ZooKeeper exceptions

2020-02-10 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033734#comment-17033734
 ] 

Jun Rao commented on KAFKA-8374:


This could be related to https://issues.apache.org/jira/browse/KAFKA-9307.

> KafkaApis.handleLeaderAndIsrRequest not robust to ZooKeeper exceptions
> --
>
> Key: KAFKA-8374
> URL: https://issues.apache.org/jira/browse/KAFKA-8374
> Project: Kafka
>  Issue Type: Bug
>  Components: core, offset manager
>Affects Versions: 2.0.1
> Environment: Linux x86_64 (Ubuntu Xenial) running on AWS EC2
>Reporter: Mike Mintz
>Assignee: Bob Barrett
>Priority: Major
>
> h2. Summary of bug (theory)
> During a leader election, when a broker is transitioning from leader to 
> follower on some __consumer_offset partitions and some __transaction_state 
> partitions, it’s possible for a ZooKeeper exception to be thrown, leaving the 
> GroupMetadataManager in an inconsistent state.
>  
> In particular, in KafkaApis.handleLeaderAndIsrRequest in the 
> onLeadershipChange callback, it’s possible for 
> TransactionCoordinator.handleTxnEmigration to throw 
> ZooKeeperClientExpiredException, ending the updatedFollowers.foreach loop 
> early. If there were any __consumer_offset partitions to be handled later in 
> the loop, GroupMetadataManager will be left with stale data in its 
> groupMetadataCache. Later, when this broker resumes leadership for the 
> affected __consumer_offset partitions, it will fail to load the updated 
> groups into the cache since it uses putIfNotExists, and it will serve stale 
> offsets to consumers.
>  
> h2. Details of what we experienced
> We ran into this issue running Kafka 2.0.1 in production. Several Kafka 
> consumers received stale offsets when reconnecting to their group coordinator 
> after a leadership election on their __consumer_offsets partition. This 
> caused them to reprocess many duplicate messages.
>  
> We believe we’ve tracked down the root cause:
>  * On 2019-04-01, we were having memory pressure in ZooKeeper, and we were 
> getting several ZooKeeperClientExpiredException errors in the logs.
>  * The impacted consumers were all in __consumer_offsets-15. There was a 
> leader election on this partition, and leadership transitioned from broker 
> 1088 to broker 1069. During this leadership election, the former leader 
> (1088) saw a ZooKeeperClientExpiredException  (stack trace below). This 
> happened inside KafkaApis.handleLeaderAndIsrRequest, specifically in 
> onLeadershipChange while it was updating a __transaction_state partition. 
> Since there are no “Scheduling unloading” or “Finished unloading” log 
> messages in this period, we believe it threw this exception before getting to 
> __consumer_offsets-15, so it never got a chance to call 
> GroupCoordinator.handleGroupEmigration, which means this broker didn’t unload 
> offsets from its GroupMetadataManager.
>  * Four days later, on 2019-04-05, we manually restarted broker 1069, so 
> broker 1088 became the leader of __consumer_offsets-15 again. When it ran 
> GroupMetadataManager.loadGroup, it presumably failed to update 
> groupMetadataCache since it uses putIfNotExists, and it would have found the 
> group id already in the cache. Unfortunately we did not have debug logging 
> enabled, but I would expect to have seen a log message like "Attempt to load 
> group ${group.groupId} from log with generation ${group.generationId} failed 
> because there is already a cached group with generation 
> ${currentGroup.generationId}".
>  * After the leadership election, the impacted consumers reconnected to 
> broker 1088 and received stale offsets that correspond to the last committed 
> offsets around 2019-04-01.
>  
> h2. Relevant log/stacktrace
> {code:java}
> [2019-04-01 22:44:18.968617] [2019-04-01 22:44:18,963] ERROR [KafkaApi-1088] 
> Error when handling request 
> {controller_id=1096,controller_epoch=122,partition_states=[...,{topic=__consumer_offsets,partition=15,controller_epoch=122,leader=1069,leader_epoch=440,isr=[1092,1088,1069],zk_version=807,replicas=[1069,1088,1092],is_new=false},...],live_leaders=[{id=1069,host=10.68.42.121,port=9094}]}
>  (kafka.server.KafkaApis)
> [2019-04-01 22:44:18.968689] kafka.zookeeper.ZooKeeperClientExpiredException: 
> Session expired either before or while waiting for connection
> [2019-04-01 22:44:18.968712]         at 
> kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:238)
> [2019-04-01 22:44:18.968736]         at 
> kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
> [2019-04-01 22:44:18.968759]         at 
> kafka.zookeeper.Z

[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032006#comment-17032006
 ] 

Jun Rao commented on KAFKA-4084:


Do you see the incoming byte rate of that broker go much above 10MB/sec?

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031955#comment-17031955
 ] 

Jun Rao commented on KAFKA-4084:


[~evanjpw]: There are metrics related to replication throttling 
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas#KIP-73ReplicationQuotas-Metrics]).
 Perhaps you can check to see if the replication quota is actually engaged.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-05 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030864#comment-17030864
 ] 

Jun Rao commented on KAFKA-4084:


[~evanjpw]: I am not sure if you need the new config. The controller already 
knows which replicas are in-sync.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-05 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030819#comment-17030819
 ] 

Jun Rao commented on KAFKA-4084:


[~evanjpw]:
 # For this case, you do have to script things up yourself. Basically, you need 
to set up the per-topic config for throttled replicas and then set the 
broker-level throttling value (see the sketch after this list).
 # We monitor the incoming/outgoing replication traffic (in-sync and 
out-of-sync) per broker. If that's higher than the throttling value, we 
throttle the out-of-sync replicas.
 # Another way to solve this particular problem is to change how auto leader 
balancing works. For example, we could choose to only move the leaders to a 
broker when all replicas on it are in-sync.
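
As a minimal sketch of steps 1 and 2 above, both configs can also be set 
programmatically with the AdminClient (incrementalAlterConfigs needs brokers on 
2.3+). The topic name, broker id, and the 100 MB/sec rate below are made-up 
examples, not values from this thread.
{code:java}
// Sketch only: per-topic throttled replicas plus a broker-level rate.
// Assumes Scala 2.13+ (scala.jdk.CollectionConverters) and kafka-clients 2.3+.
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.admin.{AdminClient, AlterConfigOp, ConfigEntry}
import org.apache.kafka.common.config.ConfigResource

object ThrottleExample extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  val admin = AdminClient.create(props)

  def setOp(name: String, value: String) =
    new AlterConfigOp(new ConfigEntry(name, value), AlterConfigOp.OpType.SET)

  // Step 1: throttle all replicas of the topic, on the leader and follower side.
  val topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic")
  val topicOps: java.util.Collection[AlterConfigOp] = List(
    setOp("leader.replication.throttled.replicas", "*"),
    setOp("follower.replication.throttled.replicas", "*")).asJava

  // Step 2: cap the replication rate on broker 1 at 100 MB/sec.
  val broker = new ConfigResource(ConfigResource.Type.BROKER, "1")
  val brokerOps: java.util.Collection[AlterConfigOp] = List(
    setOp("leader.replication.throttled.rate", "104857600"),
    setOp("follower.replication.throttled.rate", "104857600")).asJava

  admin.incrementalAlterConfigs(
    Map(topic -> topicOps, broker -> brokerOps).asJava).all().get()
  admin.close()
}
{code}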

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030298#comment-17030298
 ] 

Jun Rao commented on KAFKA-4084:


[~sriharsha], hmm, if one sets follower.replication.throttled.rate to 200MB/sec 
on the new broker, the total incoming replication traffic is capped at 
200MB/sec, independent of the number of brokers it fetches from. With that, it 
seems that one can reserve some bandwidth for regular clients during the 
catchup phase.

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions

2020-02-04 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030265#comment-17030265
 ] 

Jun Rao commented on KAFKA-4084:


[~evanjpw] : Yes, for now, you can potentially set up a replication quota. You 
will need to set leader.replication.throttled.replicas and 
follower.replication.throttled.replicas for each topic residing on that broker. 
You can use the following command to do that.

 
{code:java}
bin/kafka-configs … --alter \
  --add-config 'leader.replication.throttled.replicas=*' \
  --entity-type topic{code}
 

[~sriharsha] : I am not sure if KIP-491 is necessarily the best approach to 
address this particular issue (in general, one probably shouldn't have any 
broker overloaded at any time). However, if there are other convincing use 
cases, we could consider it.

 

> automated leader rebalance causes replication downtime for clusters with too 
> many partitions
> 
>
> Key: KAFKA-4084
> URL: https://issues.apache.org/jira/browse/KAFKA-4084
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.2.2, 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Priority: Major
>  Labels: reliability
> Fix For: 1.1.0
>
>
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and 
> you have a cluster with many partitions, there is a severe amount of 
> replication downtime following a restart. This causes 
> `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes 
> leaders for *all* imbalanced partitions at once, instead of doing it 
> gradually. This effectively stops all replica fetchers in the cluster 
> (assuming there are enough imbalanced partitions), and restarts them. This 
> can take minutes on busy clusters, during which no replication is happening 
> and user data is at risk. Clients with {{acks=-1}} also see issues at this 
> time, because replication is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election 
> manually. There is also a broker configuration “auto.leader.rebalance.enable” 
> which you can set to have the broker automatically perform the PLE when 
> needed. DO NOT USE THIS OPTION. There are serious performance issues when 
> doing so, especially on larger clusters. It needs some development work that 
> has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high 
> partition counts causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of 
> partitions to do automated leader rebalancing for at once, and *stop* once 
> that number of leader rebalances are in flight, until they're done. There may 
> be better mechanisms, and I'd love to hear if anybody has any ideas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2020-01-22 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021795#comment-17021795
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190] :  To verify this, I wrote the following short program. I created 
an instance of KafkaZkClient and registered a StateChangeHandler that blocks in 
beforeInitializingSession(). 
{code:java}
package kafka.tools

import kafka.zk.KafkaZkClient
import kafka.zookeeper.StateChangeHandler
import org.apache.kafka.common.utils.Time

object Mytest {

  def main(args: Array[String]): Unit = {
    val kafkaZkClient = KafkaZkClient("localhost:2181", false, 6000, 6000, 10,
      Time.SYSTEM, "mytest")
    kafkaZkClient.registerStateChangeHandler(new StateChangeHandler {
      override val name: String = "mytest"
      override def afterInitializingSession(): Unit = {
        throw new IllegalStateException
      }
      override def beforeInitializingSession(): Unit = {
        Thread.sleep(Integer.MAX_VALUE) // block forever
      }
    })

    println("zookeeper client state: " +
      kafkaZkClient.currentZooKeeper.getState)
    Thread.sleep(20000) // 20-second window in which to pause the process
    try {
      val children = kafkaZkClient.getChildren("/")
      println("child nodes are " + children)
    } catch {
      case t: Throwable =>
        println("hit exception " + t)
    }
    println("zookeeper client state: " +
      kafkaZkClient.currentZooKeeper.getState)
  }
}{code}
 

I then started the ZooKeeper server and ran the above program 
(bin/kafka-run-class.sh kafka.tools.Mytest). I waited until the ZooKeeper 
client got to the CONNECTED state (but before the 20-second sleep completed) 
and did "kill -STOP" to pause the program. I waited for another 6 seconds for 
the ZK session to expire. Then I did "kill -CONT" to resume the program. The 
following is the output that I got.

 

 
{code:java}
zookeeper client state: CONNECTED
[2020-01-22 21:47:01,693] WARN Client session timed out, have not heard from 
server in 17038ms for sessionid 0x1002ced93e6 
(org.apache.zookeeper.ClientCnxn)
[2020-01-22 21:47:03,351] WARN Unable to reconnect to ZooKeeper service, 
session 0x1002ced93e6 has expired (org.apache.zookeeper.ClientCnxn)
hit exception org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /
zookeeper client state: CLOSED 
{code}
 

As you can see, this program verifies a few things. (1) If a ZK session 
expires, the state of ZK client transitions to CLOSED (not CONNECTING). (2) If 
ZooKeeperClient.handleRequests() is called after the ZK session has expired, 
the call doesn't block and returns a Session expired error code. (3) Even if 
beforeInitializingSession() blocks, ZooKeeperClient.handleRequests() is not 
blocked on an expired session.

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: controllerDeadLockAnalysis-2020-01-20.png, js.log, 
> js0.log, js1.log, js2.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way 
> to recover the Kafka cluster is to restart the Kafka server. The following is 
> the jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperC

[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2020-01-20 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019835#comment-17019835
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190]: Thanks for the follow-up investigation. The following code was 
added because we only want to create a new session if the current one has 
actually expired.

 
{code:java}
if (!connectionState.isAlive) {
{code}
I am also not sure that changing the above logic actually fixes the deadlock. 
At that point, the zk-session-expiry-handler thread is already blocked on 
callBeforeInitializingSession(), before the above code is called. 

I still don't quite understand why ZooKeeperClient.handleRequests() blocks at 
countDownLatch.await() in the jstack. At that point, the ZK session has 
expired. My understanding is that all pending ZK requests should complete with 
a SessionExpired error through the response callback. I am not sure if the 
issue is in ZK itself or in how we use ZK.

 

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: controllerDeadLockAnalysis-2020-01-20.png, js.log, 
> js0.log, js1.log, js2.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way 
> to recover the Kafka cluster is to restart the Kafka server. The following is 
> the jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: W

[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2020-01-17 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018279#comment-17018279
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190]: Sorry for the late reply. Took a look at your last comment. If we 
actually hit an exception in ZooKeeperClient.handleRequests(), the caller won't 
be blocked on countDownLatch.await() since we won't even reach that code. So, 
Ted's PR does make the code cleaner, but I am not sure if that solves this 
particular problem. The main question is why ZooKeeperClient.handleRequests() 
blocks at countDownLatch.await(). At that point, the ZK session has expired. 
All pending calls on that expired ZK session should complete with a 
SessionExpired error through the response callback. 

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: js.log, js0.log, js1.log, js2.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way 
> to recover the Kafka cluster is to restart the Kafka server. The following is 
> the jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005d1be5a00> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util

[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments

2020-01-10 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013081#comment-17013081
 ] 

Jun Rao commented on KAFKA-8764:


[~trajakovic]: Thanks for providing the additional info. That makes sense now. 
Currently, the log cleaner runs independently on each replica. If a follower 
has been down for some time, it is possible that the leader has already cleaned 
the data and left holes in some log segments. When those segments get 
replicated to the follower, the follower will clean the same data again and 
potentially hit the above issue.

> LogCleanerManager endless loop while compacting/cleaning segments
> -
>
> Key: KAFKA-8764
> URL: https://issues.apache.org/jira/browse/KAFKA-8764
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.3.0, 2.2.1, 2.4.0
> Environment: docker base image: openjdk:8-jre-alpine base image, 
> kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>Reporter: Tomislav Rajakovic
>Priority: Major
>  Labels: patch
> Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, 
> kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, 
> log-cleaner-bug-reproduction.zip
>
>
> {{LogCleanerManager stuck in an endless loop while cleaning segments for one 
> partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>  
> Issue appeared on follower brokers, and it happens on every (new) broker if 
> partition assignment is changed.
>  
> Original issue setup:
>  * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers
>  * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB
>  * 5 zookeepers
>  * topic created with config:
>  ** name = "backup_br_domain_squad"
> partitions = 36
> replication_factor = 3
> config = {
>  "cleanup.policy" = "compact"
>  "min.compaction.lag.ms" = "8640"
>  "min.cleanable.dirty.ratio" = "0.3"
> }
>  
>  
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.lo

[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments

2020-01-10 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013058#comment-17013058
 ] 

Jun Rao commented on KAFKA-8764:


[~trajakovic]: Thanks for the investigation. Normally, the offset map is built 
from the dirty portion of the log that shouldn't contain holes in offsets. If 
segment [0,233) misses offset 232, it means that this segment has been cleaned 
and the dirty offset should have moved to 233 after the first round of 
cleaning. Was the dirty offset ever reset manually? In any case, I agree that 
it would be better to make the code more defensive. Could you submit the patch 
as a PR (details can be found in 
[https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes])?

> LogCleanerManager endless loop while compacting/cleaning segments
> -
>
> Key: KAFKA-8764
> URL: https://issues.apache.org/jira/browse/KAFKA-8764
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.3.0, 2.2.1, 2.4.0
> Environment: docker base image: openjdk:8-jre-alpine base image, 
> kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>Reporter: Tomislav Rajakovic
>Priority: Major
>  Labels: patch
> Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, 
> kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, 
> log-cleaner-bug-reproduction.zip
>
>
> {{LogCleanerManager stuck in an endless loop while cleaning segments for one 
> partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>  
> Issue appeared on follower brokers, and it happens on every (new) broker if 
> partition assignment is changed.
>  
> Original issue setup:
>  * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers
>  * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB
>  * 5 zookeepers
>  * topic created with config:
>  ** name = "backup_br_domain_squad"
> partitions = 36
> replication_factor = 3
> config = {
>  "cleanup.policy" = "compact"
>  "min.compaction.lag.ms" = "8640"
>  "min.cleanable.dirty.ratio" = "0.3"
> }
>  
>  
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (

[jira] [Commented] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is inval

2019-12-03 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987550#comment-16987550
 ] 

Jun Rao commented on KAFKA-6266:


Looking at the code where the "Resetting first dirty offset" warning is 
emitted: it seems that we only return the new cleaning offset based on the log 
start offset but don't write the new value to the cleaner offset checkpoint 
file. So, in the case when there is nothing to clean, this warning repeats 
every time the log cleaner checks for logs to clean. To fix this, we should 
overwrite the cleaner offset checkpoint file as well.
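
A minimal sketch of that fix, with hypothetical helper names (the real logic 
lives in kafka.log.LogCleanerManager): persist the reset offset instead of 
just returning it, so the warning fires once rather than on every cleaner pass.
{code:java}
// Illustrative only; `updateCheckpoint` stands in for rewriting the
// cleaner-offset-checkpoint file.
def firstDirtyOffset(partition: String,
                     checkpointed: Option[Long],
                     logStartOffset: Long,
                     updateCheckpoint: (String, Long) => Unit): Long =
  checkpointed match {
    case Some(offset) if offset >= logStartOffset => offset
    case Some(offset) =>
      println(s"WARN Resetting first dirty offset of $partition to log start " +
        s"offset $logStartOffset since the checkpointed offset $offset is invalid.")
      updateCheckpoint(partition, logStartOffset) // the step missing today
      logStartOffset
    case None => logStartOffset
  }
{code}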

> Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of 
> __consumer_offsets-xx to log start offset 203569 since the checkpointed 
> offset 120955 is invalid. (kafka.log.LogCleanerManager$)
> --
>
> Key: KAFKA-6266
> URL: https://issues.apache.org/jira/browse/KAFKA-6266
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.0.0, 1.0.1
> Environment: CentOS 7, Apache kafka_2.12-1.0.0
>Reporter: VinayKumar
>Priority: Major
>
> I upgraded Kafka from version 0.10.2.1 to 1.0.0. Since then, I see the below 
> warnings in the log.
>  I'm seeing these continuously in the log and want them fixed so that they 
> won't repeat. Can someone please help me fix the warnings below?
> {code}
> WARN Resetting first dirty offset of __consumer_offsets-17 to log start 
> offset 3346 since the checkpointed offset 3332 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-23 to log start 
> offset 4 since the checkpointed offset 1 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-19 to log start 
> offset 203569 since the checkpointed offset 120955 is invalid. 
> (kafka.log.LogCleanerManager$)
>  WARN Resetting first dirty offset of __consumer_offsets-35 to log start 
> offset 16957 since the checkpointed offset 7 is invalid. 
> (kafka.log.LogCleanerManager$)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-14 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974548#comment-16974548
 ] 

Jun Rao commented on KAFKA-9044:


[~pmbuko]: Thanks for the update.

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running Kafka 2.3.1 and 2.3.0, and this issue has 
> affected all of them. Because of replication and the size of the clusters (30 
> brokers), this bug is not causing any data loss, but it is nevertheless 
> concerning. When a broker drops out, the log gives no indication that there 
> are any ZooKeeper issues (and indeed the ZooKeepers are healthy when this 
> occurs). Here's a snippet from a broker log when it occurs:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.G

[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-07 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969452#comment-16969452
 ] 

Jun Rao commented on KAFKA-9044:


The ZK session is typically created when the broker starts, not when it becomes 
the controller.

Alternatively, the ZK server may log the client IP address for a ZK session 
when it's created. If you can find that out, you can see whether the IP matches 
the broker's IP.

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running Kafka 2.3.1, and this issue has affected all 
> of them. Because of replication and the size of the clusters (30 brokers), 
> this bug is not causing any data loss, but it is nevertheless concerning. 
> When a broker drops out, the log gives no indication that there are any 
> ZooKeeper issues (and indeed the ZooKeepers are healthy when this occurs). 
> Here's a snippet from a broker log when it occurs:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offse

[jira] [Created] (KAFKA-9157) logcleaner could generate empty segment files after cleaning

2019-11-07 Thread Jun Rao (Jira)
Jun Rao created KAFKA-9157:
--

 Summary: logcleaner could generate empty segment files after 
cleaning
 Key: KAFKA-9157
 URL: https://issues.apache.org/jira/browse/KAFKA-9157
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.3.0
Reporter: Jun Rao


Currently, the log cleaner can only combine segments within a 2-billion 
offset range. If all records in that range are deleted, an empty segment can 
be generated. It would be useful to avoid generating such empty segments.
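
As a rough illustration of where the 2-billion limit comes from (a sketch with 
illustrative names, not the cleaner's actual code): index entries store offsets 
relative to the segment base as 4-byte integers, so a combined group's offset 
span has to stay within Int.MaxValue.

{code:java}
// Sketch only: a segment can join a group as long as the group's total
// offset span still fits in a 4-byte relative offset.
def canAddToGroup(groupBaseOffset: Long, candidateLastOffset: Long): Boolean =
  candidateLastOffset - groupBaseOffset <= Int.MaxValue

// e.g. a group based at offset 0 cannot absorb a segment ending at 3 billion:
// canAddToGroup(0L, 3000000000L) == false
{code}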



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968926#comment-16968926
 ] 

Jun Rao commented on KAFKA-9044:


When a broker establishes a ZK session, it should log the session id as 
follows.

 
{code:java}
INFO Session establishment complete on server localhost/127.0.0.1:2181, 
sessionid = 0x100450d9045, negotiated timeout = 6000 
(org.apache.zookeeper.ClientCnxn){code}

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs). Here's a 
> snippet from a broker log when it happens:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expire

[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968861#comment-16968861
 ] 

Jun Rao commented on KAFKA-9044:


Hmm, I did a quick check and the controller never seems to delete a broker 
registration path in ZK directly (so only the closing or the expiration of the 
ZK session causes the path to go away). Also, normally the controller will 
first create the /controller path in ZK immediately after its session is 
established. Could you double-check whether this session indeed comes from the 
controller machine?
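
For what it's worth, one way to cross-check is to read the ephemeral owner of 
the /controller znode and compare it with the session ids in the broker logs. 
A minimal diagnostic sketch using the plain ZooKeeper client (the connection 
string is an assumption; adjust for your environment):

{code:java}
import org.apache.zookeeper.ZooKeeper

// Diagnostic sketch only: print which ZK session owns the ephemeral
// /controller znode.
object ControllerSessionCheck extends App {
  val zk = new ZooKeeper("zk-host:2181", 6000, _ => ())  // hypothetical host
  val stat = zk.exists("/controller", false)
  if (stat == null) println("/controller does not exist")
  else println(f"/controller is owned by session 0x${stat.getEphemeralOwner}%x")
  zk.close()
}
{code}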

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs). Here's a 
> snippet from a broker log when it happens:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.

[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968833#comment-16968833
 ] 

Jun Rao commented on KAFKA-9044:


Interesting, that ZK session lasted for only 30 secs. This almost seems like 
someone ran a tool that created a session to ZK, deleted a ZK node and then 
closed the session quickly. The ZK log contains the client IP address 
associated with the ZK session. Could you find that IP and see whether it's a 
broker machine or not?

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs). Here's a 
> snippet from a broker log when it happens:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629

[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968793#comment-16968793
 ] 

Jun Rao commented on KAFKA-9044:


Did the ZK log show that broker 23's ZK session expired around that time?

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs). Here's a 
> snippet from a broker log when it happens:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds.

[jira] [Commented] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

2019-11-06 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968545#comment-16968545
 ] 

Jun Rao commented on KAFKA-9044:


It may also be useful to parse the ZK commit log to see if there is any 
deletion of the ZK node for the broker around that time.
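
For reference, ZooKeeper ships a formatter that prints each operation in a 
transaction log (including deletes of /brokers/ids/N) together with its session 
id and timestamp. The invocation is roughly as follows; the exact classpath and 
data directory depend on your ZooKeeper installation:

{code}
# Illustrative invocation for ZooKeeper 3.4/3.5; adjust paths for your install.
java -cp "zookeeper.jar:lib/*" org.apache.zookeeper.server.LogFormatter \
  /path/to/zookeeper/data/version-2/log.<zxid>
{code}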

> Brokers occasionally (randomly?) dropping out of clusters
> -
>
> Key: KAFKA-9044
> URL: https://issues.apache.org/jira/browse/KAFKA-9044
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.3.0, 2.3.1
> Environment: Ubuntu 14.04
>Reporter: Peter Bukowinski
>Priority: Major
>
> I have several clusters running kafka 2.3.1 and this issue has affected all of 
> them. Because of replication and the size of the clusters (30 brokers), this 
> bug is not causing any data loss, but it is nevertheless concerning. When a 
> broker drops out, the log gives no indication that there are any zookeeper 
> issues (and indeed the zookeepers are healthy when this occurs). Here's a 
> snippet from a broker log when it happens:
> {{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 
> expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to 
> retention time 360ms breach (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] 
> for deletion. (kafka.log.Log)}}
>  {{[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, 
> dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted log 
> /data/3/kl/internal_test-52/01975332.log.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,957] INFO Deleted offset index 
> /data/3/kl/internal_test-52/01975332.index.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:03:56,958] INFO Deleted time index 
> /data/3/kl/internal_test-52/01975332.timeindex.deleted. 
> (kafka.log.LogSegment)}}
>  {{[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 1 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 
> 0 expired offsets in 0 milliseconds. 
> (kafka.coordinator.group.GroupMetadataManager)}}
>  {{[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brok

[jira] [Commented] (KAFKA-7987) a broker's ZK session may die on transient auth failure

2019-11-05 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967806#comment-16967806
 ] 

Jun Rao commented on KAFKA-7987:


[~gcampbell], we could try calling scheduleSessionExpiryHandler() in 
ZooKeeperClientWatcher.process() when authentication fails. We have to think a 
bit more whether we still need StateChangeHandler.onAuthFailure() or not.
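
To make the idea concrete, here is a minimal standalone sketch (not Kafka's 
actual code) of treating AuthFailed like a session expiration, i.e. tearing the 
client down and creating a fresh session instead of staying in a dead 
AUTH_FAILED state:

{code:java}
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import org.apache.zookeeper.Watcher.Event.KeeperState

// Standalone illustration only: on AuthFailed (e.g. a transient KDC issue),
// recover the same way as on Expired, by re-initializing the ZK session.
// NB: real code should schedule the re-init on a separate thread (as
// scheduleSessionExpiryHandler() does) rather than run it in the watcher callback.
class ReconnectingZkClient(connectString: String, sessionTimeoutMs: Int) extends Watcher {
  @volatile private var zk = new ZooKeeper(connectString, sessionTimeoutMs, this)

  override def process(event: WatchedEvent): Unit = event.getState match {
    case KeeperState.AuthFailed | KeeperState.Expired =>
      zk.close()                                                // drop the dead client
      zk = new ZooKeeper(connectString, sessionTimeoutMs, this) // start a new session
    case _ => ()  // SyncConnected, Disconnected, etc. are handled elsewhere
  }
}
{code}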

> a broker's ZK session may die on transient auth failure
> ---
>
> Key: KAFKA-7987
> URL: https://issues.apache.org/jira/browse/KAFKA-7987
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Priority: Major
>
> After a transient network issue, we saw the following log in a broker.
> {code:java}
> [23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: 
> javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Server not found in Kerberos database (7))]) occurred when 
> evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client 
> will go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
> [23:37:02,102] ERROR [ZooKeeperClient] Auth failed. 
> (kafka.zookeeper.ZooKeeperClient)
> {code}
> The network issue prevented the broker from communicating to ZK. The broker's 
> ZK session then expired, but the broker didn't know that yet since it 
> couldn't establish a connection to ZK. When the network was back, the broker 
> tried to establish a connection to ZK, but failed due to auth failure (likely 
> due to a transient KDC issue). The current logic just ignores the auth 
> failure without trying to create a new ZK session. Then the broker will be 
> permanently in a state that it's alive, but not registered in ZK.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-30 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919669#comment-16919669
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123], not sure that I fully understood your question. We need to write 
the partition-level checkpoint after the cleaning of each log. Typically, the 
latter takes much more time than the former, so the overhead of the former 
should be small?

 

Also, if we are adding a new checkpoint file, we probably want to create a KIP 
to discuss this a bit more.

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-19 Thread Jun Rao (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910793#comment-16910793
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123], I was thinking the new checkpoint file will be under 
disk_x/topic_y/z for each partition. We will deprecate the old checkpoint file 
under disk_x/.

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905792#comment-16905792
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123]: For upgrade, we can roughly do the following. If the new code only 
finds the old style checkpoint file, it uses that for the cleaning offsets. 
Once the new checkpoint files have been written, the old checkpoint file can be 
deleted. After that, the new code will be using the cleaning offsets in the new 
checkpoint file.
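
A hedged sketch of that fallback (file names and the helper are hypothetical, 
just to show the precedence):

{code:java}
import java.io.File
import scala.io.Source

// Hypothetical names throughout: prefer the new per-partition checkpoint once
// it exists; otherwise fall back to the entry in the old per-disk checkpoint.
def cleaningOffset(partitionDir: File, oldPerDiskOffset: Option[Long]): Long = {
  val newCheckpoint = new File(partitionDir, "cleaner-offset-checkpoint")
  if (newCheckpoint.exists()) {
    val src = Source.fromFile(newCheckpoint)
    try src.getLines().next().trim.toLong finally src.close()
  } else oldPerDiskOffset.getOrElse(0L)
}
{code}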

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Improvement
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-08-07 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902334#comment-16902334
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123], I was thinking of adding a new checkpoint file per partition. If 
we do that, we probably want to move the cleaning offset currently stored in 
the per disk log cleaner checkpoint file there and deprecate the per disk log 
cleaner checkpoint file.

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8716) broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 2.2.1 or 2.3.0

2019-08-07 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902324#comment-16902324
 ] 

Jun Rao commented on KAFKA-8716:


[~yuyang08], thanks for the update.

> broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 
> 2.2.1 or 2.3.0
> 
>
> Key: KAFKA-8716
> URL: https://issues.apache.org/jira/browse/KAFKA-8716
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Yu Yang
>Priority: Critical
>
> We are trying to upgrade the kafka binary from 2.1 to 2.2.1 or 2.3.0. For both 
> versions, the broker with the updated binary (2.2.1 or 2.3.0) could not get 
> started due to a zookeeper session expiration exception. This error happens 
> repeatedly and the broker cannot start because of it. 
> Below is our zk related setting in server.properties:
> {code}
> zookeeper.connection.timeout.ms=6000
> zookeeper.session.timeout.ms=6000
> {code}
> The following is the stack trace, and we are using zookeeper 3.5.3. Instead 
> of waiting for a few seconds, the SESSIONEXPIRED error is returned immediately 
> in the CheckedEphemeral.create call. Any insights? 
> [2019-07-25 18:07:35,712] INFO Creating /brokers/ids/80 (is it secure? false) 
> (kafka.zk.KafkaZkClient)
> [2019-07-25 18:07:35,724] ERROR Error while creating ephemeral at 
> /brokers/ids/80 with return code: SESSIONEXPIRED 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2019-07-25 18:07:35,731] ERROR [KafkaServer id=80] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
> at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1725)
> at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1689)
> at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:97)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:260)
> at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
> at kafka.Kafka$.main(Kafka.scala:75)
> at kafka.Kafka.main(Kafka.scala)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-4545) tombstone needs to be removed after delete.retention.ms has passed after it has been cleaned

2019-08-02 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899305#comment-16899305
 ] 

Jun Rao commented on KAFKA-4545:


[~Yohan123], if you want to work on this jira and follow the approach I 
mentioned earlier, we probably need a KIP since it changes the on-disk format.

> tombstone needs to be removed after delete.retention.ms has passed after it 
> has been cleaned
> 
>
> Key: KAFKA-4545
> URL: https://issues.apache.org/jira/browse/KAFKA-4545
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Jun Rao
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>
> The algorithm for removing the tombstone in a compacted log is supposed to be 
> the following.
> 1. Tombstone is never removed when it's still in the dirty portion of the log.
> 2. After the tombstone is in the cleaned portion of the log, we further delay 
> the removal of the tombstone by delete.retention.ms since the time the 
> tombstone is in the cleaned portion.
> Once the tombstone is in the cleaned portion, we know there can't be any 
> message with the same key before the tombstone. Therefore, for any consumer, 
> if it reads a non-tombstone message before the tombstone, but can read to the 
> end of the log within delete.retention.ms, it's guaranteed to see the 
> tombstone.
> However, the current implementation doesn't seem correct. We delay the 
> removal of the tombstone by delete.retention.ms since the last modified time 
> of the last cleaned segment, but that last modified time is inherited from 
> the original segment, which could be arbitrarily old. So, the tombstone may 
> not be preserved as long as it needs to be.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8716) broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 2.2.1 or 2.3.0

2019-07-26 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894142#comment-16894142
 ] 

Jun Rao commented on KAFKA-8716:


It's weird that the session was closed very quickly after SyncConnected. Is 
there any log on the ZK server that indicates why the session was closed?

> broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 
> 2.2.1 or 2.3.0
> 
>
> Key: KAFKA-8716
> URL: https://issues.apache.org/jira/browse/KAFKA-8716
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Yu Yang
>Priority: Critical
>
> We are trying to upgrade the kafka binary from 2.1 to 2.2.1 or 2.3.0. For both 
> versions, the broker with the updated binary (2.2.1 or 2.3.0) could not get 
> started due to a zookeeper session expiration exception. This error happens 
> repeatedly and the broker cannot start because of it. 
> Below is our zk related setting in server.properties:
> {code}
> zookeeper.connection.timeout.ms=6000
> zookeeper.session.timeout.ms=6000
> {code}
> The following is the stack trace, and we are using zookeeper 3.5.3. Instead 
> of waiting for a few seconds, the SESSIONEXPIRED error is returned immediately 
> in the CheckedEphemeral.create call. Any insights? 
> [2019-07-25 18:07:35,712] INFO Creating /brokers/ids/80 (is it secure? false) 
> (kafka.zk.KafkaZkClient)
> [2019-07-25 18:07:35,724] ERROR Error while creating ephemeral at 
> /brokers/ids/80 with return code: SESSIONEXPIRED 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2019-07-25 18:07:35,731] ERROR [KafkaServer id=80] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
> at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1725)
> at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1689)
> at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:97)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:260)
> at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
> at kafka.Kafka$.main(Kafka.scala:75)
> at kafka.Kafka.main(Kafka.scala)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8716) broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 2.2.1 or 2.3.0

2019-07-26 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893977#comment-16893977
 ] 

Jun Rao commented on KAFKA-8716:


Hmm, that's kind of weird. We do wait for the ZK session to be connected in the 
constructor of ZooKeeperClient. 

{code:java}
  try waitUntilConnected(connectionTimeoutMs, TimeUnit.MILLISECONDS)
  catch {
case e: Throwable =>
  close()
  throw e
  }
{code}


It should be rare for the session to expire immediately after the 
connection is established. Could you turn on debug-level logging in 
ZooKeeperClientWatcher so that we can see logs like the following?
{code:java}
debug(s"Received event: $event"){code}


> broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 
> 2.2.1 or 2.3.0
> 
>
> Key: KAFKA-8716
> URL: https://issues.apache.org/jira/browse/KAFKA-8716
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Yu Yang
>Priority: Critical
>
> We are trying to upgrade the kafka binary from 2.1 to 2.2.1 or 2.3.0. For both 
> versions, the broker with the updated binary (2.2.1 or 2.3.0) could not get 
> started due to a zookeeper session expiration exception. This error happens 
> repeatedly and the broker cannot start because of it. 
> Below is our zk related setting in server.properties:
> {code}
> zookeeper.connection.timeout.ms=6000
> zookeeper.session.timeout.ms=6000
> {code}
> The following is the stack trace, and we are using zookeeper 3.5.3. Instead 
> of waiting for a few seconds, the SESSIONEXPIRED error is returned immediately 
> in the CheckedEphemeral.create call. Any insights? 
> [2019-07-25 18:07:35,712] INFO Creating /brokers/ids/80 (is it secure? false) 
> (kafka.zk.KafkaZkClient)
> [2019-07-25 18:07:35,724] ERROR Error while creating ephemeral at 
> /brokers/ids/80 with return code: SESSIONEXPIRED 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2019-07-25 18:07:35,731] ERROR [KafkaServer id=80] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
> at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1725)
> at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1689)
> at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:97)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:260)
> at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
> at kafka.Kafka$.main(Kafka.scala:75)
> at kafka.Kafka.main(Kafka.scala)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8716) broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 2.2.1 or 2.3.0

2019-07-26 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893925#comment-16893925
 ] 

Jun Rao commented on KAFKA-8716:


Currently, we don't handle ZK session expiration for the initial broker 
registration in ZK when the broker is started, since it's rare. Does that 
happen repeatedly on restart?

> broker cannot join the cluster after upgrading kafka binary from 2.1.1 to 
> 2.2.1 or 2.3.0
> 
>
> Key: KAFKA-8716
> URL: https://issues.apache.org/jira/browse/KAFKA-8716
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Yu Yang
>Priority: Critical
>
> We are trying to upgrade the kafka binary from 2.1 to 2.2.1 or 2.3.0. For both 
> versions, the broker with the updated binary (2.2.1 or 2.3.0) could not get 
> started due to a zookeeper session expiration exception.  
> The following is the stack trace, and we are using zookeeper 3.5.3. Any 
> insights? 
> [2019-07-25 18:07:35,712] INFO Creating /brokers/ids/80 (is it secure? false) 
> (kafka.zk.KafkaZkClient)
> [2019-07-25 18:07:35,724] ERROR Error while creating ephemeral at 
> /brokers/ids/80 with return code: SESSIONEXPIRED 
> (kafka.zk.KafkaZkClient$CheckedEphemeral)
> [2019-07-25 18:07:35,731] ERROR [KafkaServer id=80] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
> at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1725)
> at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1689)
> at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:97)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:260)
> at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
> at kafka.Kafka$.main(Kafka.scala:75)
> at kafka.Kafka.main(Kafka.scala)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-07-25 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893052#comment-16893052
 ] 

Jun Rao commented on KAFKA-8522:


[~Yohan123], this jira is similar to KAFKA-4545 and likely requires the same 
solution. [~jagsancio] assigned himself to KAFKA-4545. So, you probably want to 
coordinate with him. 

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-07-03 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878247#comment-16878247
 ] 

Jun Rao commented on KAFKA-8522:


Since we are returning the tombstone to the consumer, it may not be ideal to 
change it. Another option is to write out the cleaning offset and the 
corresponding cleaning time to a checkpoint file in the partition dir. The 
cleaner can use that checkpoint to determine the cleaning time for each 
tombstone. Once all tombstones before an offset are removed, the corresponding 
entry in the checkpoint file can be removed.
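
A minimal sketch of what such a per-partition checkpoint could look like (the 
file name and format are assumptions for illustration): each line maps a 
cleaned-through offset to the wall-clock time that cleaning pass completed, and 
the file is replaced atomically via a rename.

{code:java}
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Illustrative only: persist (cleaning offset -> cleaning time) entries for
// one partition; an entry can be dropped once all tombstones before its
// offset have been removed.
def writeCleaningCheckpoint(partitionDir: File, entries: Map[Long, Long]): Unit = {
  val lines = entries.toSeq.sorted.map { case (offset, timeMs) => s"$offset $timeMs" }
  val tmp = new File(partitionDir, "cleaning-checkpoint.tmp")
  Files.write(tmp.toPath, lines.asJava)
  Files.move(tmp.toPath, new File(partitionDir, "cleaning-checkpoint").toPath,
    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING)
}
{code}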

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-07-02 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877341#comment-16877341
 ] 

Jun Rao commented on KAFKA-8522:


[~dhruvilshah] mentioned that the original timestamp of the tombstone record 
could be useful for time-based retention and timestamp to offset translation. 
So, it may not be ideal to modify the timestamp in the tombstone record itself. 
We probably have to store the cleaning timestamp somewhere else (e.g. record 
header).

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8522) Tombstones can survive forever

2019-06-27 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874578#comment-16874578
 ] 

Jun Rao commented on KAFKA-8522:


[~EeveeB], thanks for reporting this. This is related to KAFKA-4545.

Basically, the tombstone needs to be retained for at least delete.retention.ms 
after the cleaning point (say T) has passed the offset of the tombstone. 
However, the current estimate of time T is based on the retained last 
modified time of the log segment, which is not accurate. This can cause the 
tombstone to be retained for too short or too long a time (even forever).

One way to fix that is to write the timestamp of time T in the tombstone record 
when the cleaning point first passes the tombstone offset. We can then clean 
the tombstone at a more accurate time.
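
As a toy model of that rule (names invented for illustration), deletability 
would be decided from the recorded time T rather than from the segment's 
inherited last-modified time:

{code:java}
// Toy model only: a tombstone becomes deletable once delete.retention.ms has
// elapsed since the cleaning point first passed its offset (time T).
final case class Tombstone(offset: Long, firstCleanedAtMs: Option[Long])

def deletable(t: Tombstone, nowMs: Long, deleteRetentionMs: Long): Boolean =
  t.firstCleanedAtMs.exists(cleanedAt => nowMs - cleanedAt >= deleteRetentionMs)
{code}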

> Tombstones can survive forever
> --
>
> Key: KAFKA-8522
> URL: https://issues.apache.org/jira/browse/KAFKA-8522
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Evelyn Bayes
>Priority: Minor
>
> This is a bit of a grey zone as to whether it's a "bug", but it is certainly 
> unintended behaviour.
>  
> Under specific conditions tombstones effectively survive forever:
>  * Small amount of throughput;
>  * min.cleanable.dirty.ratio near or at 0; and
>  * Other parameters at default.
> What happens is that all the data continuously gets cycled into the oldest 
> segment. Old records get compacted away, but the new records continuously 
> update the timestamp of the oldest segment, resetting the countdown for 
> deleting tombstones.
> So tombstones build up in the oldest segment forever.
>  
> While you could "fix" this by reducing the segment size, this can be 
> undesirable as a sudden change in throughput could cause a dangerous number 
> of segments to be created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2019-06-17 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865744#comment-16865744
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190], if this issue happens again, it would be useful to take a few 
thread dumps and upload them in the jira. Thanks.

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: js.log
>
>
> We have observed a serious deadlock between the controller-event-thread and 
> the zk-session-expiry-handler thread. When this issue occurs, the only way to 
> recover the kafka cluster is to restart the kafka server. The following is the 
> jstack log of the controller-event-thread and zk-session-expiry-handler 
> threads.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the Expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005d1be5a00> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>  at kafka.zookeeper.ZooKeeperClient.handleRequests(ZooKeeper

[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2019-06-14 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864184#comment-16864184
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190], expiryScheduler.shutdown() is probably not the issue. That part 
of the code blocks the KafkaServer startup through initZkClient(). At that 
point, the controller hasn't been started and therefore, there is no controller 
event thread yet.

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: js.log
>
>
> We have observed a serious deadlock between the controller-event-thread and the 
> zk-session-expiry-handler thread. When this issue occurs, the only way to 
> recover the kafka cluster is to restart the kafka server. The following is the 
> jstack log of the controller-event-thread and zk-session-expiry-handler thread.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting on condition [0x7fccb55c8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005d1be5a00> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at ja

[jira] [Commented] (KAFKA-8532) controller-event-thread deadlock with zk-session-expiry-handler0

2019-06-13 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863468#comment-16863468
 ] 

Jun Rao commented on KAFKA-8532:


[~lbdai3190], thanks for reporting the issue. From the stacktrace, it seems 
that the controller-event-thread is blocked on ZooKeeperClient.scala:157, which 
corresponds to the following line.

 
{code:java}
countDownLatch.await() 
{code}
 

So, it doesn't seem that the thread was blocked on inReadLock(initializationLock). 
As you said, normally the wait on the countDownLatch shouldn't block, since if 
the ZK session is expired, the requests should get an error code immediately. 
Do you have the stack trace for the ZK event thread? Normally, this is the 
thread that unblocks the async ZK requests. I am wondering if that thread is 
still alive or is blocked on anything.

The controller-event-thread was processing a RegisterBrokerAndReelect event. 
Since this event is only generated during ZK session expiration, it seems 
consecutive ZK session expirations have happened. I am not sure if that can 
lead to deadlocks though.
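
For readers following along, a minimal sketch (hypothetical names, not the actual ZooKeeperClient code) of the async-request-plus-latch pattern the stack traces show; the await can only return if the callbacks run, so a dead or wedged event thread blocks the caller forever:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;

class LatchBlockingSketch {
    // Hypothetical stand-in for the pattern in ZooKeeperClient.handleRequests:
    // N async requests are sent, and the calling thread waits for N callbacks.
    static void sendAllAndWait(int requests, Executor zkEventThread) throws InterruptedException {
        CountDownLatch inFlight = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            // The ZK event thread normally completes each request, even with an
            // error code on session expiration, counting the latch down.
            zkEventThread.execute(inFlight::countDown);
        }
        // If the event thread never runs the callbacks, this await blocks forever,
        // matching the controller-event-thread stack trace above.
        inFlight.await();
    }
}
{code}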

> controller-event-thread deadlock with zk-session-expiry-handler0
> 
>
> Key: KAFKA-8532
> URL: https://issues.apache.org/jira/browse/KAFKA-8532
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
>Reporter: leibo
>Priority: Blocker
> Attachments: js.log
>
>
> We have observed a serious deadlock between the controller-event-thread and the 
> zk-session-expiry-handler thread. When this issue occurs, the only way to 
> recover the kafka cluster is to restart the kafka server. The following is the 
> jstack log of the controller-event-thread and zk-session-expiry-handler thread.
> "zk-session-expiry-handler0" #163089 daemon prio=5 os_prio=0 
> tid=0x7fcc9c01 nid=0xfb22 waiting on condition [0x7fcbb01f8000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x0005ee3f7000> (a 
> java.util.concurrent.CountDownLatch$Sync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) // 
> waiting for the controller-event-thread to process the expire event
>  at 
> kafka.controller.KafkaController$Expire.waitUntilProcessingStarted(KafkaController.scala:1533)
>  at 
> kafka.controller.KafkaController$$anon$7.beforeInitializingSession(KafkaController.scala:173)
>  at 
> kafka.zookeeper.ZooKeeperClient.callBeforeInitializingSession(ZooKeeperClient.scala:408)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$reinitialize$1$adapted(ZooKeeperClient.scala:374)
>  at kafka.zookeeper.ZooKeeperClient$$Lambda$1473/1823438251.apply(Unknown 
> Source)
>  at scala.collection.Iterator.foreach(Iterator.scala:937)
>  at scala.collection.Iterator.foreach$(Iterator.scala:937)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
>  at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:209)
>  at kafka.zookeeper.ZooKeeperClient.reinitialize(ZooKeeperClient.scala:374)
>  at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$scheduleSessionExpiryHandler$1(ZooKeeperClient.scala:428)
>  at 
> kafka.zookeeper.ZooKeeperClient$$Lambda$1471/701792920.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
>  at kafka.utils.KafkaScheduler$$Lambda$198/1048098469.apply$mcV$sp(Unknown 
> Source)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
>  - <0x000661e8d2e0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "controller-event-thread" #51 prio=5 os_prio=0 tid=0x7fceaeec4000 
> nid=0x310 waiting

[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2019-06-12 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862569#comment-16862569
 ] 

Jun Rao commented on KAFKA-7504:


[~allenxwang], thanks for the update. A couple more thoughts on this.

(1) One thing with the patch is that the pre-fetch only kicks in for reads on 
non-active segments. So, you need to make sure that you have published enough 
data so that each partition has at least 2 segments. I am not sure if you have 
published more than 1TB worth of data since you have 1024 partitions. If not, 
could you redo the experiments with fewer partitions (but still make sure not 
all the data fits in memory)?

(2) For the test to be meaningful, we want the real time consumer and the 
lagging consumer (and perhaps the producer) to be on the same network thread. 
One way to achieve that is to set num.network.threads to 1. It would also be 
useful to collect the 99% consumer/producer response send time to see if the 
patch makes any difference.
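
For instance, the broker side of such a test might be configured like this (illustrative values only, not from this jira):

{code:java}
# server.properties overrides for the latency test (illustrative values)
num.network.threads=1          # producer, real-time consumer and lagging consumer share one network thread
log.segment.bytes=536870912    # 512 MB segments, so partitions roll past the active segment sooner
log.retention.bytes=-1         # unlimited, so the lagging consumer can read old (non-active) segments
{code}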

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: Network_Request_Idle_After_Patch.png, 
> Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, 
> Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x 
> more than usual.
>  Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk reads from the volume assigned to log.dirs, coinciding with the degradation.
>  2. A rise in network thread utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see an increase of requests in metrics, we suspected blocking in 
> the event loop run by the network thread as the cause of the rising network 
> thread utilization.
>  Reading through the Kafka broker source code, we understand that the only disk 
> IO performed in the network thread is reading log data by calling 
> sendfile(2) (via FileChannel#transferTo).
>  To verify that the calls of sendfile(2) block the network thread for some 
> moments, I ran the following SystemTap script to inspect the duration of 
> sendfile syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
>     0 |     0
>     1 |    71
>     2 |  6171
>    16 | 29472
>    32 |  3418
>  2048 |     0
> ...
>  8192 |     3{code}
> As you can see, there were some cases taking more than a few milliseconds, 
> which implies that it blocks the network thread for that long, applying the same 
> latency to all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 is multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in the following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them 

[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2019-06-07 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859031#comment-16859031
 ] 

Jun Rao commented on KAFKA-7504:


[~allenxwang], thanks for doing the test and sharing the results. Your results 
seem different from what Cabe reported. What's the produce time that you 
measured? Was that the total produce request time reported on the broker? Also, 
was there any improvement for the latency for the real time consumer during the 
test?

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: Network_Request_Idle_After_Patch.png, 
> Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, 
> Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x 
> more than usual.
>  Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk reads from the volume assigned to log.dirs, coinciding with the degradation.
>  2. A rise in network thread utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see an increase of requests in metrics, we suspected blocking in 
> the event loop run by the network thread as the cause of the rising network 
> thread utilization.
>  Reading through the Kafka broker source code, we understand that the only disk 
> IO performed in the network thread is reading log data by calling 
> sendfile(2) (via FileChannel#transferTo).
>  To verify that the calls of sendfile(2) block the network thread for some 
> moments, I ran the following SystemTap script to inspect the duration of 
> sendfile syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
>     0 |     0
>     1 |    71
>     2 |  6171
>    16 | 29472
>    32 |  3418
>  2048 |     0
> ...
>  8192 |     3{code}
> As you can see, there were some cases taking more than a few milliseconds, 
> which implies that it blocks the network thread for that long, applying the same 
> latency to all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 is multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in the following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them sequentially 
> ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]).
>  Ideally, processing of these 3 responses completes in microseconds, as it 
> just copies ready responses into the client sockets' buffers in a non-blocking 
> manner.
>  However, Kafka uses sendfile(2) for transferring log data to client sockets. 
> The target data might be in the page cache, but old data which was written a 
> while ago and never read since then is likely not.
>  If the 

[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2019-06-04 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856233#comment-16856233
 ] 

Jun Rao commented on KAFKA-7504:


[~allenxwang], there is a PR attached to the jira. I guess we just want it to 
be tested a bit more. If you can help test the PR and share the results, that 
will be great. Thanks.

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: Network_Request_Idle_After_Patch.png, 
> Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, 
> Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x 
> more than usual.
>  Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk reads from the volume assigned to log.dirs, coinciding with the degradation.
>  2. A rise in network thread utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see an increase of requests in metrics, we suspected blocking in 
> the event loop run by the network thread as the cause of the rising network 
> thread utilization.
>  Reading through the Kafka broker source code, we understand that the only disk 
> IO performed in the network thread is reading log data by calling 
> sendfile(2) (via FileChannel#transferTo).
>  To verify that the calls of sendfile(2) block the network thread for some 
> moments, I ran the following SystemTap script to inspect the duration of 
> sendfile syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
>     0 |     0
>     1 |    71
>     2 |  6171
>    16 | 29472
>    32 |  3418
>  2048 |     0
> ...
>  8192 |     3{code}
> As you can see, there were some cases taking more than a few milliseconds, 
> which implies that it blocks the network thread for that long, applying the same 
> latency to all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 is multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in the following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them sequentially 
> ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]).
>  Ideally, processing of these 3 responses completes in microseconds, as it 
> just copies ready responses into the client sockets' buffers in a non-blocking 
> manner.
>  However, Kafka uses sendfile(2) for transferring log data to client sockets. 
> The target data might be in the page cache, but old data which was written a 
> while ago and never read since then is likely not.
>  If the target data isn't in the page cache, the kernel first needs to load the 
> target page into cache. This takes more than a few milliseconds to tens of 

[jira] [Commented] (KAFKA-7983) supporting replication.throttled.replicas in dynamic broker configuration

2019-04-08 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812556#comment-16812556
 ] 

Jun Rao commented on KAFKA-7983:


[~rahul.mnit], just added you to the contributor list and assigned the ticket 
to you. Thanks.

> supporting replication.throttled.replicas in dynamic broker configuration
> -
>
> Key: KAFKA-7983
> URL: https://issues.apache.org/jira/browse/KAFKA-7983
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Assignee: Rahul Agarwal
>Priority: Major
>
> In 
> [KIP-226|https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration#KIP-226-DynamicBrokerConfiguration-DefaultTopicconfigs],
>  we added the support to change broker defaults dynamically. However, it 
> didn't support changing leader.replication.throttled.replicas and 
> follower.replication.throttled.replicas. These 2 configs were introduced in 
> [KIP-73|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
>  and control the set of topic partitions on which replication throttling 
> will be engaged. One useful case is to be able to set a default value for 
> both configs to * to allow throttling to be engaged for all topic partitions. 
> Currently, the static default values for both configs are ignored for 
> replication throttling; it would be useful to fix that as well.
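
Once supported, the dynamic broker-default usage might look roughly like the following (a sketch of the proposed end state, not a command that works today):

{code:java}
# Hypothetical, pending this feature: engage throttling for all partitions by default
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
  --entity-default --alter \
  --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*'
# The corresponding rates (bytes/sec) would be set the same way
bin/kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers \
  --entity-default --alter \
  --add-config 'leader.replication.throttled.rate=10485760,follower.replication.throttled.rate=10485760'
{code}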



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-7983) supporting replication.throttled.replicas in dynamic broker configuration

2019-04-08 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao reassigned KAFKA-7983:
--

Assignee: Rahul Agarwal

> supporting replication.throttled.replicas in dynamic broker configuration
> -
>
> Key: KAFKA-7983
> URL: https://issues.apache.org/jira/browse/KAFKA-7983
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Assignee: Rahul Agarwal
>Priority: Major
>
> In 
> [KIP-226|https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration#KIP-226-DynamicBrokerConfiguration-DefaultTopicconfigs],
>  we added the support to change broker defaults dynamically. However, it 
> didn't support changing leader.replication.throttled.replicas and 
> follower.replication.throttled.replicas. These 2 configs were introduced in 
> [KIP-73|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
>  and control the set of topic partitions on which replication throttling 
> will be engaged. One useful case is to be able to set a default value for 
> both configs to * to allow throttling to be engaged for all topic partitions. 
> Currently, the static default values for both configs are ignored for 
> replication throttling; it would be useful to fix that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7983) supporting replication.throttled.replicas in dynamic broker configuration

2019-04-05 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16811005#comment-16811005
 ] 

Jun Rao commented on KAFKA-7983:


Probably. But I think the common broker-level setting will be just *.

> supporting replication.throttled.replicas in dynamic broker configuration
> -
>
> Key: KAFKA-7983
> URL: https://issues.apache.org/jira/browse/KAFKA-7983
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Priority: Major
>
> In 
> [KIP-226|https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration#KIP-226-DynamicBrokerConfiguration-DefaultTopicconfigs],
>  we added the support to change broker defaults dynamically. However, it 
> didn't support changing leader.replication.throttled.replicas and 
> follower.replication.throttled.replicas. These 2 configs were introduced in 
> [KIP-73|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
>  and control the set of topic partitions on which replication throttling 
> will be engaged. One useful case is to be able to set a default value for 
> both configs to * to allow throttling to be engaged for all topic partitions. 
> Currently, the static default values for both configs are ignored for 
> replication throttling; it would be useful to fix that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7983) supporting replication.throttled.replicas in dynamic broker configuration

2019-04-04 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810370#comment-16810370
 ] 

Jun Rao commented on KAFKA-7983:


[~huxi_2b], leader.replication.throttled.replicas and 
follower.replication.throttled.replicas are dynamically set through 
ReplicationQuotaManager.markThrottled() at the topic level. However, these two 
properties don't exist in the broker-level config, and BrokerConfigHandler 
doesn't call ReplicationQuotaManager.markThrottled(). So, currently, we can't 
set leader.replication.throttled.replicas and 
follower.replication.throttled.replicas at the broker level either statically 
or dynamically.
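
For contrast, the topic-level path that does work today (per KIP-73) looks roughly like this; the topic name and partitionId:brokerId pairs are placeholders:

{code:java}
# Topic-level throttling (KIP-73): explicit partitionId:brokerId pairs
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics \
  --entity-name my-topic --alter \
  --add-config 'leader.replication.throttled.replicas=0:101,1:102'
{code}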

> supporting replication.throttled.replicas in dynamic broker configuration
> -
>
> Key: KAFKA-7983
> URL: https://issues.apache.org/jira/browse/KAFKA-7983
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Jun Rao
>Priority: Major
>
> In 
> [KIP-226|https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration#KIP-226-DynamicBrokerConfiguration-DefaultTopicconfigs],
>  we added the support to change broker defaults dynamically. However, it 
> didn't support changing leader.replication.throttled.replicas and 
> follower.replication.throttled.replicas. These 2 configs were introduced in 
> [KIP-73|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
>  and control the set of topic partitions on which replication throttling 
> will be engaged. One useful case is to be able to set a default value for 
> both configs to * to allow throttling to be engaged for all topic partitions. 
> Currently, the static default values for both configs are ignored for 
> replication throttling; it would be useful to fix that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8106) Remove unnecessary decompression operation when logValidator do validation.

2019-03-29 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805539#comment-16805539
 ] 

Jun Rao commented on KAFKA-8106:


[~Flower.min], one of the validations that the broker has to do is to verify 
that the timestamp of each record is within the allowed max diff. One can only 
know the timestamp of a record after decompressing the batch. 
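
A rough sketch of that check (a hypothetical helper mirroring what the timestamp validation implies, not the actual LogValidator code):

{code:java}
import org.apache.kafka.common.record.Record;
import org.apache.kafka.common.record.RecordBatch;

class TimestampValidationSketch {
    // Record-level timestamps live inside the (possibly compressed) batch, so the
    // broker has to iterate the records, decompressing the batch, to check them.
    static void validate(RecordBatch batch, long now, long maxDiffMs) {
        for (Record record : batch) { // iterating is what pays the decompression cost
            if (Math.abs(record.timestamp() - now) > maxDiffMs)
                throw new IllegalArgumentException(
                    "Record timestamp " + record.timestamp() + " is outside the allowed range");
        }
    }
}
{code}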

> Remove unnecessary decompression operation when logValidator  do validation.
> 
>
> Key: KAFKA-8106
> URL: https://issues.apache.org/jira/browse/KAFKA-8106
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.1
> Environment: Server : 
> cpu:2*16 ; 
> MemTotal : 256G;
> Ethernet controller:Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network 
> Connection ; 
> SSD.
>Reporter: Flower.min
>Assignee: Flower.min
>Priority: Major
>  Labels: performance
>
>       We did performance testing of Kafka in the specific scenarios described 
> below. We built a kafka cluster with one broker, and created topics with 
> different numbers of partitions. Then we started lots of producer processes 
> to send large amounts of messages to one of the topics in each test.
> *_Specific Scenario_*
>   
>  *_1. Main config of Kafka_*  
>  # Main config of the Kafka server: 
> num.network.threads=6; num.io.threads=128; queued.max.requests=500
>  # Number of TopicPartitions: 50~2000
>  # Size of a single message: 1024B
>  
>  *_2. Config of KafkaProducer_* 
> ||compression.type||linger.ms||batch.size||buffer.memory||
> |lz4|1000ms~5000ms|16KB/10KB/100KB|128MB|
> *_3.The best result of performance testing_*  
> ||Network inflow rate||CPU Used (%)||Disk write speed||Performance of 
> production||
> |550MB/s~610MB/s|97%~99%|550MB/s~610MB/s|23,000,000 messages/s|
> *_4. Phenomenon and my doubt_*
>     _The upper limit of CPU usage has been reached, but it does not 
> reach the upper limit of the bandwidth of the server network. *We are 
> unsure about what costs so much CPU time, and we want to improve 
> performance and reduce the CPU usage of the Kafka server.*_
>   
>  _*5. Analysis*_
>         We analyzed the JFR recording of the Kafka server taken during the 
> performance testing. We found the hot spot methods are 
> *_"java.io.DataInputStream.readFully(byte[],int,int)"_* and 
> *_"org.apache.kafka.common.record.KafkaLZ4BlockInputStream.read(byte[],int,int)"_*. When 
> we checked the thread stack information, we also found most of the CPU being 
> occupied by lots of threads which were busy decompressing messages. Then we 
> read the source code of Kafka.
>        There is a double-layer nested Iterator when LogValidator validates 
> every record, and there is a decompression for each message when traversing 
> every RecordBatch iterator. Decompressing messages consumes CPU and affects 
> total performance. _*The purpose of decompressing every message is just to 
> gain the total size in bytes of one record and the size in bytes of the record 
> body, when the magic value in use is above 1 and no format conversion or value 
> overwriting is required for compressed messages. This is negative for 
> performance in common usage scenarios.*_ Therefore, we suggest *_removing the 
> unnecessary decompression operation_* when doing validation for compressed 
> messages when the magic value in use is above 1 and no format conversion or 
> value overwriting is required for compressed messages.
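
To illustrate the double-layer iteration the description refers to, a sketch using the public record classes (not the LogValidator code itself):

{code:java}
import org.apache.kafka.common.record.MemoryRecords;
import org.apache.kafka.common.record.Record;
import org.apache.kafka.common.record.RecordBatch;

class NestedIterationSketch {
    // Outer loop walks batches; inner loop walks records. For a compressed
    // batch, the inner iteration is where the decompression cost is paid.
    static long countRecords(MemoryRecords records) {
        long count = 0;
        for (RecordBatch batch : records.batches()) { // outer iterator
            for (Record record : batch) {             // inner iterator: decompresses
                count++;
            }
        }
        return count;
    }
}
{code}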



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2019-03-26 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802074#comment-16802074
 ] 

Jun Rao commented on KAFKA-7504:


[~cwaldrop], thanks for the results. It does seem to be a nice improvement. The 
99% fetch response send time can still be 1000ms, which is kind of high. Was 
that because after the log segment chunk is pre-touched, it gets evicted 
quickly by the time it's being sent to the network?

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: Network_Request_Idle_After_Patch.png, 
> Network_Request_Idle_Per_Before_Patch.png, Response_Times_After_Patch.png, 
> Response_Times_Before_Patch.png, image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x 
> more than usual.
>  Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk reads from the volume assigned to log.dirs, coinciding with the degradation.
>  2. A rise in network thread utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see an increase of requests in metrics, we suspected blocking in 
> the event loop run by the network thread as the cause of the rising network 
> thread utilization.
>  Reading through the Kafka broker source code, we understand that the only disk 
> IO performed in the network thread is reading log data by calling 
> sendfile(2) (via FileChannel#transferTo).
>  To verify that the calls of sendfile(2) block the network thread for some 
> moments, I ran the following SystemTap script to inspect the duration of 
> sendfile syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
>     0 |     0
>     1 |    71
>     2 |  6171
>    16 | 29472
>    32 |  3418
>  2048 |     0
> ...
>  8192 |     3{code}
> As you can see, there were some cases taking more than a few milliseconds, 
> which implies that it blocks the network thread for that long, applying the same 
> latency to all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 is multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in the following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them sequentially 
> ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]).
>  Ideally, processing of these 3 responses completes in microseconds, as it 
> just copies ready responses into the client sockets' buffers in a non-blocking 
> manner.
>  However, Kafka uses sendfile(2) for transferring log data to client sockets. 
> The target data might be in the page cache, but old data which was written a 
> while ago and never read since then is likely not.
>  If the target data isn't in page cache, kernel fir

[jira] [Commented] (KAFKA-7504) Broker performance degradation caused by call of sendfile reading disk in network thread

2019-03-18 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795523#comment-16795523
 ] 

Jun Rao commented on KAFKA-7504:


[~cwaldrop], thanks for helping test this out. It would be useful to have a mix 
of real time and lagging consumers in the test and measure the observed network 
threads idle percent and the consumer response send time (99% or 99.9%) with 
and w/o the patch.

> Broker performance degradation caused by call of sendfile reading disk in 
> network thread
> 
>
> Key: KAFKA-7504
> URL: https://issues.apache.org/jira/browse/KAFKA-7504
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.2.1
>Reporter: Yuto Kawamura
>Assignee: Yuto Kawamura
>Priority: Major
>  Labels: latency, performance
> Attachments: image-2018-10-14-14-18-38-149.png, 
> image-2018-10-14-14-18-57-429.png, image-2018-10-14-14-19-17-395.png, 
> image-2018-10-14-14-19-27-059.png, image-2018-10-14-14-19-41-397.png, 
> image-2018-10-14-14-19-51-823.png, image-2018-10-14-14-20-09-822.png, 
> image-2018-10-14-14-20-19-217.png, image-2018-10-14-14-20-33-500.png, 
> image-2018-10-14-14-20-46-566.png, image-2018-10-14-14-20-57-233.png
>
>
> h2. Environment
> OS: CentOS6
> Kernel version: 2.6.32-XX
>  Kafka version: 0.10.2.1, 0.11.1.2 (but reproduces with latest build from 
> trunk (2.2.0-SNAPSHOT)
> h2. Phenomenon
> Response time of Produce requests (99th ~ 99.9th %ile) degrades to 50x ~ 100x 
> more than usual.
>  Normally the 99th %ile is lower than 20ms, but when this issue occurs it marks 
> 50ms to 200ms.
> At the same time we could see two more things in metrics:
> 1. Disk reads from the volume assigned to log.dirs, coinciding with the degradation.
>  2. A rise in network thread utilization (by 
> `kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent`)
> As we didn't see an increase of requests in metrics, we suspected blocking in 
> the event loop run by the network thread as the cause of the rising network 
> thread utilization.
>  Reading through the Kafka broker source code, we understand that the only disk 
> IO performed in the network thread is reading log data by calling 
> sendfile(2) (via FileChannel#transferTo).
>  To verify that the calls of sendfile(2) block the network thread for some 
> moments, I ran the following SystemTap script to inspect the duration of 
> sendfile syscalls.
> {code:java}
> # Systemtap script to measure syscall duration
> global s
> global records
> probe syscall.$1 {
> s[tid()] = gettimeofday_us()
> }
> probe syscall.$1.return {
> elapsed = gettimeofday_us() - s[tid()]
> delete s[tid()]
> records <<< elapsed
> }
> probe end {
> print(@hist_log(records))
> }{code}
> {code:java}
> $ stap -v syscall-duration.stp sendfile
> # value (us)
> value | count
>     0 |     0
>     1 |    71
>     2 |  6171
>    16 | 29472
>    32 |  3418
>  2048 |     0
> ...
>  8192 |     3{code}
> As you can see, there were some cases taking more than a few milliseconds, 
> which implies that it blocks the network thread for that long, applying the same 
> latency to all other request/response processing.
> h2. Hypothesis
> Gathering the above observations, I made the following hypothesis.
> Let's say network-thread-1 is multiplexing 3 connections.
>  - producer-A
>  - follower-B (broker replica fetch)
>  - consumer-C
> Broker receives requests from each of those clients, [Produce, FetchFollower, 
> FetchConsumer].
> They are processed well by request handler threads, and now the response 
> queue of the network-thread contains 3 responses in the following order: 
> [FetchConsumer, Produce, FetchFollower].
> network-thread-1 takes 3 responses and processes them sequentially 
> ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L632]).
>  Ideally, processing of these 3 responses completes in microseconds, as it 
> just copies ready responses into the client sockets' buffers in a non-blocking 
> manner.
>  However, Kafka uses sendfile(2) for transferring log data to client sockets. 
> The target data might be in the page cache, but old data which was written a 
> while ago and never read since then is likely not.
>  If the target data isn't in the page cache, the kernel first needs to load the 
> target page into cache. This takes more than a few milliseconds to tens of 
> milliseconds depending on disk hardware and current load being applied t

[jira] [Commented] (KAFKA-6029) Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest

2019-03-13 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791910#comment-16791910
 ] 

Jun Rao commented on KAFKA-6029:


[~hzxa21], do you still plan to work on this? Thanks.

> Controller should wait for the leader migration to finish before ack a 
> ControlledShutdownRequest
> 
>
> Key: KAFKA-6029
> URL: https://issues.apache.org/jira/browse/KAFKA-6029
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller, core
>Affects Versions: 1.0.0
>Reporter: Jiangjie Qin
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Major
>
> In the controlled shutdown process, the controller will return the 
> ControlledShutdownResponse immediately after the state machine is updated. 
> Because the LeaderAndIsrRequests and UpdateMetadataRequests may not have been 
> successfully processed by the brokers, the leader migration and active ISR 
> shrink may not have done when the shutting down broker proceeds to shut down. 
> This will cause some of the leaders to take up to replica.lag.time.max.ms to 
> kick the broker out of ISR. Meanwhile the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and 
> UpdateMetadataRequests has been acked before sending back the 
> ControlledShutdownResponse.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7977) Flaky Test ReassignPartitionsClusterTest#shouldOnlyThrottleMovingReplicas

2019-03-07 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-7977.

Resolution: Fixed

The PR in KAFKA-8018 is merged, which should reduce the chance for ZK session 
expiration. Closing this issue for now. If the same issue occurs again, feel 
free to reopen the ticket. 

> Flaky Test ReassignPartitionsClusterTest#shouldOnlyThrottleMovingReplicas
> -
>
> Key: KAFKA-7977
> URL: https://issues.apache.org/jira/browse/KAFKA-7977
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /brokers/topics/topic1 at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:130) at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
> kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:534) at 
> kafka.zk.KafkaZkClient.$anonfun$getReplicaAssignmentForTopics$2(KafkaZkClient.scala:579)
>  at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> scala.collection.TraversableLike.flatMap(TraversableLike.scala:244) at 
> scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:108) at 
> kafka.zk.KafkaZkClient.getReplicaAssignmentForTopics(KafkaZkClient.scala:574) 
> at 
> kafka.admin.ReassignPartitionsCommand$.parseAndValidate(ReassignPartitionsCommand.scala:338)
>  at 
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:209)
>  at 
> kafka.admin.ReassignPartitionsClusterTest.shouldOnlyThrottleMovingReplicas(ReassignPartitionsClusterTest.scala:343){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8022) Flaky Test RequestQuotaTest#testExemptRequestTime

2019-03-07 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-8022.

   Resolution: Fixed
Fix Version/s: (was: 0.11.0.4)
   2.2.1
   2.3.0

The PR is merged. Closing it for now.

> Flaky Test RequestQuotaTest#testExemptRequestTime
> -
>
> Key: KAFKA-8022
> URL: https://issues.apache.org/jira/browse/KAFKA-8022
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 0.11.0.3
>Reporter: Matthias J. Sax
>Assignee: Jun Rao
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/328/tests]
> {quote}kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
> connection while in state: CONNECTING
> at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at 
> kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
> at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:96)
> at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
> at kafka.zk.ZooKeeperTestHarness.setUp(ZooKeeperTestHarness.scala:59)
> at 
> kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:90)
> at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81)
> at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
> at kafka.server.RequestQuotaTest.setUp(RequestQuotaTest.scala:81){quote}
> STDOUT:
> {quote}[2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling 
> request: clientId=unauthorized-CONTROLLED_SHUTDOWN, correlationId=1, 
> api=CONTROLLED_SHUTDOWN, body=\{broker_id=0,broker_epoch=9223372036854775807} 
> (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=2, connectionId=127.0.0.1:37894-127.0.0.1:54838-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-STOP_REPLICA, correlationId=1, api=STOP_REPLICA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,delete_partitions=true,partitions=[{topic=topic-1,partition_ids=[0]}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54822-1, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,091] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-UPDATE_METADATA, correlationId=1, api=UPDATE_METADATA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],offline_replicas=[]}]}],live_brokers=[\{id=0,end_points=[{port=0,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=1, connectionId=127.0.0.1:37894-127.0.0.1:54836-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-LEADER_AND_ISR, correlationId=1, api=LEADER_AND_ISR, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],is_new=true}]}],live_leaders=[\{id=0,host=localhost,port=0}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54834-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,106] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-WRITE_TXN_MARKERS, correlationId=1, 
> api=WRITE_TXN_MARKERS, body=\{transaction_markers=[]} 
> (kafka.server.KafkaApi

[jira] [Commented] (KAFKA-8022) Flaky Test RequestQuotaTest#testExemptRequestTime

2019-03-04 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783838#comment-16783838
 ] 

Jun Rao commented on KAFKA-8022:


The fsync to ZK txn log can take 10+ secs. Filed 
[https://github.com/apache/kafka/pull/6354/files] to disable forceSync in ZK 
txn log in tests.
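
In test harnesses this is typically achievable through ZooKeeper's own system property (a sketch; the linked PR may wire it differently):

{code:java}
// Skip fsync on the ZK transaction log in tests; ZooKeeper reads this
// system property at server startup. Never do this in production.
System.setProperty("zookeeper.forceSync", "no");
{code}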

> Flaky Test RequestQuotaTest#testExemptRequestTime
> -
>
> Key: KAFKA-8022
> URL: https://issues.apache.org/jira/browse/KAFKA-8022
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 0.11.0.3
>Reporter: Matthias J. Sax
>Assignee: Jun Rao
>Priority: Critical
>  Labels: flaky-test
> Fix For: 0.11.0.4
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/328/tests]
> {quote}kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
> connection while in state: CONNECTING
> at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at 
> kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
> at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:96)
> at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
> at kafka.zk.ZooKeeperTestHarness.setUp(ZooKeeperTestHarness.scala:59)
> at 
> kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:90)
> at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81)
> at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
> at kafka.server.RequestQuotaTest.setUp(RequestQuotaTest.scala:81){quote}
> STDOUT:
> {quote}[2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling 
> request: clientId=unauthorized-CONTROLLED_SHUTDOWN, correlationId=1, 
> api=CONTROLLED_SHUTDOWN, body=\{broker_id=0,broker_epoch=9223372036854775807} 
> (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=2, connectionId=127.0.0.1:37894-127.0.0.1:54838-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-STOP_REPLICA, correlationId=1, api=STOP_REPLICA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,delete_partitions=true,partitions=[{topic=topic-1,partition_ids=[0]}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54822-1, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,091] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-UPDATE_METADATA, correlationId=1, api=UPDATE_METADATA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],offline_replicas=[]}]}],live_brokers=[\{id=0,end_points=[{port=0,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=1, connectionId=127.0.0.1:37894-127.0.0.1:54836-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-LEADER_AND_ISR, correlationId=1, api=LEADER_AND_ISR, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],is_new=true}]}],live_leaders=[\{id=0,host=localhost,port=0}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54834-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,106] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-WRITE_TXN_MARKERS, correlationId=1, 
> api=WRITE_TXN_MARKERS, body=\{transaction_markers=[]} 
> (kafka.server.KafkaApis:76)

[jira] [Assigned] (KAFKA-8022) Flaky Test RequestQuotaTest#testExemptRequestTime

2019-03-04 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao reassigned KAFKA-8022:
--

Assignee: Jun Rao

> Flaky Test RequestQuotaTest#testExemptRequestTime
> -
>
> Key: KAFKA-8022
> URL: https://issues.apache.org/jira/browse/KAFKA-8022
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 0.11.0.3
>Reporter: Matthias J. Sax
>Assignee: Jun Rao
>Priority: Critical
>  Labels: flaky-test
> Fix For: 0.11.0.4
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/328/tests]
> {quote}kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
> connection while in state: CONNECTING
> at 
> kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at 
> kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
> at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
> at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
> at kafka.zk.ZooKeeperTestHarness.setUp(ZooKeeperTestHarness.scala:59)
> at 
> kafka.integration.KafkaServerTestHarness.setUp(KafkaServerTestHarness.scala:90)
> at kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:81)
> at kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73)
> at kafka.server.RequestQuotaTest.setUp(RequestQuotaTest.scala:81){quote}
> STDOUT:
> {quote}[2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling 
> request: clientId=unauthorized-CONTROLLED_SHUTDOWN, correlationId=1, 
> api=CONTROLLED_SHUTDOWN, body=\{broker_id=0,broker_epoch=9223372036854775807} 
> (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=2, connectionId=127.0.0.1:37894-127.0.0.1:54838-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-STOP_REPLICA, correlationId=1, api=STOP_REPLICA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,delete_partitions=true,partitions=[{topic=topic-1,partition_ids=[0]}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54822-1, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,091] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-UPDATE_METADATA, correlationId=1, api=UPDATE_METADATA, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],offline_replicas=[]}]}],live_brokers=[\{id=0,end_points=[{port=0,host=localhost,listener_name=PLAINTEXT,security_protocol_type=0}],rack=null}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=1, connectionId=127.0.0.1:37894-127.0.0.1:54836-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,090] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-LEADER_AND_ISR, correlationId=1, api=LEADER_AND_ISR, 
> body=\{controller_id=0,controller_epoch=2147483647,broker_epoch=9223372036854775807,topic_states=[{topic=topic-1,partition_states=[{partition=0,controller_epoch=2147483647,leader=0,leader_epoch=2147483647,isr=[0],zk_version=2,replicas=[0],is_new=true}]}],live_leaders=[\{id=0,host=localhost,port=0}]}
>  (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894-127.0.0.1:54834-2, 
> session=Session(User:Unauthorized,/127.0.0.1), 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> buffer=null) is not authorized.
> [2019-03-01 00:40:47,106] ERROR [KafkaApi-0] Error when handling request: 
> clientId=unauthorized-WRITE_TXN_MARKERS, correlationId=1, 
> api=WRITE_TXN_MARKERS, body=\{transaction_markers=[]} 
> (kafka.server.KafkaApis:76)
> org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
> Request(processor=0, connectionId=127.0.0.1:37894

[jira] [Commented] (KAFKA-7977) Flaky Test ReassignPartitionsClusterTest#shouldOnlyThrottleMovingReplicas

2019-03-01 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782211#comment-16782211
 ] 

Jun Rao commented on KAFKA-7977:


The PR in https://issues.apache.org/jira/browse/KAFKA-8018 can help this test 
too.

> Flaky Test ReassignPartitionsClusterTest#shouldOnlyThrottleMovingReplicas
> -
>
> Key: KAFKA-7977
> URL: https://issues.apache.org/jira/browse/KAFKA-7977
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for `2.2` release, I create tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for /brokers/topics/topic1 at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:130) at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
> kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:534) at 
> kafka.zk.KafkaZkClient.$anonfun$getReplicaAssignmentForTopics$2(KafkaZkClient.scala:579)
>  at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> scala.collection.TraversableLike.flatMap(TraversableLike.scala:244) at 
> scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:108) at 
> kafka.zk.KafkaZkClient.getReplicaAssignmentForTopics(KafkaZkClient.scala:574) 
> at 
> kafka.admin.ReassignPartitionsCommand$.parseAndValidate(ReassignPartitionsCommand.scala:338)
>  at 
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:209)
>  at 
> kafka.admin.ReassignPartitionsClusterTest.shouldOnlyThrottleMovingReplicas(ReassignPartitionsClusterTest.scala:343){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8018) Flaky Test SaslSslAdminClientIntegrationTest#testLegacyAclOpsNeverAffectOrReturnPrefixed

2019-03-01 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782210#comment-16782210
 ] 

Jun Rao commented on KAFKA-8018:


Created a PR: https://github.com/apache/kafka/pull/6354

> Flaky Test 
> SaslSslAdminClientIntegrationTest#testLegacyAclOpsNeverAffectOrReturnPrefixed
> 
>
> Key: KAFKA-8018
> URL: https://issues.apache.org/jira/browse/KAFKA-8018
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Jun Rao
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/35/]
> {quote}org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for 
> /brokers/topics/__consumer_offsets/partitions/3/state at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:130) at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
> kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:534) at 
> kafka.zk.KafkaZkClient.getTopicPartitionState(KafkaZkClient.scala:891) at 
> kafka.zk.KafkaZkClient.getLeaderForPartition(KafkaZkClient.scala:901) at 
> kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:669) 
> at kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:304) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1$adapted(TestUtils.scala:302) at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
> scala.collection.immutable.Range.foreach(Range.scala:158) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> kafka.utils.TestUtils$.createTopic(TestUtils.scala:302) at 
> kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:350) at 
> kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:95) at 
> kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) at 
> kafka.api.AdminClientIntegrationTest.setUp(AdminClientIntegrationTest.scala:78)
>  at 
> kafka.api.SaslSslAdminClientIntegrationTest.setUp(SaslSslAdminClientIntegrationTest.scala:64){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-8018) Flaky Test SaslSslAdminClientIntegrationTest#testLegacyAclOpsNeverAffectOrReturnPrefixed

2019-03-01 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao reassigned KAFKA-8018:
--

Assignee: Jun Rao

> Flaky Test 
> SaslSslAdminClientIntegrationTest#testLegacyAclOpsNeverAffectOrReturnPrefixed
> 
>
> Key: KAFKA-8018
> URL: https://issues.apache.org/jira/browse/KAFKA-8018
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Jun Rao
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/35/]
> {quote}org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired for 
> /brokers/topics/__consumer_offsets/partitions/3/state at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:130) at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54) at 
> kafka.zookeeper.AsyncResponse.resultException(ZooKeeperClient.scala:534) at 
> kafka.zk.KafkaZkClient.getTopicPartitionState(KafkaZkClient.scala:891) at 
> kafka.zk.KafkaZkClient.getLeaderForPartition(KafkaZkClient.scala:901) at 
> kafka.utils.TestUtils$.waitUntilLeaderIsElectedOrChanged(TestUtils.scala:669) 
> at kafka.utils.TestUtils$.$anonfun$createTopic$1(TestUtils.scala:304) at 
> kafka.utils.TestUtils$.$anonfun$createTopic$1$adapted(TestUtils.scala:302) at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at 
> scala.collection.immutable.Range.foreach(Range.scala:158) at 
> scala.collection.TraversableLike.map(TraversableLike.scala:237) at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:230) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
> kafka.utils.TestUtils$.createTopic(TestUtils.scala:302) at 
> kafka.utils.TestUtils$.createOffsetsTopic(TestUtils.scala:350) at 
> kafka.api.IntegrationTestHarness.doSetup(IntegrationTestHarness.scala:95) at 
> kafka.api.IntegrationTestHarness.setUp(IntegrationTestHarness.scala:73) at 
> kafka.api.AdminClientIntegrationTest.setUp(AdminClientIntegrationTest.scala:78)
>  at 
> kafka.api.SaslSslAdminClientIntegrationTest.setUp(SaslSslAdminClientIntegrationTest.scala:64){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7956) Avoid blocking in ShutdownableThread.awaitShutdown if the thread has not been started

2019-02-26 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-7956.

   Resolution: Fixed
 Assignee: Gardner Vickers
Fix Version/s: 2.3.0

Merged the PR to trunk.

> Avoid blocking in ShutdownableThread.awaitShutdown if the thread has not been 
> started
> -
>
> Key: KAFKA-7956
> URL: https://issues.apache.org/jira/browse/KAFKA-7956
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Gardner Vickers
>Assignee: Gardner Vickers
>Priority: Minor
> Fix For: 2.3.0
>
>
> Opening this Jira to track [https://github.com/apache/kafka/pull/6218], since 
> it's a rather subtle change.
> In some test cases it's desirable to instantiate a subclass of 
> `ShutdownableThread` without starting it. Since most subclasses of 
> `ShutdownableThread` put cleanup logic in `ShutdownableThread.shutdown()`, 
> being able to call `shutdown()` on a non-running thread would be useful.
> This change allows us to avoid blocking in `ShutdownableThread.shutdown()` if 
> the thread's `run()` method has never been called. We also add a check that 
> `initiateShutdown()` was called before `awaitShutdown()`, to protect against 
> the case where a user calls `awaitShutdown()` on a thread that was never 
> started and, unexpectedly, is not blocked waiting for it to shut down.
>  
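
A hedged sketch of the guard described above (simplified; the class shape is 
illustrative, not the actual Kafka code):

{code:scala}
import java.util.concurrent.CountDownLatch

// Sketch: only block in awaitShutdown() if run() can ever count the latch
// down, and require initiateShutdown() to have been called first.
class ShutdownableThreadSketch(name: String) extends Thread(name) {
  @volatile private var shutdownInitiated = false
  @volatile private var threadStarted = false
  private val shutdownComplete = new CountDownLatch(1)

  override def start(): Unit = {
    threadStarted = true
    super.start()
  }

  override def run(): Unit =
    try {
      // ... the subclass's work loop would run here until shutdown ...
    } finally shutdownComplete.countDown()

  def initiateShutdown(): Unit = shutdownInitiated = true

  def awaitShutdown(): Unit = {
    if (!shutdownInitiated)
      throw new IllegalStateException(
        "initiateShutdown() must be called before awaitShutdown()")
    if (threadStarted)
      shutdownComplete.await() // run() is guaranteed to count down eventually
  }

  // shutdown() is now safe on a thread that was never started: it returns
  // immediately instead of blocking forever on the latch.
  def shutdown(): Unit = {
    initiateShutdown()
    awaitShutdown()
  }
}
{code}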



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7987) a broker's ZK session may die on transient auth failure

2019-02-22 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775565#comment-16775565
 ] 

Jun Rao commented on KAFKA-7987:


One potential way to fix this is to handle auth failure in ZooKeeperClient the 
same way as session expiration: keep retrying to establish a new connection 
until it succeeds.
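
A hedged sketch of that retry idea (the object and helper names below are 
hypothetical placeholders, not the actual ZooKeeperClient API):

{code:scala}
object AuthFailureRetrySketch {
  // Hypothetical stand-ins for the real client internals:
  def closeCurrentSession(): Unit = ()
  def openNewSession(): Unit = ()

  // Treat AUTH_FAILED like session expiration: drop the session and keep
  // retrying until a new connection is established.
  def handleAuthFailure(): Unit = {
    var connected = false
    while (!connected) {
      try {
        closeCurrentSession() // discard the failed session, as on expiration
        openNewSession()      // attempt a fresh ZK session
        connected = true
      } catch {
        case _: Exception =>
          Thread.sleep(1000)  // back off before the next attempt
      }
    }
  }
}
{code}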

> a broker's ZK session may die on transient auth failure
> ---
>
> Key: KAFKA-7987
> URL: https://issues.apache.org/jira/browse/KAFKA-7987
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jun Rao
>Priority: Major
>
> After a transient network issue, we saw the following log in a broker.
> {code:java}
> [23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: 
> javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Server not found in Kerberos database (7))]) occurred when 
> evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client 
> will go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
> [23:37:02,102] ERROR [ZooKeeperClient] Auth failed. 
> (kafka.zookeeper.ZooKeeperClient)
> {code}
> The network issue prevented the broker from communicating with ZK. The broker's 
> ZK session then expired, but the broker didn't know that yet since it 
> couldn't establish a connection to ZK. When the network came back, the broker 
> tried to establish a connection to ZK, but failed due to an auth failure 
> (likely caused by a transient KDC issue). The current logic just ignores the 
> auth failure without trying to create a new ZK session, so the broker ends up 
> permanently in a state where it is alive but not registered in ZK.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

