[jira] [Commented] (KAFKA-1815) ServerShutdownTest fails in trunk.
[ https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244393#comment-14244393 ] Chris Cope commented on KAFKA-1815: --- I spoke too soon. I'm not getting different test results after running this a bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. Sometimes testMetricsLeak fails. Sometimes all 4 fail together. ServerShutdownTest fails in trunk. -- Key: KAFKA-1815 URL: https://issues.apache.org/jira/browse/KAFKA-1815 Project: Kafka Issue Type: Bug Reporter: Anatoly Fayngelerin Priority: Minor Fix For: 0.8.3 Attachments: shutdown_test_fix.patch I ran into these failures consistently when trying to build Kafka locally: kafka.server.ServerShutdownTest testCleanShutdown FAILED java.lang.NullPointerException at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113) at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105) at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147) at kafka.server.ServerShutdownTest.testCleanShutdown(ServerShutdownTest.scala:101) kafka.server.ServerShutdownTest testCleanShutdownWithDeleteTopicEnabled FAILED java.lang.NullPointerException at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at 
kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113) at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105) at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147) at kafka.server.ServerShutdownTest.testCleanShutdownWithDeleteTopicEnabled(ServerShutdownTest.scala:114) kafka.server.ServerShutdownTest testCleanShutdownAfterFailedStartup FAILED java.lang.NullPointerException at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114) at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113) at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105) at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147) at kafka.server.ServerShutdownTest.testCleanShutdownAfterFailedStartup(ServerShutdownTest.scala:141) It looks like Jenkins also had issues with these tests: https://builds.apache.org/job/Kafka-trunk/351/console I would like to provide a patch that fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-1815) ServerShutdownTest fails in trunk.
[ https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14244393#comment-14244393 ] Chris Cope edited comment on KAFKA-1815 at 12/12/14 4:20 PM: - I spoke too soon. I'm now getting different test results after running this a bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. Sometimes testMetricsLeak fails. Sometimes all 4 fail together. was (Author: copester): I spoke too soon. I'm not getting different test results after running this a bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. Sometimes testMetricsLeak fails. Sometimes all 4 fail together. ServerShutdownTest fails in trunk. -- Key: KAFKA-1815 URL: https://issues.apache.org/jira/browse/KAFKA-1815 Project: Kafka Issue Type: Bug Reporter: Anatoly Fayngelerin Priority: Minor Fix For: 0.8.3 Attachments: shutdown_test_fix.patch
[jira] [Commented] (KAFKA-1815) ServerShutdownTest fails in trunk.
[ https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14245115#comment-14245115 ] Chris Cope commented on KAFKA-1815: --- Thanks, [~junrao], though that was actually [~fanatoly]'s patch. As for the current state of the tests: since the latest commit, 523b36589e942cb99a95debd2c45e795ae533d08 for KAFKA-1813, I'm seeing all of the tests pass consistently, except for the occasional KAFKA-1501 failures, which continue to haunt me. Thanks! ServerShutdownTest fails in trunk. -- Key: KAFKA-1815 URL: https://issues.apache.org/jira/browse/KAFKA-1815 Project: Kafka Issue Type: Bug Reporter: Anatoly Fayngelerin Assignee: Chris Cope Priority: Minor Fix For: 0.8.3 Attachments: shutdown_test_fix.patch
[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.
[ https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212267#comment-14212267 ] Chris Cope commented on KAFKA-1764: --- This now builds and all 547 tests pass. Thanks! ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue. --- Key: KAFKA-1764 URL: https://issues.apache.org/jira/browse/KAFKA-1764 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1764.patch, KAFKA-1764_2014-11-12_14:05:35.patch, KAFKA-1764_2014-11-13_23:57:51.patch In ZookeeperConsumerConnector shutdown(), we could potentially put multiple shutdownCommand into the same data chunk queue, provided the topics are sharing the same data chunk queue in topicThreadIdAndQueues. From email thread to document: In ZookeeperConsumerConnector shutdown(), we could potentially put multiple shutdownCommand into the same data chunk queue, provided the topics are sharing the same data chunk queue in topicThreadIdAndQueues. In our case, we only have 1 consumer stream for all the topics, the data chunk queue capacity is set to 1. The execution sequence causing problem is as below: 1. ZookeeperConsumerConnector shutdown() is called, it tries to put shutdownCommand for each queue in topicThreadIdAndQueues. Since we only have 1 queue, multiple shutdownCommand will be put into the queue. 2. In sendShutdownToAllQueues(), between queue.clean() and queue.put(shutdownCommand), consumer iterator receives the shutdownCommand and put it back into the data chunk queue. After that, ZookeeperConsumerConnector tries to put another shutdownCommand into the data chunk queue but will block forever. 
The thread stack trace is as below: {code} Thread-23 #58 prio=5 os_prio=0 tid=0x7ff440004800 nid=0x40a waiting on condition [0x7ff4f0124000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000680b96bf0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:262) at kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:259) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at kafka.consumer.ZookeeperConsumerConnector.sendShutdownToAllQueues(ZookeeperConsumerConnector.scala:259) at kafka.consumer.ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:199) at kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:192) - locked 0x000680dd5848 (a java.lang.Object) at kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185) at kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185) at scala.collection.immutable.List.foreach(List.scala:318) at kafka.tools.MirrorMaker$.cleanShutdown(MirrorMaker.scala:185) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
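The blocking scenario described in this issue can be illustrated with a minimal Java sketch. This is not the Kafka code or the attached patch: the class `ShutdownQueueSketch`, the string marker `SHUTDOWN`, and the method `sendShutdownToDistinctQueues` are hypothetical stand-ins. It assumes a capacity-1 `LinkedBlockingQueue` shared by two topics, shows that a second marker does not fit (a blocking `put()` would park at that point unless a consumer drains the queue), and then sends one marker per distinct queue instead.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShutdownQueueSketch {
    // Hypothetical stand-in for the shutdownCommand marker object.
    static final String SHUTDOWN = "shutdownCommand";

    /**
     * Puts one shutdown marker into each distinct queue, even if several
     * topics map to the same queue instance. Returns the number of markers sent.
     */
    static int sendShutdownToDistinctQueues(Collection<BlockingQueue<String>> queuesPerTopic)
            throws InterruptedException {
        // Deduplicate by object identity: two topics sharing a queue count once.
        Set<BlockingQueue<String>> distinct =
                Collections.newSetFromMap(new IdentityHashMap<>());
        distinct.addAll(queuesPerTopic);
        for (BlockingQueue<String> queue : distinct) {
            queue.clear();
            queue.put(SHUTDOWN);
        }
        return distinct.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // One queue with capacity 1, shared by two topics.
        BlockingQueue<String> shared = new LinkedBlockingQueue<>(1);
        List<BlockingQueue<String>> perTopic = Arrays.asList(shared, shared);

        // Naive approach: one marker per topic. offer() is used here so the
        // demo does not hang; a blocking put() would wait on the second call.
        boolean first = shared.offer(SHUTDOWN);
        boolean second = shared.offer(SHUTDOWN);
        System.out.println("first=" + first + " second=" + second);

        shared.clear();
        int sent = sendShutdownToDistinctQueues(perTopic);
        System.out.println("markers sent=" + sent + " queue size=" + shared.size());
    }
}
```

Deduplicating by identity rather than by `equals()` is a choice made for this sketch; it only removes the duplicate put and does not address a consumer putting the marker back into the queue.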
[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.
[ https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211781#comment-14211781 ] Chris Cope commented on KAFKA-1764: --- Why do builds always break just as I put the kids to sleep and grab a glass of wine? ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue. --- Key: KAFKA-1764 URL: https://issues.apache.org/jira/browse/KAFKA-1764 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1764.patch, KAFKA-1764_2014-11-12_14:05:35.patch
[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.
[ https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211782#comment-14211782 ] Chris Cope commented on KAFKA-1764: ---
{code}
/home/bamboo/bamboo-agent-home/xml-data/build-dir/STREAM-KAFKA-JOB1/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala:259: missing parameter type
for (queue <- topicThreadIdAndQueues.values.toSet) {
^
one error found
{code}
ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue. --- Key: KAFKA-1764 URL: https://issues.apache.org/jira/browse/KAFKA-1764 Project: Kafka Issue Type: Bug Reporter: Jiangjie Qin Assignee: Jiangjie Qin Attachments: KAFKA-1764.patch, KAFKA-1764_2014-11-12_14:05:35.patch
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186564#comment-14186564 ] Chris Cope commented on KAFKA-1501: --- 14/200 test runs failed with the java.net.BindException: java.net.BindException: Address already in use errors. This bug is rough. transient unit tests failures due to port already in use Key: KAFKA-1501 URL: https://issues.apache.org/jira/browse/KAFKA-1501 Project: Kafka Issue Type: Improvement Components: core Reporter: Jun Rao Assignee: Guozhang Wang Labels: newbie Attachments: KAFKA-1501.patch, KAFKA-1501.patch Saw the following transient failures. kafka.api.ProducerFailureHandlingTest testTooLargeRecordWithAckOne FAILED kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use. at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195) at kafka.network.Acceptor.init(SocketServer.scala:141) at kafka.network.SocketServer.startup(SocketServer.scala:68) at kafka.server.KafkaServer.startup(KafkaServer.scala:95) at kafka.utils.TestUtils$.createServer(TestUtils.scala:123) at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181928#comment-14181928 ] Chris Cope commented on KAFKA-1501: --- Awesome! I should know shortly, spinning off 100 jobs via phone... transient unit tests failures due to port already in use Key: KAFKA-1501 URL: https://issues.apache.org/jira/browse/KAFKA-1501 Project: Kafka Issue Type: Improvement Components: core Reporter: Jun Rao Assignee: Guozhang Wang Labels: newbie Attachments: KAFKA-1501.patch
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182233#comment-14182233 ] Chris Cope commented on KAFKA-1501: --- Unfortunately, same issues: 9/100 test runs had a bunch of failures. transient unit tests failures due to port already in use Key: KAFKA-1501 URL: https://issues.apache.org/jira/browse/KAFKA-1501 Project: Kafka Issue Type: Improvement Components: core Reporter: Jun Rao Assignee: Guozhang Wang Labels: newbie Attachments: KAFKA-1501.patch
[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution
[ https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145006#comment-14145006 ] Chris Cope commented on KAFKA-1490: --- OK, so the wrapper.gradle was in the attached patch on this issue, just not in the GitHub commit. However, there may be another change that didn't make it into the commit?
{noformat}
ubuntu@ip-10-183-61-60:~/kafka$ gradle
FAILURE: Build failed with an exception.
* Where:
Script '/home/ubuntu/kafka/gradle/license.gradle' line: 2
* What went wrong:
A problem occurred evaluating script.
Could not find method create() for arguments [downloadLicenses, class nl.javadude.gradle.plugins.license.DownloadLicenses] on task set.
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
Total time: 3.303 secs
{noformat}
remove gradlew initial setup output from source distribution Key: KAFKA-1490 URL: https://issues.apache.org/jira/browse/KAFKA-1490 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Ivan Lyutov Priority: Blocker Fix For: 0.8.2 Attachments: KAFKA-1490-2.patch, KAFKA-1490.patch, rb25703.patch Our current source releases contain lots of stuff in the gradle folder we do not need.
[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution
[ https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145072#comment-14145072 ] Chris Cope commented on KAFKA-1490: --- I'm still getting that DownloadLicenses error. By default it's building with Scala 2.9.1, but trying Scala 2.10.1, as you did, gives the same error. Do you mind listing your dependency versions? There may be something out of date on my end. remove gradlew initial setup output from source distribution Key: KAFKA-1490 URL: https://issues.apache.org/jira/browse/KAFKA-1490 Project: Kafka Issue Type: Bug Reporter: Joe Stein Assignee: Ivan Lyutov Priority: Blocker Fix For: 0.8.2 Attachments: KAFKA-1490-2.patch, KAFKA-1490.patch, rb25703.patch
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128457#comment-14128457 ] Chris Cope commented on KAFKA-1501: --- [~abhioncbr],
{code}
git clone https://github.com/apache/kafka.git .
./gradlew test
{code}
There will be failures anywhere from 10% to 20% of the time. I *think* there is a race condition with TestUtils.choosePorts(), where a port is grabbed and closed, and then, when the KafkaTestHarness uses it, it's not available yet. Looking through the failures of the last few hundred test runs I've done, there is usually 1 port (occasionally 2) at fault, which then causes subsequent tests in the test class to fail. Essentially, this race condition occurs approximately 0.06% of the time a socket server is created. Even so, my team frequently has to rerun tests after a code change because of these sporadic failures; we see this multiple times a day. The best solution seems to be to catch this exception and then grab new ports. Again, we're talking about the test harness, so which ports it runs on doesn't matter. Thoughts? transient unit tests failures due to port already in use Key: KAFKA-1501 URL: https://issues.apache.org/jira/browse/KAFKA-1501 Project: Kafka Issue Type: Improvement Components: core Reporter: Jun Rao Labels: newbie
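The "catch the exception and grab new ports" idea from the comment above could look roughly like the following Java sketch. It uses only `java.net.ServerSocket` and is not Kafka's `TestUtils` or any patch on this issue: the class `BindRetrySketch` and the methods `chooseFreePort` and `bindWithRetry` are hypothetical names. It assumes the port is picked by briefly opening a socket on port 0 and closing it, which is the step where another process can take the port before it is reused.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindRetrySketch {
    /**
     * Asks the OS for a currently free port, then releases it. Between this
     * call returning and the later bind, the port can be taken by someone else.
     */
    static int chooseFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    /**
     * Binds a server socket on localhost, picking a fresh port whenever the
     * bind fails with BindException, up to maxAttempts tries.
     */
    static ServerSocket bindWithRetry(int maxAttempts) throws IOException {
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = chooseFreePort();
            ServerSocket server = new ServerSocket();
            try {
                server.bind(new InetSocketAddress("localhost", port));
                return server;
            } catch (BindException e) {
                // The chosen port was no longer available: clean up and retry.
                server.close();
                last = e;
            }
        }
        throw last != null ? last : new BindException("no bind attempts were made");
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = bindWithRetry(3)) {
            System.out.println("bound to port " + server.getLocalPort());
        }
    }
}
```

A different way to avoid the window entirely, also only an assumption here, would be to bind the real server to port 0 and read the assigned port back, so the port is never released between choosing and using it.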
[jira] [Comment Edited] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128653#comment-14128653 ] Chris Cope edited comment on KAFKA-1501 at 9/10/14 4:10 PM: I agree, [~absingh]. I'm running some more tests and I think the best way to handle this unlikely event is to catch it specifically, and then have it rerun the entire test class *one* time, and noting this in the test log. This bug does not affect the core Kafka code, and is simply exposed here because Kafka has such great unit tests, and we just happen to run them A LOT for our purposes. I'm proposing this solution instead of hunting and fixing the underlying issue in choosePorts(), which when looking around at other projects does seem like a decent implementation. The probability of a test class failing twice in a row should be very low (0.0001%) and should result in any test class failure less than 1% of the time `./gradlew test` is run. Is this approach sound? was (Author: copester): I agree, [~absingh]. I'm running some more tests and I think the best way to handle this unlikely event is to catch is specifically, and then have it rerun the entire test class *one* time, and noting this in the test log. This bug does not affect the core Kafka code, and is simple exposed here because Kafka has such great unit tests, and we just happen to run them A LOT of our purposes. I'm proposing this solution instead of hunting and fixing the underlying issue in choosePorts(), which when looking around at other projects does seem like a decent implementation. The probability of a test class failing twice in a row should be very low (0.0001%) and should result in any test class failure less than 1% of the time `./gradlew test` is run. Is this approach sound? 
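As a rough sketch of the "rerun the entire test class *one* time" proposal, the retry logic could be wrapped as below. This is plain Java with no test-framework dependency. RetryOnceSketch, causedByBindFailure and runWithOneRetry are hypothetical names, and the check for java.net.BindException in the cause chain is an assumption: the trace in this ticket surfaces a kafka.common.KafkaException, so a real harness might have to match on that type or its message instead.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

// Hypothetical sketch: not part of Kafka or of any test framework.
class RetryOnceSketch {

    // Walks the cause chain looking for a "port already in use" style failure.
    // Assumption: the bind failure is visible as a java.net.BindException.
    static boolean causedByBindFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof BindException) {
                return true;
            }
        }
        return false;
    }

    // Runs the given test-class body. If it fails because of a bind failure,
    // logs the fact and reruns it exactly once; any other failure is rethrown.
    static <T> T runWithOneRetry(String name, Callable<T> testClassRun) throws Exception {
        try {
            return testClassRun.call();
        } catch (Exception first) {
            if (!causedByBindFailure(first)) {
                throw first;
            }
            System.err.println("[retry] " + name + " hit a bind failure, rerunning once: " + first);
            return testClassRun.call();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String outcome = runWithOneRetry("DemoTest", () -> {
            // Simulate the transient failure on the first run only.
            if (calls[0]++ == 0) {
                throw new RuntimeException(new BindException("Address already in use"));
            }
            return "passed";
        });
        System.out.println(outcome + " after " + calls[0] + " runs");
    }
}
```

On the numbers quoted above: if the two runs are independent, a 0.0001% chance of failing twice in a row corresponds to a per-class failure rate of 0.1% (0.001 squared is 0.000001). By a simple union bound, the chance that any class fails twice then stays under 1% per `./gradlew test` run as long as fewer than about 10,000 test classes are executed.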
[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use
[ https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127343#comment-14127343 ] Chris Cope commented on KAFKA-1501: --- Ugh, this bug was obnoxious! This has bitten us enough times that we had to fix it. To isolate it, I ran the full set of tests on our test farm 100x for trunk and 100x for 0.8.1. * _trunk_ failed 11/100 times * _0.8.1_ failed 12/100 times It's a race condition. The fix is for ZooKeeperTestHarness, but I need to rebase and retest it. Also, I think the failure rate may be related to the underlying hardware (faster processing = more likely to hit the race condition). I should have a fix that has been tested against the latest trunk tonight.