[jira] [Commented] (KAFKA-1815) ServerShutdownTest fails in trunk.

2014-12-12 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244393#comment-14244393
 ] 

Chris Cope commented on KAFKA-1815:
---

I spoke too soon. I'm now getting different test results after running this a 
bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. 
Sometimes testMetricsLeak fails. Sometimes all 4 fail together.

 ServerShutdownTest fails in trunk.
 --

 Key: KAFKA-1815
 URL: https://issues.apache.org/jira/browse/KAFKA-1815
 Project: Kafka
  Issue Type: Bug
Reporter: Anatoly Fayngelerin
Priority: Minor
 Fix For: 0.8.3

 Attachments: shutdown_test_fix.patch


 I ran into these failures consistently when trying to build Kafka locally:
 kafka.server.ServerShutdownTest > testCleanShutdown FAILED
     java.lang.NullPointerException
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113)
         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
         at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113)
         at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105)
         at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest.testCleanShutdown(ServerShutdownTest.scala:101)
 kafka.server.ServerShutdownTest > testCleanShutdownWithDeleteTopicEnabled FAILED
     java.lang.NullPointerException
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113)
         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
         at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113)
         at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105)
         at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest.testCleanShutdownWithDeleteTopicEnabled(ServerShutdownTest.scala:114)
 kafka.server.ServerShutdownTest > testCleanShutdownAfterFailedStartup FAILED
     java.lang.NullPointerException
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest$$anonfun$verifyNonDaemonThreadsStatus$2.apply(ServerShutdownTest.scala:147)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:114)
         at scala.collection.TraversableOnce$$anonfun$count$1.apply(TraversableOnce.scala:113)
         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105)
         at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:113)
         at scala.collection.mutable.ArrayOps$ofRef.count(ArrayOps.scala:105)
         at kafka.server.ServerShutdownTest.verifyNonDaemonThreadsStatus(ServerShutdownTest.scala:147)
         at kafka.server.ServerShutdownTest.testCleanShutdownAfterFailedStartup(ServerShutdownTest.scala:141)
 It looks like Jenkins also had issues with these tests:
 https://builds.apache.org/job/Kafka-trunk/351/console
 I would like to provide a patch that fixes this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1815) ServerShutdownTest fails in trunk.

2014-12-12 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244393#comment-14244393
 ] 

Chris Cope edited comment on KAFKA-1815 at 12/12/14 4:20 PM:
-

I spoke too soon. I'm now getting different test results after running this a 
bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. 
Sometimes testMetricsLeak fails. Sometimes all 4 fail together.


was (Author: copester):
I spoke too soon. I'm not getting different test results after running this a 
bunch of times on our test farm. Sometimes the 3 tests on this ticket fail. 
Sometimes testMetricsLeak fails. Sometimes all 4 fail together.



[jira] [Commented] (KAFKA-1815) ServerShutdownTest fails in trunk.

2014-12-12 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245115#comment-14245115
 ] 

Chris Cope commented on KAFKA-1815:
---

Thanks, [~junrao], though that was actually [~fanatoly]'s patch. As for the 
current state of the tests: since the latest commit 
523b36589e942cb99a95debd2c45e795ae533d08 for KAFKA-1813, I'm seeing all tests 
pass consistently, except for the occasional KAFKA-1501 failures which 
continue to haunt me. Thanks!

 ServerShutdownTest fails in trunk.
 --

 Key: KAFKA-1815
 URL: https://issues.apache.org/jira/browse/KAFKA-1815
 Project: Kafka
  Issue Type: Bug
Reporter: Anatoly Fayngelerin
Assignee: Chris Cope
Priority: Minor
 Fix For: 0.8.3

 Attachments: shutdown_test_fix.patch




[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.

2014-11-14 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212267#comment-14212267
 ] 

Chris Cope commented on KAFKA-1764:
---

This now builds and all 547 tests pass. Thanks!

 ZookeeperConsumerConnector could put multiple shutdownCommand to the same 
 data chunk queue.
 ---

 Key: KAFKA-1764
 URL: https://issues.apache.org/jira/browse/KAFKA-1764
 Project: Kafka
  Issue Type: Bug
Reporter: Jiangjie Qin
Assignee: Jiangjie Qin
 Attachments: KAFKA-1764.patch, KAFKA-1764_2014-11-12_14:05:35.patch, 
 KAFKA-1764_2014-11-13_23:57:51.patch


 In ZookeeperConsumerConnector shutdown(), we could potentially put multiple 
 shutdownCommand into the same data chunk queue, provided the topics are 
 sharing the same data chunk queue in topicThreadIdAndQueues.
 From email thread to document:
 In ZookeeperConsumerConnector shutdown(), we could potentially put
 multiple shutdownCommand into the same data chunk queue, provided the
 topics are sharing the same data chunk queue in topicThreadIdAndQueues.
 In our case, we only have 1 consumer stream for all the topics, and the data
 chunk queue capacity is set to 1. The execution sequence causing the problem is
 as below:
 1. ZookeeperConsumerConnector shutdown() is called, it tries to put
 shutdownCommand for each queue in topicThreadIdAndQueues. Since we only
 have 1 queue, multiple shutdownCommand will be put into the queue.
 2. In sendShutdownToAllQueues(), between queue.clean() and
 queue.put(shutdownCommand), consumer iterator receives the shutdownCommand
 and put it back into the data chunk queue. After that,
 ZookeeperConsumerConnector tries to put another shutdownCommand into the
 data chunk queue but will block forever.
 The thread stack trace is as below:
 {code}
 "Thread-23" #58 prio=5 os_prio=0 tid=0x7ff440004800 nid=0x40a waiting on condition [0x7ff4f0124000]
    java.lang.Thread.State: WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for <0x000680b96bf0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
         at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
         at kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:262)
         at kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:259)
         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
         at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
         at kafka.consumer.ZookeeperConsumerConnector.sendShutdownToAllQueues(ZookeeperConsumerConnector.scala:259)
         at kafka.consumer.ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:199)
         at kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:192)
         - locked <0x000680dd5848> (a java.lang.Object)
         at kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
         at kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
         at scala.collection.immutable.List.foreach(List.scala:318)
         at kafka.tools.MirrorMaker$.cleanShutdown(MirrorMaker.scala:185)
 {code}
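The blocking scenario described above can be reproduced with a minimal, self-contained Java sketch (plain java.util.concurrent, not Kafka's actual consumer code): a capacity-1 LinkedBlockingQueue stands in for the data chunk queue, and a helper thread plays the part of the consumer iterator that re-enqueues the shutdown command.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ShutdownRaceSketch {
    public static void main(String[] args) throws InterruptedException {
        // Capacity 1, as in the report: one data chunk queue shared by all topics.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        final String SHUTDOWN = "shutdownCommand";

        // Plays the consumer iterator: it receives the shutdown command and
        // puts it back into the data chunk queue.
        Thread iterator = new Thread(() -> {
            try {
                String cmd = queue.take();
                queue.put(cmd);
            } catch (InterruptedException ignored) { }
        });
        iterator.start();

        queue.clear();
        queue.put(SHUTDOWN);   // first put succeeds
        iterator.join();       // iterator has re-enqueued the command: queue is full again

        // A second plain put() would now block forever, exactly as in the stack
        // trace above; a timed offer() makes the stuck state observable.
        boolean accepted = queue.offer(SHUTDOWN, 100, TimeUnit.MILLISECONDS);
        System.out.println(accepted);  // prints false: the queue never drains
    }
}
```

One way out, under the same assumptions, is to enqueue at most one shutdown command per distinct queue (deduplicating topicThreadIdAndQueues.values) or to guard the put with a timed offer().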





[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.

2014-11-13 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211781#comment-14211781
 ] 

Chris Cope commented on KAFKA-1764:
---

Why do builds always break just as I put the kids to sleep and grab a glass of 
wine?



[jira] [Commented] (KAFKA-1764) ZookeeperConsumerConnector could put multiple shutdownCommand to the same data chunk queue.

2014-11-13 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211782#comment-14211782
 ] 

Chris Cope commented on KAFKA-1764:
---

{code}
/home/bamboo/bamboo-agent-home/xml-data/build-dir/STREAM-KAFKA-JOB1/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala:259: missing parameter type
for (queue <- topicThreadIdAndQueues.values.toSet) {
     ^
one error found
{code}



[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2014-10-28 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186564#comment-14186564
 ] 

Chris Cope commented on KAFKA-1501:
---

14/200 test runs failed with java.net.BindException: Address already in use 
errors. This bug is rough.

 transient unit tests failures due to port already in use
 

 Key: KAFKA-1501
 URL: https://issues.apache.org/jira/browse/KAFKA-1501
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jun Rao
Assignee: Guozhang Wang
  Labels: newbie
 Attachments: KAFKA-1501.patch, KAFKA-1501.patch


 Saw the following transient failures.
 kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
     kafka.common.KafkaException: Socket server failed to bind to localhost:59909: Address already in use.
         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
         at kafka.network.Acceptor.<init>(SocketServer.scala:141)
         at kafka.network.SocketServer.startup(SocketServer.scala:68)
         at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
         at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
         at kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)





[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2014-10-23 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181928#comment-14181928
 ] 

Chris Cope commented on KAFKA-1501:
---

Awesome! I should know shortly; spinning off 100 jobs via phone...



[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2014-10-23 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182233#comment-14182233
 ] 

Chris Cope commented on KAFKA-1501:
---

Unfortunately, same issues: 9/100 test runs had a bunch of failures.



[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution

2014-09-23 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145006#comment-14145006
 ] 

Chris Cope commented on KAFKA-1490:
---

Ok, so the wrapper.gradle was in the attached patch on this issue, just not in 
the github commit. However, there may be another change that didn't make it 
into the commit?
{noformat}
ubuntu@ip-10-183-61-60:~/kafka$ gradle

FAILURE: Build failed with an exception.

* Where:
Script '/home/ubuntu/kafka/gradle/license.gradle' line: 2

* What went wrong:
A problem occurred evaluating script.
 Could not find method create() for arguments [downloadLicenses, class 
 nl.javadude.gradle.plugins.license.DownloadLicenses] on task set.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 3.303 secs
{noformat}

 remove gradlew initial setup output from source distribution
 

 Key: KAFKA-1490
 URL: https://issues.apache.org/jira/browse/KAFKA-1490
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Ivan Lyutov
Priority: Blocker
 Fix For: 0.8.2

 Attachments: KAFKA-1490-2.patch, KAFKA-1490.patch, rb25703.patch


 Our current source releases contain lots of stuff in the gradle folder we do 
 not need





[jira] [Commented] (KAFKA-1490) remove gradlew initial setup output from source distribution

2014-09-23 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145072#comment-14145072
 ] 

Chris Cope commented on KAFKA-1490:
---

I'm still getting that DownloadLicenses error. By default it's building with 
Scala 2.9.1, but trying Scala 2.10.1 like you did gives the same error. Do you 
mind listing your dependency versions? There may be something out of date on my 
end.



[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2014-09-10 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128457#comment-14128457
 ] 

Chris Cope commented on KAFKA-1501:
---

[~abhioncbr],
{code}
git clone https://github.com/apache/kafka.git .
./gradlew test
{code}
There will be failures anywhere from 10%-20% of the time. I *think* there is a 
race condition with TestUtils.choosePorts(), where a port is grabbed, closed, 
and then when the KafkaTestHarness uses it, it's not available yet. Looking 
through failures of the last few hundred test runs I've done, there is usually 
one 1 (occasionally 2) ports at fault that then cause subsequent tests to fail 
for the test class. 

Essentially, this race condition occurs approximately 0.06% of the time a 
socket server is created. However, my team frequently has to rerun tests after 
a code change because of these failures; we see them multiple times a day. The 
best solution seems to be to catch this exception and then grab new ports. 
Again, we're talking about the test harness, so which ports it runs on 
doesn't matter. Thoughts?
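The catch-and-retry idea could look roughly like this (hypothetical helper names, not an actual Kafka patch): on a bind failure, fall back to a fresh ephemeral port instead of failing the test class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindWithRetry {
    // Bind a server socket on the preferred port; if that port was taken
    // during the race window, retry on a fresh OS-assigned ephemeral port
    // instead of failing the whole test class.
    static ServerSocket bind(int preferredPort, int maxRetries) {
        int port = preferredPort;
        for (int attempt = 0; ; attempt++) {
            try {
                ServerSocket socket = new ServerSocket();
                socket.bind(new InetSocketAddress("localhost", port));
                return socket;
            } catch (IOException e) { // e.g. "Address already in use"
                if (attempt >= maxRetries) throw new UncheckedIOException(e);
                port = 0; // let the OS choose a free ephemeral port on retry
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket s = bind(0, 3)) {
            System.out.println("bound to port " + s.getLocalPort());
        }
    }
}
```

This trades port determinism for reliability, which is acceptable in a test harness where the chosen port number doesn't matter.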


 transient unit tests failures due to port already in use
 

 Key: KAFKA-1501
 URL: https://issues.apache.org/jira/browse/KAFKA-1501
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jun Rao
  Labels: newbie

 Saw the following transient failures.
 kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
 kafka.common.KafkaException: Socket server failed to bind to 
 localhost:59909: Address already in use.
 at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
 at kafka.network.Acceptor.<init>(SocketServer.scala:141)
 at kafka.network.SocketServer.startup(SocketServer.scala:68)
 at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
 at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
 at 
 kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)





[jira] [Comment Edited] (KAFKA-1501) transient unit tests failures due to port already in use

2014-09-10 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128653#comment-14128653
 ] 

Chris Cope edited comment on KAFKA-1501 at 9/10/14 4:10 PM:


I agree, [~absingh]. I'm running some more tests, and I think the best way to 
handle this unlikely event is to catch it specifically, rerun the entire test 
class *one* time, and note this in the test log. This bug does not affect the 
core Kafka code; it is simply exposed here because Kafka has such great unit 
tests, and we just happen to run them A LOT for our purposes. I'm proposing 
this solution instead of hunting down and fixing the underlying issue in 
choosePorts(), which, judging by other projects, seems like a decent 
implementation.

The probability of a test class failing twice in a row should be very low 
(0.0001%), which should result in a test class failure less than 1% of the 
time `./gradlew test` is run.

Is this approach sound?







[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2014-09-09 Thread Chris Cope (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127343#comment-14127343
 ] 

Chris Cope commented on KAFKA-1501:
---

Ugh, this bug was obnoxious! It has bitten us enough times that we had to fix 
it. To isolate it, I ran the full set of tests on our test farm 100x for trunk 
and 100x for 0.8.1:
* _trunk_ failed 11/100 times
* _0.8.1_ failed 12/100 times

It's a race condition. The fix is in ZooKeeperTestHarness, but I need to 
rebase and retest it. I also suspect the failure rate is related to the 
underlying hardware (faster processing makes the race condition more likely). 
I should have a fix tested against the latest trunk tonight.
