[jira] [Updated] (KAFKA-589) Clean shutdown after startup connection failure

2014-09-25 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-589:

Status: Patch Available  (was: Open)

This patch makes KafkaServer clean up after itself while still rethrowing any 
caught exceptions. This keeps the existing interface the same: callers that 
already handle failures by catching the exception and calling shutdown() 
continue to work, but callers that don't are cleaned up as well, so the 
leftover thread won't cause a hang. Also adds a test of this behavior.
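
A minimal, self-contained sketch of the cleanup-on-failure pattern described 
above. The class, method, and task names are illustrative stand-ins, not the 
actual contents of KAFKA-589-v1.patch or Kafka's real startup code:

    import java.util.concurrent.{Executors, TimeUnit}

    // Illustrative only: a service that starts a background scheduler (like the
    // log cleaner) before a later startup step that can fail (like the ZooKeeper
    // connection). On failure it shuts itself down and rethrows, so callers still
    // see the exception but no non-daemon thread is left to keep the JVM alive.
    class EmbeddedService {
      private val scheduler = Executors.newSingleThreadScheduledExecutor()

      def startup(): Unit = {
        try {
          scheduler.scheduleAtFixedRate(new Runnable {
            override def run(): Unit = println("cleanup tick")
          }, 60, 60, TimeUnit.SECONDS)
          connect()                 // fails, like the zk connection timeout
        } catch {
          case e: Throwable =>
            shutdown()              // stop anything that already started
            throw e                 // keep the existing interface: rethrow
        }
      }

      def shutdown(): Unit = scheduler.shutdownNow()

      // Stand-in for the ZooKeeper connection attempt that times out.
      private def connect(): Unit =
        throw new RuntimeException("Unable to connect to zookeeper server within timeout: 6000")
    }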

 Clean shutdown after startup connection failure
 ---

 Key: KAFKA-589
 URL: https://issues.apache.org/jira/browse/KAFKA-589
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.0, 0.7.2
Reporter: Jason Rosenberg
Assignee: Swapnil Ghike
Priority: Minor
  Labels: bugs, newbie

 Hi,
 I'm embedding the kafka server (0.7.2) in an application container. I've 
 noticed that if I try to start the server without zookeeper being available, 
 by default it gets a zk connection timeout after 6 seconds and then throws 
 an exception out of KafkaServer.startup(). E.g., I see this stack trace:
 Exception in thread "main" org.I0Itec.zkclient.exception.ZkTimeoutException: 
 Unable to connect to zookeeper server within timeout: 6000
   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:876)
   at org.I0Itec.zkclient.ZkClient.&lt;init&gt;(ZkClient.java:98)
   at org.I0Itec.zkclient.ZkClient.&lt;init&gt;(ZkClient.java:84)
   at kafka.server.KafkaZooKeeper.startup(KafkaZooKeeper.scala:44)
   at kafka.log.LogManager.&lt;init&gt;(LogManager.scala:93)
   at kafka.server.KafkaServer.startup(KafkaServer.scala:58)
 
 
 So that's ok, I can catch the exception and then shut everything down 
 gracefully in this case.  However, when I do this, it seems there is a 
 non-daemon thread still around which doesn't quit, and so the server never 
 actually exits the jvm.  Specifically, this thread seems to hang around:
 "kafka-logcleaner-0" prio=5 tid=7fd9b48b1000 nid=0x112c08000 waiting on 
 condition [112c07000]
java.lang.Thread.State: TIMED_WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for &lt;7f40d4be8&gt; (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at 
 java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
   at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
   at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
   at java.lang.Thread.run(Thread.java:680)
 Looking at the code in kafka.log.LogManager(), it does seem like it starts up 
 the scheduler that cleans logs before it tries to connect to zk (which, in 
 this case, fails):
   /* Schedule the cleanup task to delete old logs */
   if(scheduler != null) {
     info("starting log cleaner every " + logCleanupIntervalMs + " ms")
     scheduler.scheduleWithRate(cleanupLogs, 60 * 1000, logCleanupIntervalMs)
   }
 So this scheduler does not appear to be stopped if startup fails.  However, 
 if I catch the above RuntimeException and then call KafkaServer.shutdown(), 
 it will stop the scheduler, and all is good.
 Still, it seems odd that if I get an exception when calling 
 KafkaServer.startup(), I should also have to call 
 KafkaServer.shutdown().  Rather, wouldn't it be better to have it internally 
 clean up after itself if startup() gets an exception?  I'm not sure I can 
 reliably call shutdown() after a failed startup().
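
Until startup() cleans up internally, a rough caller-side workaround is to 
always pair a failed startup() with shutdown(). A hypothetical embedding 
snippet (config is assumed to be an already-built KafkaConfig; this is not 
code from this issue):

    // Hypothetical workaround: ensure a failed startup() is always followed by
    // shutdown(), so the log-cleaner scheduler started before the zk failure
    // cannot keep the JVM alive.
    val server = new kafka.server.KafkaServer(config)
    try {
      server.startup()
    } catch {
      case e: Throwable =>
        server.shutdown()
        throw e
    }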



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-589) Clean shutdown after startup connection failure

2014-09-25 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-589:

Attachment: KAFKA-589-v1.patch






[jira] [Updated] (KAFKA-589) Clean shutdown after startup connection failure

2014-09-25 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-589:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to trunk






[jira] [Updated] (KAFKA-589) Clean shutdown after startup connection failure

2014-09-25 Thread Neha Narkhede (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-589:

Assignee: Ewen Cheslack-Postava  (was: Swapnil Ghike)



