[jira] [Commented] (HDDS-1830) OzoneManagerDoubleBuffer#stop should wait for daemon thread to die

[ https://issues.apache.org/jira/browse/HDDS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893218#comment-16893218 ]

Hudson commented on HDDS-1830:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16985 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16985/])
HDDS-1830 OzoneManagerDoubleBuffer#stop should wait for daemon thread to (arp7: rev b7fba78fb63a0971835db87292822fd8cd4aa7ad)
* (edit) hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerDoubleBuffer.java

> OzoneManagerDoubleBuffer#stop should wait for daemon thread to die
> ------------------------------------------------------------------
>
>                 Key: HDDS-1830
>                 URL: https://issues.apache.org/jira/browse/HDDS-1830
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Hanisha Koneru
>            Assignee: Siyao Meng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Based on [~arp]'s comment on HDDS-1649, OzoneManagerDoubleBuffer#stop() calls
> interrupt() on daemon thread but not join(). The thread might still be
> running when the call returns.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-1830) OzoneManagerDoubleBuffer#stop should wait for daemon thread to die

[ https://issues.apache.org/jira/browse/HDDS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892225#comment-16892225 ]

Siyao Meng commented on HDDS-1830:
----------------------------------

Thanks [~bharatviswa].

In a short discussion with [~arp] we further decided to make isRunning atomic (volatile should in theory already be fine, but it shouldn't hurt to use an atomic).

Just posted a PR. I'm not entirely sure about using a try-catch block in stop(), though. But I think it is too much hassle to add throws InterruptedException to every caller up the chain (and someone might want to catch it eventually).

I ran TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer locally. It no longer gets stuck.

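For readers following the thread, here is a minimal sketch of the shape of fix discussed in this comment. It is not the actual patch. It assumes stop() is no longer synchronized (as suggested earlier in the thread), that isRunning is an AtomicBoolean, and that the InterruptedException from join() is caught inside stop(). The class FlushWorker and its methods are hypothetical stand-ins for OzoneManagerDoubleBuffer.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for OzoneManagerDoubleBuffer; names are illustrative only.
class FlushWorker {
  private final AtomicBoolean isRunning = new AtomicBoolean(false);
  private final Thread daemon = new Thread(this::flushLoop, "flush-thread");

  /** Starts the background thread. */
  void start() {
    if (isRunning.compareAndSet(false, true)) {
      daemon.setDaemon(true);
      daemon.start();
    }
  }

  // Waits on this object's monitor until interrupted, like a wait()-based buffer.
  private synchronized void flushLoop() {
    while (isRunning.get()) {
      try {
        wait();
      } catch (InterruptedException e) {
        // stop() interrupted us: restore the flag and leave the loop.
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /**
   * Stops the flush thread and waits for it to die.
   * Deliberately NOT synchronized: the flush thread must be able to
   * reacquire this object's monitor to return from wait().
   */
  void stop() {
    if (isRunning.compareAndSet(true, false)) {
      daemon.interrupt();
      try {
        daemon.join();
      } catch (InterruptedException e) {
        // Caught here instead of declaring "throws" on every caller;
        // restoring the flag still lets callers observe the interruption.
        Thread.currentThread().interrupt();
      }
    }
  }

  boolean isAlive() {
    return daemon.isAlive();
  }
}
```

Using compareAndSet in stop() also makes a second call to stop() a no-op, which a plain volatile flag with a separate check-then-set would not guarantee. Restoring the interrupt flag in the catch block is one common way to address the concern about swallowing the exception; whether the real patch does this is not stated in the thread.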
[jira] [Commented] (HDDS-1830) OzoneManagerDoubleBuffer#stop should wait for daemon thread to die

[ https://issues.apache.org/jira/browse/HDDS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890605#comment-16890605 ]

Bharat Viswanadham commented on HDDS-1830:
------------------------------------------

Hi [~smeng]

Thanks for the jstack. I have understood the root cause of this issue.

We did interrupt the thread, but to come out of wait() it needs to reacquire the lock. Since stop() is a synchronized method, it already holds that lock, so even after the interrupt the other thread cannot come out of wait().

I think a simple solution is just to remove synchronized from the stop() method. I have verified that this works.

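The root cause described in this comment can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not code from the Ozone repository: SynchronizedStopDemo and its methods are hypothetical, and a bounded join(millis) is used instead of join() so that the demonstration terminates rather than hanging.

```java
// Hypothetical demo class; not part of Ozone.
class SynchronizedStopDemo {
  private final Thread worker = new Thread(this::waitLoop, "waiter");

  void start() {
    worker.setDaemon(true);
    worker.start();
  }

  // Parks in wait() on this object's monitor until interrupted.
  private synchronized void waitLoop() {
    try {
      while (true) {
        wait();
      }
    } catch (InterruptedException e) {
      // Reached only after the monitor has been reacquired.
    }
  }

  /**
   * Mimics a synchronized stop(): interrupts the worker and joins it while
   * still holding this object's monitor. Returns whether the worker is
   * still alive after the bounded join.
   */
  synchronized boolean interruptAndJoinHoldingLock(long millis)
      throws InterruptedException {
    worker.interrupt();
    worker.join(millis);
    return worker.isAlive();
  }

  /** Joins without holding the monitor; returns whether the worker is still alive. */
  boolean joinWithoutLock(long millis) throws InterruptedException {
    worker.join(millis);
    return worker.isAlive();
  }

  public static void main(String[] args) throws InterruptedException {
    SynchronizedStopDemo demo = new SynchronizedStopDemo();
    demo.start();
    System.out.println("alive while the lock is held: "
        + demo.interruptAndJoinHoldingLock(500));
    System.out.println("alive after the lock is released: "
        + demo.joinWithoutLock(5000));
  }
}
```

Thread.join() only waits on the Thread object, so it never releases the monitor of the object whose synchronized method is calling it. The interrupted worker therefore stays blocked until the synchronized method returns, which with an unbounded join() is never.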
[jira] [Commented] (HDDS-1830) OzoneManagerDoubleBuffer#stop should wait for daemon thread to die

[ https://issues.apache.org/jira/browse/HDDS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890515#comment-16890515 ]

Siyao Meng commented on HDDS-1830:
----------------------------------

I tested that if I call join() right after interrupt() in OzoneManagerDoubleBuffer#stop(), the thread waits indefinitely:
{code:title=jstack}
"Thread-2" #14 prio=5 os_prio=31 tid=0x7fee3e997800 nid=0x6003 in Object.wait() [0x768fe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00079591b858> (a org.apache.hadoop.util.Daemon)
    at java.lang.Thread.join(Thread.java:1252)
    - locked <0x00079591b858> (a org.apache.hadoop.util.Daemon)
    at java.lang.Thread.join(Thread.java:1326)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.stop(OzoneManagerDoubleBuffer.java:208)
    - locked <0x000795914f98> (a org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer)
    at org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.stop(TestOzoneManagerDoubleBufferWithOMResponse.java:90)
    at org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:364)
    at org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
Note that I took two jstacks 2 minutes apart and got exactly the same result.

{code:title=Calling join() in OzoneManagerDoubleBuffer#stop()}
  /**
   * Stop OM DoubleBuffer flush thread.
   */
  public synchronized void stop() {
    if (isRunning) {
      LOG.info("Stopping OMDoubleBuffer flush thread");
      isRunning = false;
      daemon.interrupt();
      try {
        daemon.join();
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      System.out.println("!!! RETURNED");

      // stop metrics.
      ozoneManagerDoubleBufferMetrics.unRegister();
    } else {
      LOG.info("OMDoubleBuffer flush thread is not running.");
    }
  }
{code}

I ran the unit test TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer locally.
