[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472903#comment-16472903 ]

Xiao Chen commented on HDFS-13339:
--

Thanks for the ping, will find cycles to review next week.
Could you please update the title / description of the jira as Daryn suggested?

> Volume reference can't release when testVolFailureStatsPreservedOnNNRestart
> ---
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
>   hadoop version: hadoop-3.2.0-SNAPSHOT
>   unit: mvn test -Pnative -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
> Reporter: liaoyuxiangqin
> Assignee: liaoyuxiangqin
> Priority: Critical
> Labels: DataNode, volumes
> Attachments: HDFS-13339.001.patch
>
>
> When I run the unit test
> TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart,
> the process blocks in waitReplication. Details follow:
> [INFO] ---
> [INFO] T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.492 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) Time elapsed: 307.206 s <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 2 replicas
>   at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>   at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472223#comment-16472223 ]

Daryn Sharp commented on HDFS-13339:
--

Creating new thread pool/factory instances is going to cause thread leaks, at least until the threads time out, which is likely to cause premature promotion of the objects and increase GC pressure later. Use a shared instance.

This also looks like a legitimate, non-test-related bug? If yes, the description is misleading and should be revised to remove the reference to a test.
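
To illustrate the point about per-call pools versus a shared instance, here is a minimal sketch. It is not the code from HDFS-13339.001.patch; the class name `SharedPoolSketch`, the helper names, and the 60-second keep-alive are all assumptions made for the example. It only counts how many threads get created when each call builds its own pool compared with reusing one shared pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Hadoop.
public class SharedPoolSketch {
    // Counts every thread handed out by our factories.
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    static ThreadFactory countingFactory(String prefix) {
        return r -> {
            Thread t = new Thread(r, prefix + "-" + THREADS_CREATED.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive for idle workers
            return t;
        };
    }

    // Single-thread pool whose idle worker lingers for the keep-alive (assumed 60 s).
    static ThreadPoolExecutor newPool(String prefix) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(), countingFactory(prefix));
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // One pool reused by every caller.
    private static final ExecutorService SHARED = newPool("shared");

    static void await(Future<?> f) {
        try {
            f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Anti-pattern: a fresh pool per call that is never shut down, so its
    // worker thread stays around until the keep-alive expires.
    static void runPerCall(Runnable task) {
        ExecutorService pool = newPool("per-call");
        await(pool.submit(task));
    }

    // Preferred: submit to the shared pool.
    static void runShared(Runnable task) {
        await(SHARED.submit(task));
    }

    // Returns how many new threads were created while making `calls` calls.
    static int threadsCreatedBy(boolean shared, int calls) {
        int before = THREADS_CREATED.get();
        for (int i = 0; i < calls; i++) {
            if (shared) {
                runShared(() -> { });
            } else {
                runPerCall(() -> { });
            }
        }
        return THREADS_CREATED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("per-call threads: " + threadsCreatedBy(false, 5));
        System.out.println("shared threads:   " + threadsCreatedBy(true, 5));
    }
}
```

In this sketch the per-call variant creates one thread per invocation, and each of those threads stays parked until its keep-alive elapses, while the shared variant reuses a single worker. Calling `shutdown()` on a per-call pool would also avoid the lingering threads, but it still pays the thread creation cost on every call.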
[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471838#comment-16471838 ]

Zsolt Venczel commented on HDFS-13339:
--

Hi [~liaoyuxiangqin],

While investigating an intermittent failure with TestBlockStatsMXBean.testStorageTypeStatsWhenStorageFailed, I ran into the same problem you did and found that your solution would fix the flakiness.
I applied your patch, ran the tests, and found it fit for commit, so I give it a +1 (non-binding).

[~xiaochen] if you could also take a look that would be great!

Thanks and best regards,
Zsolt