[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart

2018-05-11 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472903#comment-16472903
 ] 

Xiao Chen commented on HDFS-13339:
--

Thanks for the ping, will find cycles to review next week.

Could you please update the title / description of the jira as Daryn suggested?

> Volume reference can't release when testVolFailureStatsPreservedOnNNRestart
> ---
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
>   hadoop version: hadoop-3.2.0-SNAPSHOT
>   unit: mvn test -Pnative -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
> Reporter: liaoyuxiangqin
> Assignee: liaoyuxiangqin
> Priority: Critical
> Labels: DataNode, volumes
> Attachments: HDFS-13339.001.patch
>
>
> When I run the unit test
> TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart,
> the process blocks in waitReplication. Details follow:
> [INFO] ---
> [INFO] T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.492 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
> [ERROR] testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)  Time elapsed: 307.206 s <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 2 replicas
>     at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>     at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
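
For context on the trace above: a TimeoutException of this kind is what a poll-until-deadline helper throws when its condition never becomes true, which is why the test sits for roughly five minutes before erroring out. The following is a generic Java sketch of that pattern, not the actual DFSTestUtil.waitReplication code; the class, method, and parameter names are hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; this is not Hadoop's implementation.
final class WaitForSketch {
    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    static void waitFor(BooleanSupplier condition, long intervalMs,
                        long timeoutMs, String what)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Timed out waiting for " + what);
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // A condition that becomes true on the third poll.
        int[] polls = {0};
        waitFor(() -> ++polls[0] >= 3, 10, 1_000, "counter to reach 3");
        System.out.println("condition met after " + polls[0] + " polls");

        // A condition that never holds, analogous to a replica count that
        // never reaches its target.
        try {
            waitFor(() -> false, 10, 100, "/test1 to reach 2 replicas");
        } catch (TimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The helper itself only reports that the deadline passed; finding out why the condition never held (here, reportedly a volume reference that is not released) still requires looking at the DataNode side.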



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart

2018-05-11 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472223#comment-16472223
 ] 

Daryn Sharp commented on HDFS-13339:


Creating new thread pool/factory instances is going to cause thread leaks, at 
least until the threads time out.  That is likely to cause premature promotion 
of the objects and increase GC pressure later.  Use a shared instance.

This also looks like a legitimate bug that is not specific to the test?  If 
so, the description is misleading and should be revised to remove the 
reference to a test.
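
To make the thread-leak concern concrete, here is a minimal Java sketch. It is not the code in HDFS-13339.001.patch: the class name, thread factory, and pool sizes are all hypothetical. It only contrasts building a new executor on every call with reusing one shared instance, and counts how many threads each approach starts.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Hadoop.
final class SharedPoolSketch {
    // Counts every thread the factory below has created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Daemon threads so leaked pools do not keep the JVM alive in this demo.
    static final ThreadFactory FACTORY = r -> {
        Thread t = new Thread(r, "sketch-worker-" + CREATED.incrementAndGet());
        t.setDaemon(true);
        return t;
    };

    // One pool, created once and reused by every caller.
    static final ExecutorService SHARED = Executors.newFixedThreadPool(2, FACTORY);

    // Anti-pattern: a new pool per call that is never shut down.
    static int doubleOnFreshPool(int x) {
        ExecutorService pool = Executors.newFixedThreadPool(2, FACTORY);
        return await(pool.submit(() -> x * 2));
    }

    // Preferred: submit to the shared pool.
    static int doubleOnSharedPool(int x) {
        return await(SHARED.submit(() -> x * 2));
    }

    private static int await(Future<Integer> f) {
        try {
            return f.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int start = CREATED.get();
        for (int i = 0; i < 5; i++) doubleOnFreshPool(i);
        int fresh = CREATED.get() - start;
        for (int i = 0; i < 5; i++) doubleOnSharedPool(i);
        int shared = CREATED.get() - start - fresh;
        System.out.println("threads started by fresh pools: " + fresh);
        System.out.println("threads started by shared pool: " + shared);
    }
}
```

In this sketch every per-call pool starts a thread that is never reclaimed, while the shared pool never grows past its configured size. A pool with a keep-alive would eventually drop its idle threads, which corresponds to the "until the threads time out" caveat.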




[jira] [Commented] (HDFS-13339) Volume reference can't release when testVolFailureStatsPreservedOnNNRestart

2018-05-11 Thread Zsolt Venczel (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471838#comment-16471838
 ] 

Zsolt Venczel commented on HDFS-13339:
--

Hi [~liaoyuxiangqin],

While investigating an intermittent failure in 
TestBlockStatsMXBean.testStorageTypeStatsWhenStorageFailed, I ran into the same 
problem you did, and I found that your solution would fix the flakiness.

I applied your patch, ran the tests, and found it fit for commit, so I 
give it a +1 (non-binding).

[~xiaochen] if you could also take a look that would be great!

Thanks and best regards,
Zsolt
 
