[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-02-02 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3

Committed to trunk. Thanks [~manojg] and [~xiaochen] for the reviews!

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, 
> HDFS-11353.003.patch, HDFS-11353.004.patch, HDFS-11353.005.patch, 
> HDFS-11353.006.patch
>
>
> Currently, many tests starting with {{TestDataNodeVolumeFailure*}} 
> frequently time out or fail. I found one failed test in a recent Jenkins 
> build. The stack trace:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
> java.util.concurrent.TimeoutException: Timed out waiting for DN to die
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
> {code}
> The related code:
> {code}
> /*
>  * Now fail the 2nd volume on the 3rd datanode. All its volumes
>  * are now failed and so it should report two volume failures
>  * and that it's no longer up. Only wait for two replicas since
>  * we'll never get a third.
>  */
> DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
> Path file3 = new Path("/test3");
> DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
> DFSTestUtil.waitReplication(fs, file3, (short)2);
> // The DN should consider itself dead
> DFSTestUtil.waitForDatanodeDeath(dns.get(2));
> {code}
> Here the code waits for the datanode to fail all of its volumes and then 
> become dead, but it timed out. It would be better to first check that all 
> the volumes have failed and only then wait for the datanode to die.
> In addition, we can use the method {{checkDiskErrorSync}} to perform the 
> disk error check instead of creating files. In this JIRA, I would like to 
> extract this logic and define it in {{DataNodeTestUtils}} so that we can 
> reuse it for DataNode volume failure testing in the future.
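The helper the JIRA proposes extracting boils down to a small polling primitive: assert the precondition (all volumes on the DataNode reported failed) and only then wait for the consequence (the DataNode dying). A minimal, self-contained sketch of such a primitive follows; the class and method names here are illustrative, not Hadoop's actual {{DataNodeTestUtils}} API:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/**
 * Minimal sketch of a wait-for-condition primitive, in the spirit of the
 * proposed DataNodeTestUtils helper: poll a condition (e.g. "all volumes on
 * this DataNode are reported failed") instead of creating files and waiting
 * for replication side effects. Illustrative only.
 */
public final class VolumeFailureWait {
  private VolumeFailureWait() {
  }

  /**
   * Poll {@code condition} every {@code intervalMs} milliseconds until it
   * holds, or throw TimeoutException after {@code timeoutMs} milliseconds.
   */
  public static void waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) throws InterruptedException, TimeoutException {
    final long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Timed out waiting for condition");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

With such a helper, the test would first wait until the failed-volume count reaches the expected value and only then wait for the DataNode to die, so a hang points at the actual failing step.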



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-02-02 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, 
> HDFS-11353.003.patch, HDFS-11353.004.patch, HDFS-11353.005.patch, 
> HDFS-11353.006.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-02-01 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: HDFS-11353.006.patch

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, 
> HDFS-11353.003.patch, HDFS-11353.004.patch, HDFS-11353.005.patch, 
> HDFS-11353.006.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-27 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: HDFS-11353.005.patch

Thanks [~xiaochen] for taking a look at this and giving your comments; they 
look great.
Attached a new patch to address the comments. I added the timeout {{@Rule}} 
to {{TestDataNodeVolumeFailureToleration}} as well, since I found that it 
also fails sometimes. I set the timeout to {{120s}}, as you mentioned in 
HDFS-11372, which should be sufficient. Looking at recent Jenkins builds, 
the relevant tests only take around 1-2 minutes.
{code}
TestDataNodeVolumeFailure            1 min 7 sec    0  -1   0  10  +1  10
TestDataNodeVolumeFailureReporting   1 min 35 sec   0   0   6  +6   6  +6
TestDataNodeVolumeFailureToleration  43 sec         0   0   4       4
{code}
If a test still fails, it will be easy to catch, and we can file a new JIRA 
to track it.
Thanks for the review.
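Mechanically, the JUnit timeout {{@Rule}} mentioned above amounts to running the test body under a hard deadline, so a hung wait surfaces as a timeout instead of stalling the whole Jenkins build. A plain-Java sketch of that mechanism (illustrative only, not JUnit's actual implementation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of what a per-test timeout buys us: run the test body on a worker
 * thread and fail fast if it exceeds the deadline. JUnit's Timeout rule does
 * this declaratively; this just demonstrates the mechanism.
 */
public class TimeoutDemo {
  public static void runWithTimeout(Runnable testBody, long timeoutSeconds)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> result = executor.submit(testBody);
    try {
      // Throws java.util.concurrent.TimeoutException if the body hangs.
      result.get(timeoutSeconds, TimeUnit.SECONDS);
    } finally {
      executor.shutdownNow(); // interrupt the body if it is still running
    }
  }
}
```

A hung test then fails with a clear timeout at the deadline rather than running until the surefire fork is killed.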

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, 
> HDFS-11353.003.patch, HDFS-11353.004.patch, HDFS-11353.005.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-25 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: HDFS-11353.004.patch

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch, 
> HDFS-11353.003.patch, HDFS-11353.004.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-23 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: HDFS-11353.002.patch

Reuploaded the v002 patch with the timeout added. This will help us find 
timed-out tests after a Jenkins build.

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-23 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: (was: HDFS-11353.002.patch)

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-21 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Attachment: HDFS-11353.002.patch

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch, HDFS-11353.002.patch
>
>






[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-21 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Description: 
Currently, many tests starting with {{TestDataNodeVolumeFailure*}} frequently 
time out or fail. I found one failed test in a recent Jenkins build. The 
stack trace:
{code}
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
java.util.concurrent.TimeoutException: Timed out waiting for DN to die
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
{code}
The related code:
{code}
/*
 * Now fail the 2nd volume on the 3rd datanode. All its volumes
 * are now failed and so it should report two volume failures
 * and that it's no longer up. Only wait for two replicas since
 * we'll never get a third.
 */
DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
Path file3 = new Path("/test3");
DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
DFSTestUtil.waitReplication(fs, file3, (short)2);

// The DN should consider itself dead
DFSTestUtil.waitForDatanodeDeath(dns.get(2));
{code}
Here the code waits for the datanode to fail all of its volumes and then 
become dead, but it timed out. It would be better to first check that all 
the volumes have failed and only then wait for the datanode to die.

In addition, we can use the method {{checkDiskErrorSync}} to perform the 
disk error check instead of creating files. In this JIRA, I would like to 
extract this logic and define it in {{DataNodeTestUtils}} so that we can 
reuse it for DataNode volume failure testing in the future.

  was:
Currently, many tests starting with {{TestDataNodeVolumeFailure*}} frequently 
time out or fail. I found one failed test in a recent Jenkins build. The 
stack trace:
{code}
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
java.util.concurrent.TimeoutException: Timed out waiting for DN to die
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
{code}
The related code:
{code}
/*
 * Now fail the 2nd volume on the 3rd datanode. All its volumes
 * are now failed and so it should report two volume failures
 * and that it's no longer up. Only wait for two replicas since
 * we'll never get a third.
 */
DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
Path file3 = new Path("/test3");
DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
DFSTestUtil.waitReplication(fs, file3, (short)2);

// The DN should consider itself dead
DFSTestUtil.waitForDatanodeDeath(dns.get(2));
{code}
Here the code waits for the datanode to fail all of its volumes and then 
become dead, but it timed out. It would be better to first check that all 
the volumes have failed and only then wait for the datanode to die.


> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch
>
>

[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-21 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Description: 
Currently, many tests starting with {{TestDataNodeVolumeFailure*}} frequently 
time out or fail. I found one failed test in a recent Jenkins build. The 
stack trace:
{code}
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
java.util.concurrent.TimeoutException: Timed out waiting for DN to die
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
{code}
The related code:
{code}
/*
 * Now fail the 2nd volume on the 3rd datanode. All its volumes
 * are now failed and so it should report two volume failures
 * and that it's no longer up. Only wait for two replicas since
 * we'll never get a third.
 */
DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
Path file3 = new Path("/test3");
DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
DFSTestUtil.waitReplication(fs, file3, (short)2);

// The DN should consider itself dead
DFSTestUtil.waitForDatanodeDeath(dns.get(2));
{code}
Here the code waits for the datanode to fail all of its volumes and then 
become dead, but it timed out. It would be better to first check that all 
the volumes have failed and only then wait for the datanode to die.

  was:
Currently, many tests starting with {{TestDataNodeVolumeFailure*}} frequently 
time out or fail. I found one failed test in a recent Jenkins build. The 
stack trace:
{code}
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures
java.util.concurrent.TimeoutException: Timed out waiting for DN to die
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitForDatanodeDeath(DFSTestUtil.java:702)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testSuccessiveVolumeFailures(TestDataNodeVolumeFailureReporting.java:208)
{code}
The related code:
{code}
/*
 * Now fail the 2nd volume on the 3rd datanode. All its volumes
 * are now failed and so it should report two volume failures
 * and that it's no longer up. Only wait for two replicas since
 * we'll never get a third.
 */
DataNodeTestUtils.injectDataDirFailure(dn3Vol2);
Path file3 = new Path("/test3");
DFSTestUtil.createFile(fs, file3, 1024, (short)3, 1L);
DFSTestUtil.waitReplication(fs, file3, (short)2);

// The DN should consider itself dead
DFSTestUtil.waitForDatanodeDeath(dns.get(2));
{code}
Here the code waits for the datanode to fail all of its volumes and then 
become dead, but it timed out. We can perform an additional 
{{DataNodeTestUtils.checkDiskErrorSync}} call here to speed up the error 
check; this is already done in many similar places after 
{{DataNodeTestUtils.injectDataDirFailure}} in {{TestDataNodeVolumeFailure}}.

I suppose the recently failing {{TestDataNodeVolumeFailure*}} tests can also 
be improved in this way.


> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch
>
>

[jira] [Updated] (HDFS-11353) Improve the unit tests relevant to DataNode volume failure testing

2017-01-21 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11353:
-
Summary: Improve the unit tests relevant to DataNode volume failure testing 
 (was: Speed the unit tests relevant to DataNode volume failure testing)

> Improve the unit tests relevant to DataNode volume failure testing
> --
>
> Key: HDFS-11353
> URL: https://issues.apache.org/jira/browse/HDFS-11353
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-11353.001.patch
>
>


