[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-08-17 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131507#comment-16131507
 ] 

SammiChen commented on HDFS-9822:
-

Hi [~andrew.wang],  so far I haven't reproduced the issue yet. I will try to 
see if it can meet the beta1 timeline. 

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-08-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130881#comment-16130881
 ] 

Hadoop QA commented on HDFS-9822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-9822 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791223/HDFS-9822-002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/20742/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-08-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130860#comment-16130860
 ] 

Andrew Wang commented on HDFS-9822:
---

Hey Sammi, are you still planning to work on this for beta1?

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-03-31 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950495#comment-15950495
 ] 

Rakesh R commented on HDFS-9822:


[~Sammi], sure please feel free to take this. I will help in 
reviews/discussions. Probably, you could dig more into the code. Also, please 
refer the test case in the attached patch, it may give some hint.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-03-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950385#comment-15950385
 ] 

Hadoop QA commented on HDFS-9822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-9822 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791223/HDFS-9822-002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18924/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-03-31 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950376#comment-15950376
 ] 

SammiChen commented on HDFS-9822:
-

Hi [~rakeshr], do you have plan to continue working on it?  If you don't have 
time, maybe I can take it. 

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-01-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817023#comment-15817023
 ] 

Hadoop QA commented on HDFS-9822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HDFS-9822 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791223/HDFS-9822-002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/18146/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2017-01-10 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816843#comment-15816843
 ] 

Andrew Wang commented on HDFS-9822:
---

This one looks like a good bug, adding to "nice-to-have" and it might get 
upgrade to "must-do". [~rakeshr] do you still have plans to work on it?

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-10 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189704#comment-15189704
 ] 

Rakesh R commented on HDFS-9822:


Thanks [~zhz] for the advice. I will consider this in the next patch.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-10 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189690#comment-15189690
 ] 

Rakesh R commented on HDFS-9822:


Thanks Walter, Li Bo. If I remember correctly, I had observed once in my env 
during block corruption testing. Let me try to analyse more and get back to you.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-09 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186974#comment-15186974
 ] 

Walter Su commented on HDFS-9822:
-

bq. I am still a little confused how this error happens.
Me too. I don't think we get the right cause.
bq. But if there are same block group entry exists in different queue..
No 2 queues can have same BG. The update(..) logic is correct.
No queue can has 2 same items. The queue is a HashSet.

My pure guess is that it's caused by race condition. We have a guard at
{code}
//  BlockManager#scheduleReconstruction(..)
if (block.isStriped()) {
  if (pendingNum > 0) {
// Wait the previous reconstruction to finish.
return null;
  }
{code}
which is inside namesystem lock. But before {{ReplicationMonitor}} thread goes 
to {{validateReconstructionWork(..)}}, it loses the lock. So it's possible the 
junit thread get the lock. If they both passes the guard, eventually one of 
them will failed the assert.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-08 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186510#comment-15186510
 ] 

Li Bo commented on HDFS-9822:
-

hi,Rakesh
After reading the code of {{UnderReplicationBlocks}} I am still a little 
confused how this error happens. Since the situation is difficult to reproduce, 
how about creating a unit test case that simulates the error situation?


> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186150#comment-15186150
 ] 

Kai Zheng commented on HDFS-9822:
-

BIN ZHOU => [~zhz]. 

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186145#comment-15186145
 ] 

Kai Zheng commented on HDFS-9822:
-

Thanks [~zhzhoubin].
bq. I don't think we should schedule all EC tasks in the same queue.
Yeah I agree. It's my confusing and I can be clear. What I meant is, striped 
blocks can be tracked in separate set of queues dedicated to striping files in 
the unit of block group, instead of internal blocks in groups. So if a group 
loses 3 internal blocks, then only one entry instead of 3 are maintained in the 
queue(s).
bq. If a block group has lost 3 internal blocks, we should treat it with higher 
priority than one that has lost 1.
That's right. So when new internal block is reported bad, then the existing 
entry for the block group will merge this one, and will not create new entry, 
right. I think in this way it can basically avoid having multiple 
reconstruction tasks to be generated.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-08 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186017#comment-15186017
 ] 

Zhe Zhang commented on HDFS-9822:
-

Interesting case here. I think it happens when 1) a under replicated striped 
block is put into a queue; 2) the block group loses another internal block; 3) 
then we put another entry in a higher priority queue for the same block group. 
To answer Kai's question: I don't think we should schedule all EC tasks in the 
same queue. If a block group has lost 3 internal blocks, we should treat it 
with higher priority than one that has lost 1.

Thanks for the fix [~rakeshr]. The below logic only checks the {{oldPri}} 
queue. I guess the block could be in other queues as well?
{code}
else if (block.isStriped()
&& !priorityQueues.get(oldPri).contains(block)) {
{code}

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch, HDFS-9822-002.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178689#comment-15178689
 ] 

Hadoop QA commented on HDFS-9822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 41s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 8s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 54s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.TestEncryptionZones |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.tracing.TestTracing |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791223/HDFS-9822-002.patch |
| JIRA Issue | HDFS-9822 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux eed7f176f6ca 3.13.0-36-lowlatency #63-Ubuntu SMP 

[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-03 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178302#comment-15178302
 ] 

Rakesh R commented on HDFS-9822:


Thanks a lot [~drankye] for the interests and useful comments.

bq. 1. Why multiple reconstruction tasks for the same striped block or block 
group are figured out and put into queues?
I have come across a situation while testing corrupted striped blocks. I think 
its not a straight scenario and unfortunately this occurred only once in my 
env. Please see the below logs, here same block group 
{{9223372036854775792_1001}} is added to two different priority queues. 
Initially the block {{9223372036854775792_1001}} has added to the 
neededReplications {{priority queue 2}}. Second time, while reporting the 
addStoredBlock request the same block group {{9223372036854775792_1001}} is 
added to the neededReplications {{priority queue 1}}

{code}
2016-03-03 11:42:42,544 DEBUG BlockStateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_-9223372036854775792 added as corrupt 
on 127.0.0.1:7517 by null  because TEST
2016-03-03 11:42:42,545 DEBUG org.apache.hadoop.hdfs.StateChange: 
UnderReplicationBlocks.update blk_-9223372036854775792_1001 curReplicas 8 
curExpectedReplicas 9 oldReplicas 9 oldExpectedReplicas  9 curPri  2 oldPri  3
2016-03-03 11:42:42,545 DEBUG BlockStateChange: BLOCK* 
NameSystem.UnderReplicationBlock.update: blk_-9223372036854775792_1001 has only 
8 replicas and needs 9 replicas so is added to neededReplications at priority 
level 2
{code}

{code}
2016-03-03 11:42:42,920 WARN BlockStateChange: BLOCK* addStoredBlock: Redundant 
addStoredBlock request received for blk_-9223372036854775792_1001 on node 
127.0.0.1:7517 size 786432
2016-03-03 11:42:42,921 DEBUG org.apache.hadoop.hdfs.StateChange: 
UnderReplicationBlocks.update blk_-9223372036854775792_1001 curReplicas 7 
curExpectedReplicas 9 oldReplicas 7 oldExpectedReplicas  9 curPri  1 oldPri  1
2016-03-03 11:42:42,921 DEBUG BlockStateChange: BLOCK* 
NameSystem.UnderReplicationBlock.update: blk_-9223372036854775792_1001 has only 
7 replicas and needs 9 replicas so is added to neededReplications at priority 
level 1
{code}

bq. 2. Is it possible to maintain a separate queue for striped block groups, 
where a block group is ensured to be put into exactly once
As we know, there could be situations of both contiguous and striped under 
replicated blocks exists in the system at a time. Currently while choosing the 
under replicated blocks for reconstruction, there is a natural ordering of both 
contiguous and striped blocks. Providing a separate queue is an interesting 
idea. Just a quick thought, with a separate queue for the striped blocks, I'm 
thinking how efficiently we will be able to maintain the ordering between the 
under replicated contiguous and striped blocks.

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-03 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177980#comment-15177980
 ] 

Kai Zheng commented on HDFS-9822:
-

Thanks [~rakeshr] for working on this. I have some questions.

1. Why multiple reconstruction tasks for the same striped block or block group 
are figured out and put into queues?
2. Is it possible to maintain a separate queue for striped block groups, where 
a block group is ensured to be put into exactly once. Whenever a striped block 
in the block group is reported missed/corrupt, the block group is identified 
and checked if a task for it is already in it or not, and only add one for the 
first time. This way we can avoid this issue completely. An extra benefit is, 
if any admin would query how many striped block groups are in question, it can 
be easily figured out.

Ping [~zhz] for the discussion. 

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177752#comment-15177752
 ] 

Hadoop QA commented on HDFS-9822:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 56s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 34s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 154m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.mover.TestStorageMover |
|   | hadoop.hdfs.server.namenode.TestEditLog |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.TestFileAppend |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12791135/HDFS-9822-001.patch |
| JIRA Issue | HDFS-9822 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  

[jira] [Commented] (HDFS-9822) Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped block at the same time

2016-03-03 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177577#comment-15177577
 ] 

Rakesh R commented on HDFS-9822:


{{BlockManager#computeReconstructionWorkForBlocks()}} is getting all the blocks 
to be reconstructed, for each priority and schedules reconstruction tasks 
together. IIUC, striped blocks won't schedule multiple reconstruction tasks at 
a time. But if there are {{same block group}} entry exists in different queue 
can cause the error situation described in the jira.  I've tried an attempt to 
fix this situation. 

[~szetszwo] Could you please review the analysis and the proposed fix. Thanks!

> Erasure Coding: Avoids scheduling multiple reconstruction tasks for a striped 
> block at the same time
> 
>
> Key: HDFS-9822
> URL: https://issues.apache.org/jira/browse/HDFS-9822
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Rakesh R
> Attachments: HDFS-9822-001.patch
>
>
> Found the following AssertionError in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14501/testReport/org.apache.hadoop.hdfs.server.namenode/TestReconstructStripedBlocks/testMissingStripedBlockWithBusyNode2/
> {code}
> AssertionError: Should wait the previous reconstruction to finish
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.validateReconstructionWork(BlockManager.java:1680)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1536)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1472)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4229)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4100)
>   at java.lang.Thread.run(Thread.java:745)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:126)
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:170)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4119)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)