[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945301#comment-16945301 ]

Surendra Singh Lilhore edited comment on HDFS-14754 at 10/6/19 11:00 AM:
-

Tested it with fix and without fix

*With fix*
{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.686 s - in org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
{noformat}

*Without fix*
{noformat}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.605 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks
[ERROR] testProcessOverReplicatedAndRedudantBlock(org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks)  Time elapsed: 6.509 s  <<< FAILURE!
java.lang.AssertionError: expected:<5> but was:<4>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock(TestRedudantBlocks.java:135)
{noformat}

> Erasure Coding : The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Critical
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14754-addendum.001.patch, HDFS-14754-addendum.002.patch, HDFS-14754.001.patch, HDFS-14754.002.patch, HDFS-14754.003.patch, HDFS-14754.004.patch, HDFS-14754.005.patch, HDFS-14754.006.patch, HDFS-14754.007.patch, HDFS-14754.008.patch, HDFS-14754.branch-3.1.patch
>
>
> Using EC RS-3-2, 6 DN
> We came across a scenario where in the EC 5 blocks, same block is replicated thrice and two blocks got missing
> Replicated block was not deleting and missing block is not able to ReConstruct

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
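As a rough illustration of the scenario in the issue description, the sketch below counts the replicas of one RS-3-2 block group two ways: by raw replica count and by distinct internal block index. It is a toy model, not Hadoop code. The class `EcGroupSketch`, its methods, and the choice of which index is duplicated (0) and which are missing (3 and 4) are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical, not Hadoop code): a replica is identified only by its index in the EC block group. */
class EcGroupSketch {
    static final int DATA_UNITS = 3;   // RS-3-2: three data blocks
    static final int PARITY_UNITS = 2; // RS-3-2: two parity blocks
    static final int GROUP_SIZE = DATA_UNITS + PARITY_UNITS;

    /** Number of distinct internal block indices that have at least one stored replica. */
    static int distinctStored(List<Integer> storedIndices) {
        Set<Integer> seen = new HashSet<>(storedIndices);
        return seen.size();
    }

    /** Internal blocks of the group that have no replica at all. */
    static int missing(List<Integer> storedIndices) {
        return GROUP_SIZE - distinctStored(storedIndices);
    }

    /** Replicas beyond the first copy of each index. */
    static int redundant(List<Integer> storedIndices) {
        return storedIndices.size() - distinctStored(storedIndices);
    }

    public static void main(String[] args) {
        // Assumed layout: index 0 is stored three times, indices 3 and 4 are absent.
        List<Integer> stored = Arrays.asList(0, 0, 0, 1, 2);
        System.out.println("replicas=" + stored.size()
            + " distinct=" + distinctStored(stored)
            + " missing=" + missing(stored)
            + " redundant=" + redundant(stored));
    }
}
```

In this layout the raw replica count equals the group size, while only three distinct indices are present. Three is the number of data units in RS-3-2, which is the minimum a Reed-Solomon 3+2 group needs to rebuild the missing blocks, so a count based on distinct indices is what reveals both the redundant copies and the missing ones.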
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941376#comment-16941376 ]

Wei-Chiu Chuang edited comment on HDFS-14754 at 9/30/19 10:20 PM:
--

Too bad this one didn't land in 3.2.1 and 3.1.3. Let's make sure it gets added to lower releases.

was (Author: jojochuang):
Too bad this one didn't land in 3.2.1 and 3.1.3. Let's make sure they get added to lower releases.
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928202#comment-16928202 ]

Surendra Singh Lilhore edited comment on HDFS-14754 at 9/12/19 5:06 AM:

[~hemanthboyina], please attach the patch again with proper comments in the test case.

{code:java}
// update blocksMap
cluster.triggerBlockReports();
// add to invalidates
cluster.triggerHeartbeats();
// datanode delete block
cluster.triggerHeartbeats();
// update blocksMap
cluster.triggerBlockReports();
{code}

was (Author: surendrasingh):
[~hemanthboyina], please attach the patch again with proper comments in the test case.
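One possible reading of the four commented calls above, written as a toy state machine. This is not `MiniDFSCluster` and does not use any Hadoop API. The class `ToyCluster`, its fields, and the rule that one heartbeat queues invalidations while the next one carries out the deletes are assumptions chosen only to mirror the four comments; the real NameNode/DataNode exchange is more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (hypothetical names throughout); replicas are identified by EC block index. */
class ToyCluster {
    // Replicas actually present on the datanodes.
    final List<Integer> onDisk = new ArrayList<>();
    // What the namenode currently believes is stored (its "blocksMap" in this toy).
    final List<Integer> blocksMap = new ArrayList<>();
    // Redundant replicas queued for deletion.
    final Deque<Integer> invalidates = new ArrayDeque<>();

    ToyCluster(List<Integer> initialReplicas) {
        onDisk.addAll(initialReplicas);
    }

    /** Block report: the namenode's view is refreshed from what is on disk. */
    void triggerBlockReports() {
        blocksMap.clear();
        blocksMap.addAll(onDisk);
    }

    /** Heartbeat: carry out deletes queued earlier; otherwise queue the duplicates seen in blocksMap. */
    void triggerHeartbeats() {
        if (!invalidates.isEmpty()) {
            while (!invalidates.isEmpty()) {
                // poll() returns an Integer object, so this removes one matching replica.
                onDisk.remove(invalidates.poll());
            }
            return;
        }
        Set<Integer> seen = new HashSet<>();
        for (Integer index : blocksMap) {
            if (!seen.add(index)) {
                invalidates.add(index);
            }
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(Arrays.asList(0, 0, 0, 1, 2));
        cluster.triggerBlockReports(); // update blocksMap
        cluster.triggerHeartbeats();   // add to invalidates
        cluster.triggerHeartbeats();   // datanode delete block
        cluster.triggerBlockReports(); // update blocksMap
        System.out.println("onDisk=" + cluster.onDisk + " blocksMap=" + cluster.blocksMap);
    }
}
```

Under these assumptions each call appears twice because the view and the disk change at different steps: the first block report exposes the duplicates, the two heartbeats queue and then perform the deletes, and only the second block report makes the namenode's view match the disk again.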
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925157#comment-16925157 ]

Ayush Saxena edited comment on HDFS-14754 at 9/8/19 12:17 PM:
--

That seems a little overkill to me. No need to prove that it doesn’t get corrected across multiple BRs either. IMHO just verifying that your newly introduced logic now works should be enough.

Anyway, there is only a single test, so I don't think we need a separate test class for it. It may be better to use an existing test class. Since the change is in BlockManager, you can try {{TestBlockManager}} or maybe some other.

was (Author: ayushtkn):
That seems a little overkill to me. No need to prove that it doesn’t get corrected across multiple BRs either. IMHO just verifying that your newly introduced logic now works should be enough.
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925090#comment-16925090 ]

Surendra Singh Lilhore edited comment on HDFS-14754 at 9/8/19 7:01 AM:
---

[~ayushtkn]
{quote}HDFS-14699 seems to be handling a quite similar scenario. Once that gets in, will this problem still be there? Will reconstruction still not happen without this?
{quote}
The two issues are different. As I mentioned in my comment, HDFS-14699 handles the busy DN scenario, not the duplicate block scenario. I already checked this while reviewing.
[jira] [Comment Edited] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced
[ https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924916#comment-16924916 ]

Ayush Saxena edited comment on HDFS-14754 at 9/7/19 4:02 PM:
-

Thanks [~hemanthboyina] for the patch and [~surendrasingh] for the review.

bq. Replicated block was not deleting and missing block is not able to ReConstruct.

HDFS-14699 seems to be handling a quite similar scenario. Once that gets in, will this problem still be there? Will reconstruction still not happen without this? [~hemanthboyina], can you check once?

Anyway, on a quick look: for the UT, can you add some comments explaining the process you are actually trying to validate, for better understandability?

{code:java}
+cluster.triggerBlockReports();
+cluster.triggerHeartbeats();
+cluster.triggerHeartbeats();
+cluster.triggerBlockReports();
{code}
Can you add a line here explaining why it is required twice?

was (Author: ayushtkn):
Thanks [~hemanthboyina] for the patch and [~surendrasingh] for the review.

bq. Replicated block was not deleting and missing block is not able to ReConstruct.

HDFS-14699 seems to be handling a quite similar scenario. Once that gets in, will this problem still be there? Will reconstruction still not happen without this? [~hemanthboyina], can you check once?

Anyway, on a quick look: for the UT, can you add some comments explaining the process you are actually trying to validate, for better understandability?

{code:java}
+int i = 0;
+// one missing block
+for (; i < groupSize - 1; i++)
{code}
Any reason for declaring i outside the loop?

{code:java}
+cluster.triggerBlockReports();
+cluster.triggerHeartbeats();
+cluster.triggerHeartbeats();
+cluster.triggerBlockReports();
{code}
Can you add a line here explaining why it is required twice?