[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524860#comment-14524860 ] Hadoop QA commented on HDFS-5464: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10634/console | This message was automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524903#comment-14524903 ] Hadoop QA commented on HDFS-5464: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10653/console | This message was automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062703#comment-14062703 ] Colin Patrick McCabe commented on HDFS-5464: {code} // collect blocks that have not been reported -// all of them are next to the delimiter -IteratorBlockInfo it = storageInfo.new BlockIterator(delimiter.getNext(0)); -while(it.hasNext()) - toRemove.add(it.next()); -storageInfo.removeBlock(delimiter); +for(BlockInfo b : storageInfo) { + final long n = b.getNumBytes(); + if (n 0) { +// reset the length of visited block +b.setNumBytes(-n - 1); + } else { +toRemove.add(b); + } +} {code} Previously, we only ended up looping over the blocks that were not reported. Now, with this change, we'll loop over all blocks in the DataNodeDescriptor. Do you agree? This seems like it will be much slower. Imagine a datanode with 500,000 blocks, none of which have been removed since the previous block report. Previously, this loop would do nothing. Now, with this change, we'll be looping over the full 500,000 blocks again. {code} - \@Test - public void testAddStorage() throws Exception { ... - \@Test - public void testReplaceStorageIfDifferetnOneAlreadyExistedFromSameDataNode() throws Exception { {code} I understand removing {{testBlockListMoveToHead}}, but why remove these other tests? Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062720#comment-14062720 ] Tsz Wo Nicholas Sze commented on HDFS-5464: --- I think looping 500,000 is nothing in a modern computer. No? You are right that I should not remove other tests. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062798#comment-14062798 ] Andrew Wang commented on HDFS-5464: --- Considering we'll have 8 or 10TB disks in a few years, we could be seeing a lot more than just 500k blocks per DN. Storage dense nodes with 24+ disks are also out there. Memory accesses and conditional branches are also expensive. If we were just adding 500k integers together, it's not a big deal, but this loop is doing more than that. I'm not opposed in principle to this change since it is simpler and the same time complexity, but I'd like to see some microbenchmark results before committing it. Maybe rig something up with JMH? Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062809#comment-14062809 ] Tsz Wo Nicholas Sze commented on HDFS-5464: --- [~andrew.wang], could you help doing the microbenchmark you suggested? Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062867#comment-14062867 ] Colin Patrick McCabe commented on HDFS-5464: To be honest, I don't think this change makes the code any clearer. Setting block lengths to negative, and then flipping them back later, is a hack that needs to be explained by comments. It's not obvious to me that this hack is any better than the hack of adding a sentinel element to the list. In fact, I would argue the sentinel element is somewhat clearer, since you're not overloading the meaning of what block length means to mean block length and also whether I checked this recently. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062889#comment-14062889 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655865/h5464_20140715.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//console This message is automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062942#comment-14062942 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//console This message is automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch, h5464_20140715b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13815385#comment-13815385 ] Konstantin Shvachko commented on HDFS-5464: --- you won't argue that the new code is simpler than the existing Agreed on simpler. :-) see if I could come up a better solution Sure would be interesting to see. I doubt much can be done in this respect. We need to find blocks that did not appear in the report: in one pass and with constant memory overhead. May be an interview question. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814517#comment-13814517 ] Konstantin Shvachko commented on HDFS-5464: --- So in regular case you will be adding 100,000 replicas to {{toRemove}} list only in order to delete most of them later. How does it make things simpler? The delimiter lets you keep the calculated lists as small as possible, reducing memory consumption, avoiding frequent GCs. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814527#comment-13814527 ] Tsz Wo (Nicholas), SZE commented on HDFS-5464: -- Hi Konstantin, You may be correct that the new code use more memory, however, I beg you won't argue that the new code is simpler than the existing code. :) I will think about how to reduce the memory usage. Thanks for the input. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814583#comment-13814583 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612253/h5464_20131105.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//console This message is automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814620#comment-13814620 ] Hadoop QA commented on HDFS-5464: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612309/h5464_20131105b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestLeaseRecovery2 The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.server.namenode.TestFsck {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//console This message is automatically generated. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13814670#comment-13814670 ] Tsz Wo (Nicholas), SZE commented on HDFS-5464: -- h5464_20131105[bc].patch won't work. h5464_20131105.patch works but it uses more memory and longer running time. I agree with Konstantin that the original code is better although it is complicated. I will leave this for awhile and see if I could come up a better solution. Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.1#6144)