[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159798#comment-13159798 ]

Hudson commented on HBASE-4797:
---

Integrated in HBase-0.92 #163 (See [https://builds.apache.org/job/HBase-0.92/163/])
HBASE-4869 Backport to 0.92: HBASE-4797 [availability] Skip recovered.edits files with edits older than what region currently has (Jimmy Xiang)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java

[availability] Skip recovered.edits files with edits we know older than what region currently has
---
Key: HBASE-4797
URL: https://issues.apache.org/jira/browse/HBASE-4797
Project: HBase
Issue Type: Bug
Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
Labels: noob
Fix For: 0.94.0
Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-[availability]-skip-older-edits.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch

Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned so I had 7000 regions to replay. The distributed split code did a nice job and cluster came back but interesting is that some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though only about 30 odd edits per file it seems). The region is unavailable during this time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158997#comment-13158997 ]

Hudson commented on HBASE-4797:
---

Integrated in HBase-0.92-security #22 (See [https://builds.apache.org/job/HBase-0.92-security/22/])
HBASE-4869 Backport to 0.92: HBASE-4797 [availability] Skip recovered.edits files with edits older than what region currently has (Jimmy Xiang)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155788#comment-13155788 ]

Hudson commented on HBASE-4797:
---

Integrated in HBase-TRUNK-security #6 (See [https://builds.apache.org/job/HBase-TRUNK-security/6/])
HBASE-4797 [availability] Skip recovered.edits files with edits we know older than what region currently has (Jimmy Jiang)

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155869#comment-13155869 ]

Hudson commented on HBASE-4797:
---

Integrated in HBase-TRUNK #2474 (See [https://builds.apache.org/job/HBase-TRUNK/2474/])
HBASE-4797 [availability] Skip recovered.edits files with edits we know older than what region currently has (Jimmy Jiang)

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155292#comment-13155292 ]

stack commented on HBASE-4797:
---

@Jimmy Just FYI, since you are new, to trigger the build again, you need to re-upload the original patch or a new one (which you did), then (I think) you need to cancel and resubmit the patch.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155298#comment-13155298 ]

Jimmy Xiang commented on HBASE-4797:
---

Thanks! I cancel and resubmit the patch.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155369#comment-13155369 ]

Hadoop QA commented on HBASE-4797:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12504777/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
* org.apache.hadoop.hbase.client.TestInstantSchemaChange
* org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/336//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/336//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/336//console

This message is automatically generated.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155391#comment-13155391 ]

Hadoop QA commented on HBASE-4797:
---

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12504779/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -162 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
* org.apache.hadoop.hbase.master.TestMasterFailover
* org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/337//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/337//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/337//console

This message is automatically generated.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155686#comment-13155686 ]

Jimmy Xiang commented on HBASE-4797:
---

Can someone check in the patch?
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155690#comment-13155690 ]

Ted Yu commented on HBASE-4797:
---

I will.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154704#comment-13154704 ]

stack commented on HBASE-4797:
---

bq. The region opening is tried periodically. The waiting interval is about 1/3 of the assignment time out.

I think that's fine. From the log snippet above though, Jimmy, it seems like we are updating the znode almost every second. That's too much?
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154723#comment-13154723 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
---

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3413

src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2906/#comment7642

maxSedId should be named maxSeqId

- Ted

On 2011-11-21 22:38:39, Jimmy Xiang wrote:
bq. This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/
bq.
bq. (Updated 2011-11-21 22:38:39)
bq.
bq. Review request for hbase, Todd Lipcon and Michael Stack.
bq.
bq. Summary
bq. ---
bq. If there are multiple recovered edits files, I used the file name to find the initial sequence id. After these files are sorted, we can find a file's possible maximum sequence id based on the next file's initial sequence id. If the maximum sequence id is smaller than the current sequence id, the whole recovered edits file is old and ignored.
bq.
bq. This addresses bug HBASE-4797.
bq. https://issues.apache.org/jira/browse/HBASE-4797
bq.
bq. Diffs
bq. ---
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b
bq.
bq. Diff: https://reviews.apache.org/r/2906/diff
bq.
bq. Testing
bq. ---
bq. Added test case to TestHRegion, and all the tests in this test class passed.
bq.
bq. Thanks,
bq. Jimmy
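The Summary in the review request above states the skip rule only in prose. A minimal Java sketch of that rule follows, under stated assumptions: each file name is taken to be the zero-padded initial sequence id, and the class and method names (RecoveredEditsSkipSketch, initialSeqId, filesToReplay) are hypothetical. This is not the code in the attached patch or in HRegion.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical names, not the HRegion implementation.
final class RecoveredEditsSkipSketch {

    // Assumption: the file name is the zero-padded initial sequence id of the file.
    static long initialSeqId(String fileName) {
        return Long.parseLong(fileName);
    }

    // Returns the files that still have to be replayed, oldest first.
    // The next file's initial sequence id is used as an upper bound on the
    // current file's maximum sequence id. If that bound is smaller than the
    // sequence id the region already has, the whole file is skipped.
    // The last file has no successor, so no bound is known and it is kept.
    static List<String> filesToReplay(TreeSet<String> sortedFiles, long currentSeqId) {
        List<String> toReplay = new ArrayList<>();
        for (String file : sortedFiles) {
            String next = sortedFiles.higher(file);
            if (next != null && initialSeqId(next) < currentSeqId) {
                continue; // every edit in this file is older than the region's state
            }
            toReplay.add(file);
        }
        return toReplay;
    }

    public static void main(String[] args) {
        TreeSet<String> files = new TreeSet<>();
        for (long id : new long[] {10L, 30L, 50L}) {
            files.add(String.format("%019d", id));
        }
        // With a current sequence id of 40, only the file starting at 10 is skipped.
        System.out.println(filesToReplay(files, 40L));
    }
}
```

The comparison is strict ("smaller than") to match the wording of the Summary; whether the real patch uses a strict or non-strict comparison is not stated in this thread.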
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154747#comment-13154747 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
---

bq. On 2011-11-21 23:23:07, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2468
bq. https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2468
bq.
bq. maxSedId should be named maxSeqId

Good catch.

- Jimmy

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3413
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154771#comment-13154771 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
---

bq. On 2011-11-21 22:47:55, Michael Stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2456
bq. https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2456
bq.
bq. So, are these already sorted in right order from oldest edit to newest?

All these files are under the same folder. If they follow the name pattern defined in HLog, String.format("%019d", seqid), then yes, they are sorted in the right order by sequence id. If this were not true, the order in which these edits are reapplied would already be wrong.

bq. On 2011-11-21 22:47:55, Michael Stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2475
bq. https://reviews.apache.org/r/2906/diff/2/?file=59652#file59652line2475
bq.
bq. Possilbe should be Possible.
bq.
bq. I'd be more assertive in this message. Maximum possible sequenceid for this log is + + , skipping ..

Sure, I will fix it.

bq. On 2011-11-21 22:47:55, Michael Stack wrote:
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java, line 2855
bq. https://reviews.apache.org/r/2906/diff/2/?file=59653#file59653line2855
bq.
bq. Any more asserts we can do in here? Assert we replayed N of the M files?

Sure, I added more test cases.

- Jimmy

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2906/#review3409
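Stack's question in the message above is whether the recovered.edits files are already in oldest-to-newest order. A small Java sketch, assuming only the String.format("%019d", seqid) naming mentioned in the reply (the class and method names here are hypothetical), illustrates why plain lexicographic sorting of zero-padded names agrees with numeric order, while unpadded names would not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: hypothetical names, not HLog or HRegion code.
final class PaddedNameOrderSketch {

    // Zero-padded to 19 digits, the pattern quoted in the review reply.
    static String fileNameFor(long seqId) {
        return String.format("%019d", seqId);
    }

    // Sorts padded names as plain strings (lexicographic order).
    static List<String> sortedPadded(long... seqIds) {
        List<String> names = new ArrayList<>();
        for (long id : seqIds) {
            names.add(fileNameFor(id));
        }
        Collections.sort(names);
        return names;
    }

    // Same sort without padding, for contrast.
    static List<String> sortedUnpadded(long... seqIds) {
        List<String> names = new ArrayList<>();
        for (long id : seqIds) {
            names.add(Long.toString(id));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Padded names come out in numeric order: 9, 25, 1000.
        System.out.println(sortedPadded(1000L, 9L, 25L));
        // Unpadded names come out as 1000, 25, 9, which is not numeric order.
        System.out.println(sortedUnpadded(1000L, 9L, 25L));
    }
}
```

Nineteen digits is the width of Long.MAX_VALUE (9223372036854775807), so every non-negative long sequence id fits the pattern without overflowing the padding.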
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154773#comment-13154773 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/

(Updated 2011-11-22 00:32:48.813017)

Review request for hbase, Todd Lipcon and Michael Stack.

Changes
-------

Revised patch with changes per review.

Summary
-------

If there are multiple recovered edits files, I use the file name to find each file's initial sequence id. Once the files are sorted, a file's maximum possible sequence id can be derived from the next file's initial sequence id. If that maximum is smaller than the region's current sequence id, the whole recovered edits file is old and is ignored.

This addresses bug HBASE-4797.
https://issues.apache.org/jira/browse/HBASE-4797

Diffs (updated)
---------------

src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b

Diff: https://reviews.apache.org/r/2906/diff

Testing
-------

Added a test case to TestHRegion; all the tests in TestHRegion pass.

Thanks,

Jimmy
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154774#comment-13154774 ]

Jimmy Xiang commented on HBASE-4797:
------------------------------------

Cool! Thanks.

On Mon, Nov 21, 2011 at 2:51 PM, Kannan Muthukkaruppan (Commented) (JIRA)
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154783#comment-13154783 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3416

Ship it!

Looks good to me. Thanks for fixing the whitespace too (although it made the patch harder to read). You also left some whitespace in testSkipRecoveredEditsReplay. :)

- Lars
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154785#comment-13154785 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/

(Updated 2011-11-22 01:02:17.373022)

Review request for hbase, Todd Lipcon and Michael Stack.

Changes
-------

Removed white spaces in TestHRegion.java

Summary
-------

If there are multiple recovered edits files, I use the file name to find each file's initial sequence id. Once the files are sorted, a file's maximum possible sequence id can be derived from the next file's initial sequence id. If that maximum is smaller than the region's current sequence id, the whole recovered edits file is old and is ignored.

This addresses bug HBASE-4797.
https://issues.apache.org/jira/browse/HBASE-4797

Diffs (updated)
---------------

src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 8b89661
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 5daa02b

Diff: https://reviews.apache.org/r/2906/diff

Testing
-------

Added a test case to TestHRegion; all the tests in TestHRegion pass.

Thanks,

Jimmy
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154804#comment-13154804 ]

Lars Hofhansl commented on HBASE-4797:
--------------------------------------

I'm happy to commit this (pending the test run). Any objections?
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154826#comment-13154826 ]

Hadoop QA commented on HBASE-4797:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12504687/0001-HBASE-4797-%5Bavailability%5D-skip-older-edits.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 4 new or modified tests.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/326//console

This message is automatically generated.
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154861#comment-13154861 ]

stack commented on HBASE-4797:
------------------------------

No objection from me. Jimmy, want to attach a patch generated with --no-prefix so hadoopqa runs?
[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154877#comment-13154877 ]

jirapos...@reviews.apache.org commented on HBASE-4797:
------------------------------------------------------

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2906/#review3425

+1 on patch

- ramkrishna