[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5712:
----------------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent: (was: HBASE-5628)

Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
---------------------------------------------------------------------------
                 Key: HBASE-5712
                 URL: https://issues.apache.org/jira/browse/HBASE-5712
             Project: HBase
          Issue Type: Improvement
          Components: hbck
    Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
            Reporter: Jonathan Hsieh
            Assignee: Jonathan Hsieh
         Attachments: hbase-5712-90.patch, hbase-5712.patch

On a heavily loaded HDFS, some datanodes may not respond quickly, and the DFS client backs off for 60s before attempting to read the data from another datanode. Portions of the information gathered from HDFS (the .regioninfo files) are loaded serially. On HBase clusters with 100's or 1000's of regions, hitting these 60s delays blocks progress and can be very painful. There is already some parallelization of portions of the HDFS information load operations; the goal here is to move the reading of the .regioninfo files into the parallelized sections.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
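The approach described above can be sketched as follows. This is a hypothetical illustration, not the actual HBaseFsck code: `loadRegionInfo`, the thread count, and the map shapes are all assumptions. The idea is simply to submit each .regioninfo read to a thread pool so that one datanode stuck in a 60s backoff no longer serializes the whole scan.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch: read each region's .regioninfo in parallel on an
// executor instead of serially, so a single slow datanode does not block
// progress on the others.
public class ParallelRegionInfoLoad {
  // Stand-in for reading one .regioninfo file from HDFS.
  static String loadRegionInfo(String regionDir) {
    return "regioninfo:" + regionDir;  // real code would open and parse the file
  }

  public static Map<String, String> loadAll(List<String> regionDirs, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit every read up front; slow reads overlap instead of queueing.
      Map<String, Future<String>> futures = new LinkedHashMap<>();
      for (String dir : regionDirs) {
        futures.put(dir, pool.submit(() -> loadRegionInfo(dir)));
      }
      // Collect results into a concurrent sorted map, which keeps the
      // sorted-map interface callers expect while allowing parallel inserts.
      Map<String, String> out = new ConcurrentSkipListMap<>();
      for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
        out.put(e.getKey(), e.getValue().get());
      }
      return out;
    } finally {
      pool.shutdown();
    }
  }

  public static boolean demo() {
    try {
      Map<String, String> m = loadAll(Arrays.asList("r1", "r2", "r3"), 2);
      return m.size() == 3 && "regioninfo:r2".equals(m.get("r2"));
    } catch (Exception e) {
      return false;
    }
  }
}
```

With N regions and a pool of K threads, a single 60s datanode backoff costs roughly 60s total rather than 60s per affected region on the serial path.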
[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5712:
----------------------------------
     Resolution: Fixed
  Fix Version/s: 0.96.0, 0.94.0, 0.92.2, 0.90.7
   Hadoop Flags: Reviewed
         Status: Resolved  (was: Patch Available)

Committed to 0.90/0.92/0.94/0.96-trunk.
[jira] [Updated] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5712:
----------------------------------
    Attachment: hbase-5712-90-v2.patch
                hbase-5712-v2.patch

v2 is the version I committed.
[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264691#comment-13264691 ]

jirapos...@reviews.apache.org commented on HBASE-5712:
------------------------------------------------------

bq. On 2012-04-27 23:27:20, Michael Stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 204
bq. https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line204
bq.
bq. This'll work but why not ConcurrentSkipListMap?

Sure, changed.

bq. On 2012-04-27 23:27:20, Michael Stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 418
bq. https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line418
bq.
bq. +1 on suggested change

Done.

bq. On 2012-04-27 23:27:20, Michael Stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2835
bq. https://reviews.apache.org/r/4883/diff/1/?file=104442#file104442line2835
bq.
bq. Is this flag needed? Why not just check the thread is alive? I see we can return with an error. What happens if the return on 2816 happens? Will the wait at #643 above be forever?

This is not a thread but is actually fed to an executor (thread pool) at line 637. If the return on 2816 happens, it is inside a finally block which will always mark the work item as done. There are two other instances of this pattern that were already in this code before I got to it -- I'd have used Futures (and have filed a follow-on issue for that), but it works.

- jmhsieh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4883/#review7337
---

On 2012-04-26 01:42:01, jmhsieh wrote:
bq. (Updated 2012-04-26 01:42:01)
bq.
bq. Review request for hbase, Ted Yu and Jimmy Xiang.
bq.
bq. Summary
bq. -------
bq. * Parallelized load of .regioninfo files
bq. * Changed TreeMap to SortedMap in method signatures
bq. * Renamed a test.
bq.
bq. This addresses bug HBASE-5712.
bq. https://issues.apache.org/jira/browse/HBASE-5712
bq.
bq. Diffs
bq. -----
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 66156c2
bq.   src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 6b64f10
bq.
bq. Diff: https://reviews.apache.org/r/4883/diff
bq.
bq. Testing
bq. -------
bq. Ran the patch 10x on trunk; it passes. Ran 1x on 0.92 and 0.94. The 0.90 version is nearly identical except for changes near HBaseFsck lines 671-680.
bq.
bq. Thanks,
bq. jmhsieh
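The pattern defended in the review reply above -- a work item handed to an executor that marks itself done in a finally block, so an early error return cannot leave a waiter blocked forever -- can be sketched like this. The class and field names are illustrative, not the actual HBaseFsck code; as the reply notes, Futures would express the same thing more directly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: each work item sets its "done" flag in a finally
// block, so even an early error return cannot strand the waiter.
public class WorkItemSketch implements Runnable {
  volatile boolean done = false;
  static final AtomicInteger processed = new AtomicInteger();

  final boolean failEarly;
  WorkItemSketch(boolean failEarly) { this.failEarly = failEarly; }

  @Override public void run() {
    try {
      if (failEarly) {
        return;  // early error return, analogous to the return the reviewer asked about
      }
      processed.incrementAndGet();
    } finally {
      done = true;  // always runs, so the waiter below always terminates
      synchronized (this) { notifyAll(); }
    }
  }

  // Waiter analogous to the loop that blocks until all work items finish.
  synchronized void waitUntilDone() throws InterruptedException {
    while (!done) { wait(100); }
  }

  public static boolean demo() {
    try {
      ExecutorService pool = Executors.newFixedThreadPool(2);
      WorkItemSketch ok = new WorkItemSketch(false);
      WorkItemSketch failed = new WorkItemSketch(true);
      pool.submit(ok);
      pool.submit(failed);
      ok.waitUntilDone();
      failed.waitUntilDone();  // returns despite the early return in run()
      pool.shutdown();
      pool.awaitTermination(5, TimeUnit.SECONDS);
      return ok.done && failed.done && processed.get() == 1;
    } catch (InterruptedException e) {
      return false;
    }
  }
}
```

A Future per work item would replace the flag and the wait loop with `future.get()`, which also propagates the worker's exception instead of silently swallowing it.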
[jira] [Created] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
Jieshan Bean created HBASE-5900:
-----------------------------------
             Summary: HRegion#FIXED_OVERHEAD is miscalculated in 94.
                 Key: HBASE-5900
                 URL: https://issues.apache.org/jira/browse/HBASE-5900
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.94.0
            Reporter: Jieshan Bean
            Assignee: Jieshan Bean
             Fix For: 0.94.1

After applying the patch for HBASE-5611 and testing on a 32-bit machine, this problem was triggered. Before that patch, TestHeapSize passed in 94 by pure coincidence.

{noformat}
public static final long FIXED_OVERHEAD = ClassSize.align(
    ClassSize.OBJECT + ClassSize.ARRAY +
    30 * ClassSize.REFERENCE + Bytes.SIZEOF_INT +
    (6 * Bytes.SIZEOF_LONG) + Bytes.SIZEOF_BOOLEAN);
{noformat}

Actually, there are 31 REFERENCEs and 5 LONGs in HRegion.
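The arithmetic behind the "pure coincidence" is worth spelling out: on a 64-bit JVM a reference and a long are both 8 bytes, so 30 references + 6 longs and 31 references + 5 longs total the same 288 bytes; with 4-byte references on a 32-bit JVM they differ, and the test fails. A self-contained sketch of the corrected calculation, using assumed 32-bit sizes (the real values come from HBase's ClassSize at runtime):

```java
// Hypothetical sketch of the corrected arithmetic: 31 references and
// 5 longs per the report, instead of the 30/6 in the buggy constant.
// Sizes below assume a 32-bit JVM (4-byte refs, 8-byte object header);
// HBase's real ClassSize derives them from the running JVM.
public class FixedOverheadSketch {
  static final int OBJECT = 8, ARRAY = 12, REFERENCE = 4;
  static final int SIZEOF_INT = 4, SIZEOF_LONG = 8, SIZEOF_BOOLEAN = 1;

  // ClassSize.align rounds up to an 8-byte boundary.
  static long align(long num) {
    return ((num + 7) >> 3) << 3;
  }

  // Buggy version: 30 references, 6 longs.
  public static long buggyOverhead() {
    return align(OBJECT + ARRAY + 30 * REFERENCE + SIZEOF_INT
        + 6 * SIZEOF_LONG + SIZEOF_BOOLEAN);
  }

  // Corrected version: 31 references, 5 longs, per the issue report.
  public static long fixedOverhead() {
    return align(OBJECT + ARRAY + 31 * REFERENCE + SIZEOF_INT
        + 5 * SIZEOF_LONG + SIZEOF_BOOLEAN);
  }
}
```

With these 32-bit sizes the buggy constant aligns to 200 bytes and the corrected one to 192, an 8-byte discrepancy that TestHeapSize catches; with 8-byte references both expressions align to the same value, which is why the bug was invisible on 64-bit machines.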
[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
[ https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264693#comment-13264693 ]

Jieshan Bean commented on HBASE-5900:
-------------------------------------
Patch will be uploaded after full tests today.
[jira] [Created] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests
Todd Lipcon created HBASE-5901:
----------------------------------
             Summary: Use union type protobufs instead of class/byte pairs for multi requests
                 Key: HBASE-5901
                 URL: https://issues.apache.org/jira/browse/HBASE-5901
             Project: HBase
          Issue Type: Improvement
          Components: ipc, performance
    Affects Versions: 0.96.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon

The current implementation of multi actions uses repeated NameBytesPairs for the contents of multi actions. Instead, we should introduce a union type protobuf for the valid actions. This makes the RPCs smaller, since they don't need to carry class names, and makes deserialization faster, since it can avoid some copying and reflection.
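The gain described above can be illustrated with a plain-Java analogue (the types below are illustrative, not HBase's actual RPC classes; the real change is in the protobuf definitions): instead of shipping a class-name string with every action and reflectively instantiating it, ship a small type tag and switch on it.

```java
// Hypothetical Java analogue of the change: rather than the
// NameBytesPair style (class name string + opaque bytes, decoded via
// reflection), carry a small enum tag and the payload, and dispatch
// with a switch. Smaller on the wire, no reflection on decode.
public class MultiActionUnion {
  enum ActionType { GET, PUT, DELETE }

  static final class Action {
    final ActionType type;   // a tag instead of a full class name
    final byte[] payload;    // the encoded action itself
    Action(ActionType type, byte[] payload) {
      this.type = type;
      this.payload = payload;
    }
  }

  // Decoding becomes a cheap switch; no Class.forName, no reflection.
  public static String decode(Action a) {
    switch (a.type) {
      case GET:    return "get(" + a.payload.length + " bytes)";
      case PUT:    return "put(" + a.payload.length + " bytes)";
      case DELETE: return "delete(" + a.payload.length + " bytes)";
      default:     throw new IllegalArgumentException("unknown action");
    }
  }
}
```

In protobuf terms the union is a message with a type enum plus one optional field per action kind, so only the tag and the populated field are serialized.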
[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests
[ https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-5901:
-------------------------------
    Attachment: hbase-5901.txt

This patch dropped cumulative CPU usage by about 10% for a million-record insert.
[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests
[ https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-5901:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests
[ https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264710#comment-13264710 ]

Hadoop QA commented on HBASE-5901:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12525044/hbase-5901.txt
  against trunk revision.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1687//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1687//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1687//console

This message is automatically generated.
[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264714#comment-13264714 ]

Hudson commented on HBASE-5712:
-------------------------------
Integrated in HBase-TRUNK #2825 (see [https://builds.apache.org/job/HBase-TRUNK/2825/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck (Revision 1332072)

Result = SUCCESS
jmhsieh :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264717#comment-13264717 ]

Hudson commented on HBASE-5712:
-------------------------------
Integrated in HBase-0.94 #160 (see [https://builds.apache.org/job/HBase-0.94/160/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck (Revision 1332071)

Result = FAILURE
jmhsieh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage
[ https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264735#comment-13264735 ]

Anoop Sam John commented on HBASE-5897:
---------------------------------------
As per the simple patch also, there can be more CP calls happening for one Put:

{code}
-    for (int i = 0; i < batchOp.operations.length; i++) {
+    for (int i = batchOp.nextIndexToProcess; i < batchOp.operations.length; i++) {
{code}

Suppose a put(List) operation with 100 Puts is completed across 2 calls to doMiniBatchPut(). The 1st run will call the hook for all 100 Puts. Previously the next run would call it for all 100 again; now it will be called only for the remaining Puts which were not handled in the 1st iteration. In Todd's patch this will not happen [it calls all pre hooks just before the 1st call to doMiniBatchPut()], but that calls the pre hook well before the actual put operation. Is this correct? How can someone be sure of getting a pre hook call right before the actual put() for a Put?

prePut coprocessor hook causing substantial CPU usage
-----------------------------------------------------
                 Key: HBASE-5897
                 URL: https://issues.apache.org/jira/browse/HBASE-5897
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Critical
             Fix For: 0.92.2, 0.94.0, 0.96.0
         Attachments: 5897-simple.txt, hbase-5897.txt

I was running an insert workload against trunk under oprofile and saw that a significant portion of CPU usage was going to calling the prePut coprocessor hook inside doMiniBatchPut, even though I don't have any coprocessors installed. I ran a million-row insert and collected CPU time spent in the RS after commenting out the prePut hook, and found CPU usage reduced by 33%.
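The resume-index behavior discussed in the comment above can be sketched like this. The class is illustrative (not HBase's actual BatchOperation/HRegion code), but the field names follow the diff: starting each pass at `nextIndexToProcess` instead of 0 keeps the prePut hook from firing again for Puts handled in an earlier pass.

```java
// Hypothetical sketch of the resume-index fix: when a batch is processed
// across several doMiniBatchPut-style calls, resuming from
// nextIndexToProcess means each operation's pre hook fires exactly once.
public class BatchOpSketch {
  final String[] operations;
  int nextIndexToProcess = 0;
  int hookCalls = 0;

  BatchOpSketch(String[] operations) { this.operations = operations; }

  void prePutHook(String op) { hookCalls++; }  // stand-in for the coprocessor hook

  // Processes up to maxPerCall operations, calling the hook once per op.
  void doMiniBatch(int maxPerCall) {
    int end = Math.min(operations.length, nextIndexToProcess + maxPerCall);
    for (int i = nextIndexToProcess; i < end; i++) {  // resumes, not i = 0
      prePutHook(operations[i]);
      // ... apply operations[i] ...
    }
    nextIndexToProcess = end;
  }

  // Runs a whole batch to completion and reports total hook invocations.
  public static int hookCallsFor(int nOps, int perCall) {
    String[] ops = new String[nOps];
    java.util.Arrays.fill(ops, "put");
    BatchOpSketch batch = new BatchOpSketch(ops);
    while (batch.nextIndexToProcess < nOps) {
      batch.doMiniBatch(perCall);
    }
    return batch.hookCalls;
  }
}
```

With the old `i = 0` loop, a 100-Put batch finished in two passes would invoke the hook 160 times (100 + the 60 retried); resuming keeps it at exactly 100, while still calling each hook immediately before its own Put is applied.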
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264740#comment-13264740 ]

Jieshan Bean commented on HBASE-5611:
-------------------------------------
The TestHeapSize failure on a 32-bit machine in 94 is caused by HBASE-5900.

Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
------------------------------------------------------------------------------------------------------------
                 Key: HBASE-5611
                 URL: https://issues.apache.org/jira/browse/HBASE-5611
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.6
            Reporter: Jean-Daniel Cryans
            Assignee: Jieshan Bean
            Priority: Critical
             Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
         Attachments: 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch

This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and is now being opened by a new RS. The first thing it does is read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size, but they are dropped when the {{HRegion}} gets cleaned up.

It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits:

{noformat}
2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier, although in fact I'm at 0. It happened yesterday and it is still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open.
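The fix direction stated at the end of the report -- decrement the global MemStore size when a region can't open -- can be sketched with a simple accumulator. This is a hypothetical illustration with made-up names, not HBase's actual accounting code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the fix direction: track replayed-edit sizes
// while a region opens, and roll them back out of the global MemStore
// size if the open fails (e.g. the master moved the region away
// mid-recovery). Without the rollback the global counter stays inflated
// forever, and the flusher later finds pressure but no region to flush.
public class MemStoreAccountingSketch {
  final AtomicLong globalMemStoreSize = new AtomicLong();

  // Replays recovered edits for one region; returns the resulting
  // global size after a successful or failed open.
  public long replayEdits(long[] editSizes, boolean openSucceeds) {
    long replayed = 0;
    for (long sz : editSizes) {
      replayed += sz;
      globalMemStoreSize.addAndGet(sz);  // edits land in the MemStore
    }
    if (!openSucceeds) {
      // Region is being cleaned up: its edits vanish with it, so the
      // global counter must be decremented by exactly what was added.
      globalMemStoreSize.addAndGet(-replayed);
    }
    return globalMemStoreSize.get();
  }
}
```

The invariant this restores is that the global counter always equals the sum of the live regions' MemStore sizes, which is exactly the precondition the flusher's Preconditions.checkState in the stack trace above is asserting.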
[jira] [Commented] (HBASE-5712) Parallelize load of .regioninfo files in diagnostic/repair portion of hbck.
[ https://issues.apache.org/jira/browse/HBASE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264746#comment-13264746 ]

Hudson commented on HBASE-5712:
-------------------------------
Integrated in HBase-0.92 #393 (see [https://builds.apache.org/job/HBase-0.92/393/])
HBASE-5712 Parallelize load of .regioninfo files in diagnostic/repair portion of hbck (Revision 1332070)

Result = FAILURE
jmhsieh :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
[jira] [Updated] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
[ https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5900:
--------------------------------
    Attachment: HRegion-FIEED_OVERHEAD.patch
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264859#comment-13264859 ]

Jieshan Bean commented on HBASE-5611:
-------------------------------------
The new version of the patch for 94 will be uploaded after HBASE-5900 gets fixed.
[jira] [Updated] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
[ https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5900: Status: Patch Available (was: Open) HRegion#FIXED_OVERHEAD is miscalculated in 94. -- Key: HBASE-5900 URL: https://issues.apache.org/jira/browse/HBASE-5900 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.1 Attachments: HRegion-FIEED_OVERHEAD.patch After applying the patch of HBASE-5611 and testing on a 32-bit machine, this problem was triggered. Before this patch, TestHeapSize passed by pure coincidence in 94. {noformat} public static final long FIXED_OVERHEAD = ClassSize.align( ClassSize.OBJECT + ClassSize.ARRAY + 30 * ClassSize.REFERENCE + Bytes.SIZEOF_INT + (6 * Bytes.SIZEOF_LONG) + Bytes.SIZEOF_BOOLEAN); {noformat} Actually, there are 31 REFERENCEs and 5 LONGs in HRegion.
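For readers unfamiliar with the heap-size accounting above: FIXED_OVERHEAD is an 8-byte-aligned sum of JVM bookkeeping sizes, and TestHeapSize fails when the declared field counts drift from the actual ones. A minimal sketch of the arithmetic, using illustrative 64-bit sizes as stand-ins rather than HBase's real ClassSize/Bytes constants:

```java
public class FixedOverheadSketch {
    // Illustrative 64-bit JVM sizes; HBase derives the real values in ClassSize.
    static final int OBJECT = 16, ARRAY = 24, REFERENCE = 8;
    static final int SIZEOF_INT = 4, SIZEOF_LONG = 8, SIZEOF_BOOLEAN = 1;

    // Round up to the JVM's 8-byte object alignment, as ClassSize.align does.
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    // The corrected accounting from the report: 31 references and 5 longs.
    static long fixedOverhead() {
        return align(OBJECT + ARRAY + 31 * REFERENCE + SIZEOF_INT
            + 5 * SIZEOF_LONG + SIZEOF_BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(fixedOverhead()); // 333 rounded up to 336
    }
}
```

With these stand-in sizes the unaligned sum is 333 bytes, aligned to 336; changing a reference count by one shifts the result, which is why the old 30-reference/6-long formula could only match by coincidence.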
[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
[ https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264861#comment-13264861 ] Hadoop QA commented on HBASE-5900: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525053/HRegion-FIEED_OVERHEAD.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1688//console This message is automatically generated.
[jira] [Created] (HBASE-5902) Some scripts are not executable
nkeywal created HBASE-5902: -- Summary: Some scripts are not executable Key: HBASE-5902 URL: https://issues.apache.org/jira/browse/HBASE-5902 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Trivial -rw-rw-r-- graceful_stop.sh -rw-rw-r-- hbase-config.sh -rw-rw-r-- local-master-backup.sh -rw-rw-r-- local-regionservers.sh
[jira] [Updated] (HBASE-5902) Some scripts are not executable
[ https://issues.apache.org/jira/browse/HBASE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5902: --- Attachment: 5902.v1.patch
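The fix itself is a permission-bit change in version control (the equivalent of chmod +x on each listed script). As a side illustration of the same effect from Java, here is a sketch; the temp file is a throwaway stand-in for the scripts, not part of the actual patch:

```java
import java.io.File;
import java.io.IOException;

public class MakeExecutable {
    // Grant the execute bit to all users, mirroring `chmod a+x script.sh`.
    public static boolean grantExecute(File f) {
        return f.setExecutable(true, false);
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("script", ".sh"); // stand-in for e.g. graceful_stop.sh
        f.deleteOnExit();
        grantExecute(f);
        System.out.println(f.canExecute()); // true on POSIX filesystems
    }
}
```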
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264874#comment-13264874 ] Jieshan Bean commented on HBASE-5875: - Look at the method CatalogTracker#verifyRootRegionLocation: {noformat} public boolean verifyRootRegionLocation(final long timeout) throws InterruptedException, IOException { AdminProtocol connection = null; try { connection = waitForRootServerConnection(timeout); } catch (NotAllMetaRegionsOnlineException e) { // Pass } catch (ServerNotRunningYetException e) { // Pass -- remote server is not up so can't be carrying root } catch (UnknownHostException e) { // Pass -- server name doesn't resolve so it can't be assigned anything. } return (connection == null)? false: verifyRegionLocation(connection, this.rootRegionTracker.getRootRegionLocation(), ROOT_REGION_NAME); } {noformat} I'm thinking about an approach that handles this issue according to the specific exception: e.g. if we get a ServerNotRunningYetException, we can process splitLogAndExpireIfOnline, but if we get a NotServingRegionException, we should not do that. Process RIT and Master restart may remove an online server considering it as a dead server -- Key: HBASE-5875 URL: https://issues.apache.org/jira/browse/HBASE-5875 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.1 If on master restart the master finds ROOT/META to be in RIT state, it tries to assign the ROOT region through ProcessRIT. The master will trigger the assignment and then try to verify the root region location. Root region location verification is done by checking whether the RS has the region in its online list. If the master-triggered assignment has not yet completed on the RS, the verify-root-region-location step will fail.
Because it failed, {code} splitLogAndExpireIfOnline(currentRootServer); {code} splits the log and also removes the server from the online server list. Ideally there is nothing to do in splitlog here, as no region server was restarted. So even though the server is online, the master just invalidates the region server. In a special case, if I have only one RS, my cluster will become non-operative.
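The exception-dispatch idea in Jieshan's comment can be sketched as follows. The stub exception classes stand in for the real HBase types, and the dispatch policy is the commenter's proposal, not committed behavior:

```java
public class RootVerifySketch {
    // Stand-ins for the HBase exceptions named in the comment above.
    static class ServerNotRunningYetException extends Exception {}
    static class NotServingRegionException extends Exception {}

    // Dispatch on the failure cause: only expire the server when the
    // exception shows the process is actually gone.
    static boolean shouldExpireServer(Exception cause) {
        if (cause instanceof ServerNotRunningYetException) {
            return true;  // server process is down; safe to split logs
        }
        if (cause instanceof NotServingRegionException) {
            return false; // server is up, the region is just not open yet
        }
        return false;     // unknown cause: be conservative, keep the server
    }

    public static void main(String[] args) {
        System.out.println(shouldExpireServer(new ServerNotRunningYetException())); // true
        System.out.println(shouldExpireServer(new NotServingRegionException()));    // false
    }
}
```

As the follow-up comment notes, the cost of this design is retry/sleep logic and an API change to verifyRootRegionLocation, which is used in many places.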
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264876#comment-13264876 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- @Jieshan As Ted also suggested, if we go by the exception then we need to add unnecessary retry logic and sleep time, and also need to modify the API verifyRootRegionLocation, which is used in many places.
[jira] [Commented] (HBASE-5874) The HBase do not configure the 'fs.default.name' attribute, the hbck tool and Merge tool throw IllegalArgumentException.
[ https://issues.apache.org/jira/browse/HBASE-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264882#comment-13264882 ] Jieshan Bean commented on HBASE-5874: - +1 on this patch. I think the patches for the other branches are also needed. The HBase do not configure the 'fs.default.name' attribute, the hbck tool and Merge tool throw IllegalArgumentException. Key: HBASE-5874 URL: https://issues.apache.org/jira/browse/HBASE-5874 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.90.6 Reporter: fulin wang Assignee: fulin wang Attachments: HBASE-5874-0.90.patch, HBASE-5874-trunk.patch When HBase does not configure the 'fs.default.name' attribute, the hbck tool and Merge tool throw IllegalArgumentException; we should add the 'fs.default.name' attribute in the code for these tools. hbck exception: Exception in thread main java.lang.IllegalArgumentException: Wrong FS: hdfs://160.176.0.101:9000/hbase/.META./1028785192/.regioninfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:412) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:59) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:382) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:285) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:128) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:301) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:489) at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegioninfo(HBaseFsck.java:565) at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:596) at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:332) at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:360) at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:2907) Merge exception: [2012-05-05
10:48:24,830] [ERROR] [main] [org.apache.hadoop.hbase.util.Merge 381] exiting due to error java.lang.IllegalArgumentException: Wrong FS: hdfs://160.176.0.101:9000/hbase/.META./1028785192/.regioninfo, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:412) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:59) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:382) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:285) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:823) at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:415) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2679) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2665) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2634) at org.apache.hadoop.hbase.util.MetaUtils.openMetaRegion(MetaUtils.java:276) at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:261) at org.apache.hadoop.hbase.util.Merge.run(Merge.java:115) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.hbase.util.Merge.main(Merge.java:379)
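The "Wrong FS" error arises because, with fs.default.name unset, FileSystem defaults to file:/// and then rejects the hdfs:// path inside checkPath. The Hadoop-level remedy is to set fs.default.name in the configuration or to derive the filesystem from the path itself (e.g. Path.getFileSystem(conf)). A dependency-free sketch of the scheme check that produces the stack traces above (the checkPath logic here is a simplified imitation of Hadoop's, not its actual code):

```java
import java.net.URI;

public class WrongFsSketch {
    // Imitate FileSystem.checkPath: a path is only valid for a filesystem
    // with a matching scheme; otherwise "Wrong FS: ..., expected: ..." is thrown.
    static void checkPath(URI fsUri, URI path) {
        String pathScheme = path.getScheme() == null ? fsUri.getScheme() : path.getScheme();
        if (!pathScheme.equals(fsUri.getScheme())) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI hdfsPath = URI.create("hdfs://160.176.0.101:9000/hbase/.META./1028785192/.regioninfo");
        try {
            checkPath(localFs, hdfsPath); // reproduces the reported mismatch
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // The fix: obtain the filesystem from the path's own scheme instead.
        URI hdfsFs = URI.create("hdfs://160.176.0.101:9000/");
        checkPath(hdfsFs, hdfsPath); // passes
    }
}
```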
[jira] [Updated] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5875: -- Attachment: HBASE-5875.patch
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264900#comment-13264900 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- Patch for trunk. Test cases passed.
[jira] [Updated] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5875: -- Status: Patch Available (was: Open) @Chunhui Can you take a look at this? This is in relation to HBASE-4880. Please provide your thoughts.
[jira] [Updated] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-5883: Attachment: HBASE-5883-94.patch Patch for 94. All tests passed. We are still testing it on a real cluster. Comments before I post the results are welcome. Thank you. Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Attachments: HBASE-5883-94.patch The active master node's network was down for some time (this node contains Master, DN, ZK, RS). The backup node got the notification and started to become active. Immediately, the backup node aborted with the below exception. {noformat} 2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms 2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy13.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569) at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369) at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353) at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362) ... 20 more 2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads {noformat}
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264933#comment-13264933 ] Hadoop QA commented on HBASE-5875: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525060/HBASE-5875.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1689//console This message is automatically generated.
[jira] [Commented] (HBASE-5900) HRegion#FIXED_OVERHEAD is miscalculated in 94.
[ https://issues.apache.org/jira/browse/HBASE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264934#comment-13264934 ] Zhihong Yu commented on HBASE-5900: --- Please keep the original indentation so that it is easy to see the changes: {code} + + public static final long HREGION_CLASS_SIZE = ClassSize.OBJECT + + ClassSize.ARRAY + 31 * ClassSize.REFERENCE + Bytes.SIZEOF_INT + + (5 * Bytes.SIZEOF_LONG) + Bytes.SIZEOF_BOOLEAN; - public static final long FIXED_OVERHEAD = ClassSize.align( - ClassSize.OBJECT + - ClassSize.ARRAY + - 30 * ClassSize.REFERENCE + Bytes.SIZEOF_INT + - (6 * Bytes.SIZEOF_LONG) + - Bytes.SIZEOF_BOOLEAN); {code} I ran TestHeapSize with the patch and it passed. Let's keep the patch in minimal form with the fix to FIXED_OVERHEAD only.
[jira] [Commented] (HBASE-5875) Process RIT and Master restart may remove an online server considering it as a dead server
[ https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264938#comment-13264938 ] ramkrishna.s.vasudevan commented on HBASE-5875: --- Testcase failure seems unrelated to this fix.
[jira] [Commented] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264939#comment-13264939 ] ramkrishna.s.vasudevan commented on HBASE-5840: --- @Lars Do you want this in 0.94? If not, I will commit to trunk alone. Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status -- Key: HBASE-5840 URL: https://issues.apache.org/jira/browse/HBASE-5840 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: rajeshbabu Fix For: 0.96.0, 0.94.1 Attachments: HBASE-5840.patch, HBASE-5840_trunk.patch, HBASE-5840_v2.patch The TaskMonitor status will not be cleared in case a region is FAILED_OPEN; it will keep showing the old status, which misleads the user.
[jira] [Commented] (HBASE-5806) Handle split region related failures on master restart and RS restart
[ https://issues.apache.org/jira/browse/HBASE-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264950#comment-13264950 ] Chinna Rao Lalam commented on HBASE-5806: - For #1 above, the RegionServer crashed at SplitTransaction.createDaughters(Server, RegionServerServices) while removing the parent from the online regions: {code} if (!testing) { services.removeFromOnlineRegions(this.parent.getRegionInfo().getEncodedName()); } {code} Whenever the regionserver crashes, the ephemeral node is deleted and the master gets the nodeDeleted() notification, where the region is cleared from RIT. But the ServerShutdownHandler executed before the nodeDeleted() event for the region node. You can see that from the below logs {noformat} 2012-04-06 14:35:08,841 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Removed test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. from list of regions to assign because in RIT; region state: SPLITTING 2012-04-06 14:35:12,981 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Ephemeral node deleted, regionserver crashed?, clearing from RIT; rs=test,,1333702991530.cdfa837563e75ac5f4dc128680cc8da8. state=SPLITTING, ts=1333703059260, server=HOST-10-18-40-25,60020,1333695183392 {noformat} In this situation the below code populated that region {code} List<RegionState> regionsInTransition = this.services.getAssignmentManager(). processServerShutdown(this.serverName); {code} and it satisfies !rit.isClosing() && !rit.isPendingClose(), so the region is deleted from hris {code} for (RegionState rit : regionsInTransition) { if (!rit.isClosing() && !rit.isPendingClose()) { LOG.debug("Removed " + rit.getRegion().getRegionNameAsString() + " from list of regions to assign because in RIT; region state: " + rit.getState()); if (hris != null) hris.remove(rit.getRegion()); } } {code} The fix in SSH addresses #1. #2 came because of HBASE-5615. However HBASE-5615 was reverted.
#3 occurs when the master restarts after splitting is done but before the CatalogJanitor (CJ) has cleared the region from META. So while rebuilding the user regions we ensure that the offlined parent region is not taken into account again. #2 and #3 are both handled in this patch, so the fix solves both problems. Handle split region related failures on master restart and RS restart - Key: HBASE-5806 URL: https://issues.apache.org/jira/browse/HBASE-5806 Project: HBase Issue Type: Bug Affects Versions: 0.92.1 Reporter: ramkrishna.s.vasudevan Assignee: Chinna Rao Lalam Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5806.patch This issue is raised to solve issues that come out of a partial region split having happened while the region node in ZK, which is in RS_ZK_REGION_SPLITTING or RS_ZK_REGION_SPLIT, is not yet processed. This also tries to address HBASE-5615. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
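The filtering loop quoted in the comment above can be modeled with a minimal, self-contained sketch. The classes below are hypothetical stand-ins, not the real HBase RegionState or ServerShutdownHandler: the point is only that any region in transition that is neither CLOSING nor PENDING_CLOSE gets dropped from the set of regions to re-assign, which is exactly how a SPLITTING region on a crashed server ends up never being assigned.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RitFilterSketch {
    enum State { CLOSING, PENDING_CLOSE, SPLITTING, OPENING }

    // Hypothetical stand-in for the real RegionState: a region name plus its RIT state.
    static class RegionState {
        final String region;
        final State state;
        RegionState(String region, State state) { this.region = region; this.state = state; }
        boolean isClosing() { return state == State.CLOSING; }
        boolean isPendingClose() { return state == State.PENDING_CLOSE; }
    }

    // Mirrors the quoted SSH loop: a RIT region that is neither CLOSING nor
    // PENDING_CLOSE is removed from the set of regions to re-assign.
    static Set<String> filter(Set<String> hris, List<RegionState> rits) {
        for (RegionState rit : rits) {
            if (!rit.isClosing() && !rit.isPendingClose()) {
                hris.remove(rit.region);
            }
        }
        return hris;
    }

    public static void main(String[] args) {
        Set<String> hris = new HashSet<>(List.of("r1", "r2"));
        List<RegionState> rits = new ArrayList<>();
        rits.add(new RegionState("r1", State.SPLITTING));
        filter(hris, rits);
        // r1 was SPLITTING, so it is dropped from the assign list; only r2 remains.
        System.out.println(hris);
    }
}
```

Since SPLITTING is not CLOSING or PENDING_CLOSE, the region falls into the removal branch even though its split never completed.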
[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5869: - Attachment: 5869v8.txt Fixes for a few of the failing tests. Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: 5869v7.txt, 5869v8.txt, firstcut.txt, secondcut.txt, v4.txt, v5.txt, v6.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5903) Detect the test classes without categories
nkeywal created HBASE-5903: -- Summary: Detect the test classes without categories Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5903: --- Attachment: 5903.v3.patch Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5903: --- Fix Version/s: 0.96.0 Status: Patch Available (was: Open) Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264974#comment-13264974 ] Hadoop QA commented on HBASE-5903: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525071/5903.v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestAssignmentManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1691//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1691//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1691//console This message is automatically generated. Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264975#comment-13264975 ] Hadoop QA commented on HBASE-5869: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525070/5869v8.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 47 new or modified tests. +1 hadoop23. The patch compiles against the hadoop 0.23.x profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestRollingRestart org.apache.hadoop.hbase.regionserver.TestHRegionOnCluster org.apache.hadoop.hbase.regionserver.wal.TestLogRollingNoCluster org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing org.apache.hadoop.hbase.TestFullLogReconstruction org.apache.hadoop.hbase.master.TestMasterFailover org.apache.hadoop.hbase.master.TestSplitLogManager org.apache.hadoop.hbase.TestZooKeeper Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1690//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1690//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1690//console This message is automatically generated. 
Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: 5869v7.txt, 5869v8.txt, firstcut.txt, secondcut.txt, v4.txt, v5.txt, v6.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264978#comment-13264978 ] nkeywal commented on HBASE-5903: Considering the current patch, we can just consider TestAssignmentManager as a little bit flaky ;-) Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5883) Backup master is going down due to connection refused exception
[ https://issues.apache.org/jira/browse/HBASE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264979#comment-13264979 ] Zhihong Yu commented on HBASE-5883: --- Why do we need the following code ?
{code}
+} else if (ioex.getMessage().toLowerCase()
+    .contains("connection refused")) {
+  ce = new ConnectException(ioex.getMessage());
{code}
Backup master is going down due to connection refused exception --- Key: HBASE-5883 URL: https://issues.apache.org/jira/browse/HBASE-5883 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6, 0.92.1, 0.94.0 Reporter: Gopinathan A Assignee: Jieshan Bean Attachments: HBASE-5883-94.patch The active master node's network was down for some time (this node contains Master, DN, ZK, RS). The backup node got the notification and started to become active. Immediately the backup node got aborted with the below exception.
{noformat}
2012-04-09 10:42:24,270 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 861248320 bytes in 4 log files in [hdfs://192.168.47.205:9000/hbase/.logs/HOST-192-168-47-202,60020,1333715537172-splitting] in 26374ms
2012-04-09 10:42:24,316 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2012-04-09 10:42:24,333 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: java.net.ConnectException: Connection refused
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:375)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy13.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1276)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1233)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1220)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:569)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.getRootServerConnection(CatalogTracker.java:369)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:353)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:660)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:616)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:540)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:363)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:488)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
	... 20 more
2012-04-09 10:42:24,336 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-04-09 10:42:24,336 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
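The hunk Zhihong asks about can be illustrated with a simplified, hypothetical sketch (the `ce` variable is from the patch context; the `normalize()` helper below is invented for illustration only): re-wrapping a generic IOException whose message mentions a connection refusal as a java.net.ConnectException lets callers that special-case ConnectException, e.g. to retry instead of aborting, handle it uniformly.

```java
import java.io.IOException;
import java.net.ConnectException;

public class WrapRefused {
    // Illustrative only: same idea as the quoted hunk -- preserve the message,
    // change the exception type so ConnectException-specific handling applies.
    static IOException normalize(IOException ioex) {
        if (ioex instanceof ConnectException) {
            return ioex;
        } else if (ioex.getMessage() != null
                && ioex.getMessage().toLowerCase().contains("connection refused")) {
            return new ConnectException(ioex.getMessage());
        }
        return ioex;
    }

    public static void main(String[] args) {
        IOException wrapped =
            normalize(new IOException("java.net.ConnectException: Connection refused"));
        // The wrapper is now a ConnectException, so retry logic can catch it.
        System.out.println(wrapped.getClass().getSimpleName());
    }
}
```

This matters because the original exception in the trace above is a plain java.io.IOException whose cause, not its own type, is the ConnectException.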
[jira] [Commented] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264988#comment-13264988 ] Zhihong Yu commented on HBASE-5903: --- Minor comments:
{code}
+/**
+ * Copyright 2012 The Apache Software Foundation
{code}
Year is not needed.
{code}
+List<Class<?>> badClasses = new java.util.ArrayList<Class<?>>();
{code}
ArrayList is imported already.
{code}
+ private boolean existCategoryAnnotation(Class<?> c) {
{code}
Should the above method be named 'hasCategoryAnnotation()' ? Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
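The detection idea under review can be sketched as below. To stay self-contained, this uses a local stand-in @Category annotation rather than JUnit's org.junit.experimental.categories.Category, and adopts the hasCategoryAnnotation() name suggested above; the two sample classes are hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

public class CategoryCheck {
    // Stand-in for JUnit's Category annotation (assumption for this sketch).
    @Retention(RetentionPolicy.RUNTIME)
    @interface Category { }

    @Category static class GoodTest { }   // properly categorized
    static class BadTest { }              // missing a category

    // A class "has a category" iff the annotation is present on it.
    static boolean hasCategoryAnnotation(Class<?> c) {
        return c.getAnnotation(Category.class) != null;
    }

    // Collect every test class lacking a category; the checking test would
    // fail if this list is non-empty.
    static List<Class<?>> findUncategorized(List<Class<?>> testClasses) {
        List<Class<?>> badClasses = new ArrayList<>();
        for (Class<?> c : testClasses) {
            if (!hasCategoryAnnotation(c)) {
                badClasses.add(c);
            }
        }
        return badClasses;
    }

    public static void main(String[] args) {
        // Only BadTest should be reported.
        System.out.println(findUncategorized(List.of(GoodTest.class, BadTest.class)));
    }
}
```

Because the check itself is a small test, it runs on the developer machine and fails fast, before any category-filtered build is involved.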
[jira] [Updated] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5903: - Attachment: 5903v4.txt What I applied (added class comment and removed copyright year line). Committed to trunk. Thanks for the patch Nicolas. Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch, 5903v4.txt The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264990#comment-13264990 ] stack commented on HBASE-5904: -- Then we should revert hbase-5155, would you agree David? IIRC, there was a reason for absence of znode meaning ENABLED but don't remember it off hand. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. 
I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
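The semantic change David describes can be made concrete with a toy sketch. Plain booleans and strings stand in for real ZooKeeper znode state here; none of this is HBase code. The two definitions of "enabled" disagree exactly when the znode is absent, which is the normal state for an enabled table on a pre-HBASE-5155 server.

```java
public class EnabledSemantics {
    // Pre-HBASE-5155: a table is enabled iff its znode does NOT exist.
    // (The data parameter is unused; it is kept only for signature parity.)
    static boolean preEnabled(boolean znodeExists, String data) {
        return !znodeExists;
    }

    // Post-HBASE-5155: a table is enabled iff its znode exists with data "ENABLED".
    static boolean postEnabled(boolean znodeExists, String data) {
        return znodeExists && "ENABLED".equals(data);
    }

    public static void main(String[] args) {
        // An enabled table served by a pre-5155 server: no znode at all.
        boolean znodeExists = false;
        String data = null;
        // Old client says enabled; new client says disabled -- the mismatch
        // behind is_enabled returning false against mixed versions.
        System.out.println(preEnabled(znodeExists, data));
        System.out.println(postEnabled(znodeExists, data));
    }
}
```

The hang on enable follows the same mismatch: a post-5155 client waits for the znode data to become ENABLED, while a pre-5155 server signals "enabled" by deleting the znode instead.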
[jira] [Commented] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264991#comment-13264991 ] Hadoop QA commented on HBASE-5903: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525075/5903v4.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1692//console This message is automatically generated. Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch, 5903v4.txt The tests are executed by category. When a test does not have a category, it is not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it is a small test, it will be executed on the developer machine and will fail immediately on the central builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264992#comment-13264992 ] David S. Wang commented on HBASE-5904: -- I think at least a partial revert of HBASE-5155 is warranted here. I don't know if we want to back it out entirely as it seems to solve a race condition that would be good to not have. Perhaps most of the patch can remain, but the part that handles how a table is represented as enabled in ZK can be reverted or worked around. But Ram can comment further on how best to handle this. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. 
Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264994#comment-13264994 ] David S. Wang commented on HBASE-5904: -- Also, I'm not sure if it matters that 0.90.6 was already cut with this change. That means that there is already an incompatible release out there. I do not know what the precedent is here or if there is one. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. 
I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13264999#comment-13264999 ] ramkrishna.s.vasudevan commented on HBASE-5904: --- @Stack Yes, David discussed this with me too. But I was also not sure how to go about this. Thanks David for bringing this up. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. 
I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5889) Remove HRegionInterface
[ https://issues.apache.org/jira/browse/HBASE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265000#comment-13265000 ] stack commented on HBASE-5889: -- bq. I'm just not sure this is actually the best bang for the buck, and might make layering less clean. Because the HRegion APIs would all take pbs rather than the Get/Put/Delete, etc.? And doing this conversion would be a bunch of work that would be better spent doing other stuff? Serverside, going from pb into Get/Delete/Put just to get the data into and out of regions seems gratuitous and crud we should purge. Your profiling though would seem to make this a minor issue, one I would previously have thought critical to address. Remove HRegionInterface --- Key: HBASE-5889 URL: https://issues.apache.org/jira/browse/HBASE-5889 Project: HBase Issue Type: Improvement Components: client, ipc, regionserver Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: 0.96.0 As a step to move internals to PB, so as to avoid the conversion for performance reason, we should remove the HRegionInterface. Therefore region server only supports ClientProtocol and AdminProtocol. Later on, HRegion can work with PB messages directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265005#comment-13265005 ] stack commented on HBASE-5904: -- If 0.90.6 has this breakage, then the damage is done. We should mark hbase-5155 an incompatible change and put in a fat release note w/ how it changes behavior (Steal some of David's notes above). You up for doing this Ram? I'm surprised that the change in semantic where no znode no longer means enabled has not caused other issues. Good digging David. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. 
Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not.
[jira] [Commented] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265009#comment-13265009 ] ramkrishna.s.vasudevan commented on HBASE-5864: --- Let me update the resolved versions as 0.96 also. I was just about to prepare a patch for trunk. Thanks Lars for taking care of it. Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. {noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Updated] (HBASE-5864) Error while reading from hfile in 0.94
[ https://issues.apache.org/jira/browse/HBASE-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5864: -- Fix Version/s: 0.96.0 Error while reading from hfile in 0.94 -- Key: HBASE-5864 URL: https://issues.apache.org/jira/browse/HBASE-5864 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: Gopinathan A Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5864_1.patch, HBASE-5864_2.patch, HBASE-5864_3.patch, HBASE-5864_test.patch Got the following stacktrace during region split. {noformat} 2012-04-24 16:05:42,168 WARN org.apache.hadoop.hbase.regionserver.Store: Failed getting store size for value java.io.IOException: Requested block is out of range: 2906737606134037404, lastDataBlockOffset: 84764558 at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:278) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:285) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:402) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1638) at org.apache.hadoop.hbase.regionserver.Store.getSplitPoint(Store.java:1943) at org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:77) at org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:4921) at org.apache.hadoop.hbase.regionserver.HRegionServer.splitRegion(HRegionServer.java:2901) {noformat}
[jira] [Updated] (HBASE-5901) Use union type protobufs instead of class/byte pairs for multi requests
[ https://issues.apache.org/jira/browse/HBASE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5901: - Priority: Critical (was: Major) +1 Nice. Use union type protobufs instead of class/byte pairs for multi requests --- Key: HBASE-5901 URL: https://issues.apache.org/jira/browse/HBASE-5901 Project: HBase Issue Type: Improvement Components: ipc, performance Affects Versions: 0.96.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Attachments: hbase-5901.txt The current implementation of multi actions uses repeated NameBytesPairs for the contents of multi actions. Instead, we should introduce a union type protobuf for the valid actions. This makes the RPCs smaller since they don't need to carry class names, and makes deserialization faster since it can avoid some copying and reflection.
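The contrast Todd describes can be sketched with plain Java stand-ins — these are hypothetical types, not the actual HBase protobuf definitions; the field and enum names are illustrative:

```java
// Hypothetical sketch of the two wire shapes for a multi-action request.
public class MultiActionShapes {
    // Old shape: every action carries its Java class name plus opaque bytes,
    // so each RPC repeats the class name and decoding needs reflection.
    static class NameBytesPair {
        final String className;
        final byte[] value;
        NameBytesPair(String className, byte[] value) {
            this.className = className;
            this.value = value;
        }
    }

    // New shape: an explicit union (what a protobuf `oneof` models).
    // A small tag replaces the class-name string, and dispatch is a switch.
    enum ActionType { GET, PUT, DELETE }
    static class Action {
        final ActionType type;
        final byte[] value;
        Action(ActionType type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    static String dispatch(Action a) {
        switch (a.type) {
            case GET: return "handled get";
            case PUT: return "handled put";
            case DELETE: return "handled delete";
            default: throw new IllegalArgumentException("unknown action");
        }
    }

    public static void main(String[] args) {
        // The tagged form is smaller on the wire (one enum tag instead of a
        // full class-name string per action) and cheaper to decode.
        Action put = new Action(ActionType.PUT, new byte[8]);
        System.out.println(dispatch(put)); // handled put
    }
}
```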
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265022#comment-13265022 ] ramkrishna.s.vasudevan commented on HBASE-5904: --- Added release note to HBASE-5155. @David/@Stack Please take a look at it. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. 
[jira] [Reopened] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reopened HBASE-5155: --- Will close this once the release note is reviewed. ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted --- Key: HBASE-5155 URL: https://issues.apache.org/jira/browse/HBASE-5155 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.90.6 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch ServerShutDownHandler and disable/delete table handler races. This is not an issue due to TM. - A regionserver goes down. In our cluster the regionserver holds a lot of regions. - A region R1 has two daughters D1 and D2. - The ServerShutdownHandler gets called, scans META, and gets all the user regions. - In parallel, a table is disabled. (No problem in this step.) - The table is then deleted. - The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.) - Now the ServerShutdownHandler starts to process the dead region: {code} if (hri.isOffline() && hri.isSplit()) { LOG.debug("Offlined and split region " + hri.getRegionNameAsString() + "; checking daughter presence"); fixupDaughters(result, assignmentManager, catalogTracker); {code} As part of fixupDaughters, since the daughters D1 and D2 are missing for R1 {code} if (isDaughterMissing(catalogTracker, daughter)) { LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString()); MetaEditor.addDaughter(catalogTracker, daughter, null); // TODO: Log WARN if the regiondir does not exist in the fs. If its not // there then something wonky about the split -- things will keep going // but could be missing references to parent region. // And assign it. assignmentManager.assign(daughter, true); {code} we call assign on the daughters. After this we again run the code below. {code} if (processDeadRegion(e.getKey(), e.getValue(), this.services.getAssignmentManager(), this.server.getCatalogTracker())) { this.services.getAssignmentManager().assign(e.getKey(), true); {code} When the SSH scanned META it had R1, D1 and D2. So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by {code} this.services.getAssignmentManager().assign(e.getKey(), true); {code} This leads to a ZooKeeper issue due to a bad version and kills the master. The important part here is that the regions that were deleted are recreated, which I think is more critical.
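The double-assignment in the report above can be modeled with a toy counter — this is not HBase code, just a sketch of why each daughter's assign count reaches 2 (the second attempt is what trips the ZooKeeper bad-version failure):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the race: the shutdown handler's META scan already contains
// the daughters D1/D2, so assigning them inside fixupDaughters and then
// again in the main processing loop assigns each daughter twice.
public class DoubleAssign {
    static Map<String, Integer> simulate() {
        Map<String, Integer> assignCounts = new HashMap<>();
        // Regions the shutdown handler read from META: parent plus daughters.
        List<String> metaScan = Arrays.asList("R1", "D1", "D2");

        // fixupDaughters() assigns the daughters it believes are missing...
        for (String daughter : Arrays.asList("D1", "D2")) {
            assignCounts.merge(daughter, 1, Integer::sum);
        }
        // ...then the main loop assigns every region from the same scan again.
        for (String region : metaScan) {
            assignCounts.merge(region, 1, Integer::sum);
        }
        return assignCounts;
    }

    public static void main(String[] args) {
        // D1 and D2 end up assigned twice; R1 only once.
        System.out.println(simulate());
    }
}
```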
[jira] [Updated] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5155: -- Release Note: This issue is an incompatible change. If an HBase client with the changes for HBASE-5155 is used against a server (master) without them, then is_enabled (from the HBase shell) or isTableEnabled() (from HBaseAdmin) will return false even though the table is already enabled according to the master. If the client does not have the changes for HBASE-5155 but the server does, then trying to enable a table will hang the client. The reason is that, prior to HBASE-5155, the znode created in ZooKeeper for a table was deleted once the table was enabled. After HBASE-5155, the znode is not deleted when the table is enabled; instead, the same znode is updated with the status ENABLED, and the client expects the znode to be in the ENABLED state if the table has been enabled successfully. These changes make the client behaviour incompatible when the client does not have this fix but the server does. If neither the client nor the server has this fix, the behaviour is as expected. I have added a release note on this issue. Please review. Sorry about the problem introduced. ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted --- Key: HBASE-5155 URL: https://issues.apache.org/jira/browse/HBASE-5155 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.90.6 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch ServerShutDownHandler and disable/delete table handler races. This is not an issue due to TM.
- A regionserver goes down. In our cluster the regionserver holds a lot of regions. - A region R1 has two daughters D1 and D2. - The ServerShutdownHandler gets called, scans META, and gets all the user regions. - In parallel, a table is disabled. (No problem in this step.) - The table is then deleted. - The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.) - Now the ServerShutdownHandler starts to process the dead region: {code} if (hri.isOffline() && hri.isSplit()) { LOG.debug("Offlined and split region " + hri.getRegionNameAsString() + "; checking daughter presence"); fixupDaughters(result, assignmentManager, catalogTracker); {code} As part of fixupDaughters, since the daughters D1 and D2 are missing for R1 {code} if (isDaughterMissing(catalogTracker, daughter)) { LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString()); MetaEditor.addDaughter(catalogTracker, daughter, null); // TODO: Log WARN if the regiondir does not exist in the fs. If its not // there then something wonky about the split -- things will keep going // but could be missing references to parent region. // And assign it. assignmentManager.assign(daughter, true); {code} we call assign on the daughters. After this we again run the code below. {code} if (processDeadRegion(e.getKey(), e.getValue(), this.services.getAssignmentManager(), this.server.getCatalogTracker())) { this.services.getAssignmentManager().assign(e.getKey(), true); {code} When the SSH scanned META it had R1, D1 and D2. So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by {code} this.services.getAssignmentManager().assign(e.getKey(), true); {code} This leads to a ZooKeeper issue due to a bad version and kills the master. The important part here is that the regions that were deleted are recreated, which I think is more critical.
[jira] [Commented] (HBASE-5903) Detect the test classes without categories
[ https://issues.apache.org/jira/browse/HBASE-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265023#comment-13265023 ] Hudson commented on HBASE-5903: --- Integrated in HBase-TRUNK #2826 (See [https://builds.apache.org/job/HBase-TRUNK/2826/]) HBASE-5903 Detect the test classes without categories (Revision 1332260) Result = SUCCESS stack : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestCheckTestClasses.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestHColumnDescriptor.java Detect the test classes without categories -- Key: HBASE-5903 URL: https://issues.apache.org/jira/browse/HBASE-5903 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.96.0 Attachments: 5903.v3.patch, 5903v4.txt The tests are executed by category. When a test does not have a category, it's not run on the prebuild nor the central build. This new test checks the test classes and lists the ones without a category. It fails if it finds one. As it's a small test, it will be executed on the developer machine and will fail immediately on the central builds.
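The idea behind a check like TestCheckTestClasses can be sketched with reflection. The annotation below is a local stand-in, not JUnit's actual @Category, so the example runs without JUnit on the classpath; the class names are hypothetical:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: flag any test class that lacks a category annotation,
// so uncategorized tests fail fast on the developer machine.
public class CheckCategories {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Category {}

    @Category static class TestWithCategory {}
    static class TestWithoutCategory {}

    // Return the classes that would fail the check.
    static List<Class<?>> uncategorized(Class<?>... testClasses) {
        List<Class<?>> bad = new ArrayList<>();
        for (Class<?> c : testClasses) {
            if (!c.isAnnotationPresent(Category.class)) {
                bad.add(c);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Class<?>> bad =
            uncategorized(TestWithCategory.class, TestWithoutCategory.class);
        // Exactly one offender: TestWithoutCategory.
        System.out.println(bad);
    }
}
```

The real check additionally scans the classpath for every class whose name matches the test naming convention instead of taking an explicit list.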
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265024#comment-13265024 ] David S. Wang commented on HBASE-5904: -- Should we back out HBASE-5155 entirely for now? I looked at it and just backing out the part that changes the znode behavior implies that we should also remove isTablePresent(), which seems to affect more of the patch's functionality and then it gets messy. Is there any later change that depends on HBASE-5155? is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. 
After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not.
[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage
[ https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265028#comment-13265028 ] Lars Hofhansl commented on HBASE-5897: -- Right. That's what I was trying to say when I attached my patch. prePut coprocessor hook causing substantial CPU usage - Key: HBASE-5897 URL: https://issues.apache.org/jira/browse/HBASE-5897 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5897-simple.txt, hbase-5897.txt I was running an insert workload against trunk under oprofile and saw that a significant portion of CPU usage was going to calling the prePut coprocessor hook inside doMiniBatchPut, even though I don't have any coprocessors installed. I ran a million-row insert and collected CPU time spent in the RS after commenting out the prePut hook, and found CPU usage reduced by 33%.
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265032#comment-13265032 ] jirapos...@reviews.apache.org commented on HBASE-5869: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4926/#review7379 --- Looks good to me. - Jimmy On 2012-04-28 23:42:52, Michael Stack wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4926/ bq. --- bq. bq. (Updated 2012-04-28 23:42:52) bq. bq. bq. Review request for hbase and Jimmy Xiang. bq. bq. bq. Summary bq. --- bq. bq. Convert two zk users to pb: distributed log splitting and regions in transition. bq. bq. Refactored distributed log splitting so we only serialize/deserialize in one location. bq. Less changes needed to do same for regions in transition. bq. bq. Moves serialization/deserialization out of the ZKAssign, ZKSplit and into bq. the classes themselves so can encapsulate how serialization is done into one place bq. (try to make the ZK* classes just deal in bytes -- about 90% done). bq. bq. Moved classes used by various packages up to top level to minimize imports bq. that are across package (zookeeper into protobuf and/or into regionserver and/or bq. master packages, etc). bq. bq. A src/main/java/org/apache/hadoop/hbase/DeserializationException.java bq.New generic deserialization exception. bq. A src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java bq. D src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java bq.Moved under zookeeper package. bq. A src/main/java/org/apache/hadoop/hbase/HBaseException.java bq.New base hbase exception as suggested by hbase-5796. New DeserializationException bq.inherits from this. bq. A src/main/java/org/apache/hadoop/hbase/RegionTransition.java bq.State of a region in transition. Top-level because used by a bq.few top-level packages. Encapsulates pb serialization/deserialization. 
bq. M src/main/java/org/apache/hadoop/hbase/ServerName.java bq.Add method to deserialize a ServeName, etc. Encapsulates pb'ing. bq. M src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java bq.Counters used by distributed log splitting. bq. A SplitLogTask bq. Class that encapsulates log splitting state. Also encapsulates pb'ing. bq. M src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java bq.Implement code for state. Added functions to go from code to state and vice bq.versa. Used serializing. bq. M src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java bq.Remove unused imports. bq. D src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java bq.Removed. Replaced by RegionTransition moved to package top-level. bq. M src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java bq. M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java bq.Use new DeserializationException. Move to using new RegionTransition bq.from RegionTransitionData class. Pass deserialized class rather than bq.byte array. Remove duplicated code. bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java bq.Use new ServerName parse method rather than ZKUtil one. bq. M src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java bq. M src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java bq.Redo to use new SplitLogTask and SplitLogCounter classes. bq. M src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java bq.expectPBMagicPrefix added bq. M src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java bq.Use new RegionTransition in place of RegionTransitionData. bq. M src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java bq.Define moved from ZKSplitLog to SplitLogManager. bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java bq. 
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java bq.Changed method name from getZNodeData to toByteArray to match how we've bq.named it elsewhere. Use new DeserializationException bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java bq.Use new RegionTransion class bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java bq.Moved stuff that was in here up into SplitLogManager where better bq.belongs. Also moved serialization/deserialization up into the bq.class itself: SplitLogTask. Moved counters out to SplitLogCounter class. bq. M
[jira] [Commented] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265031#comment-13265031 ] jirapos...@reviews.apache.org commented on HBASE-5869: -- bq. On 2012-04-28 22:14:23, Jimmy Xiang wrote: bq. src/main/protobuf/ZooKeeper.proto, line 82 bq. https://reviews.apache.org/r/4926/diff/1/?file=105372#file105372line82 bq. bq. A task is a path, this is more like a task state, isn't it? bq. bq. Michael Stack wrote: bq. I can change this np. bq. bq. Currently I have the pb class named same as the class that wraps it. Should I change this? Add a pb prefix or something? Problem w/ that is that no other of the pb classes have the pb prefix. They are in the generated package which is probably sufficient to distingush them? My hope is to make it so the pbs do not leak outside of the class that serializes to them; e.g. this SplitLogTask class. bq. bq. Jimmy Xiang wrote: bq. I got your point. I prefer to have the pb class named the same as the wrapper class, if there is one. Should we create a separate task state wrapper class if needed? bq. bq. Michael Stack wrote: bq. I just tried changing the name of this class from SplitLogTask to SplitLogTaskState and it don't seem right since you can do a 'getState' call on this class -- the class has State AND the origin of the task. I'm going to leave the name as is. bq. bq. Ok on keeping names the same. It should be fine if we can keep the pb stuff bottled up under the pb package or internal only to the class that uses the pb (except where pb comes out on server..) bq. bq. Thanks Jimmy Ok, that's fine with me. bq. On 2012-04-28 22:14:23, Jimmy Xiang wrote: bq. src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182 bq. https://reviews.apache.org/r/4926/diff/1/?file=105357#file105357line182 bq. bq. Should we abort? Under what scenario the parsing can fail, other than a conflict data format? bq. bq. Michael Stack wrote: bq. 
I thought I was just redoing what was there previous. We could abort but maybe next time through the deserialization works because its been updated by another? Or, we spew this error all over the logs and drive someone crazy? Will look at it again. bq. bq. Michael Stack wrote: bq. Yeah, I'll leave this as is after looking at it. Hopefully will be good on next go around. Ok - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4926/#review7360 --- On 2012-04-28 23:42:52, Michael Stack wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4926/ bq. --- bq. bq. (Updated 2012-04-28 23:42:52) bq. bq. bq. Review request for hbase and Jimmy Xiang. bq. bq. bq. Summary bq. --- bq. bq. Convert two zk users to pb: distributed log splitting and regions in transition. bq. bq. Refactored distributed log splitting so we only serialize/deserialize in one location. bq. Less changes needed to do same for regions in transition. bq. bq. Moves serialization/deserialization out of the ZKAssign, ZKSplit and into bq. the classes themselves so can encapsulate how serialization is done into one place bq. (try to make the ZK* classes just deal in bytes -- about 90% done). bq. bq. Moved classes used by various packages up to top level to minimize imports bq. that are across package (zookeeper into protobuf and/or into regionserver and/or bq. master packages, etc). bq. bq. A src/main/java/org/apache/hadoop/hbase/DeserializationException.java bq.New generic deserialization exception. bq. A src/main/java/org/apache/hadoop/hbase/zookeeper/EmptyWatcher.java bq. D src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java bq.Moved under zookeeper package. bq. A src/main/java/org/apache/hadoop/hbase/HBaseException.java bq.New base hbase exception as suggested by hbase-5796. New DeserializationException bq.inherits from this. bq. 
A src/main/java/org/apache/hadoop/hbase/RegionTransition.java bq.State of a region in transition. Top-level because used by a bq.few top-level packages. Encapsulates pb serialization/deserialization. bq. M src/main/java/org/apache/hadoop/hbase/ServerName.java bq.Add method to deserialize a ServeName, etc. Encapsulates pb'ing. bq. M src/main/java/org/apache/hadoop/hbase/SplitLogCounters.java bq.Counters used by distributed log splitting. bq. A SplitLogTask bq. Class that encapsulates log splitting state. Also encapsulates pb'ing. bq. M
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265047#comment-13265047 ] stack commented on HBASE-5904: -- I suppose it would make sense backing it out. We could roll a 0.90.7? is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
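The incompatibility David describes comes down to two opposite conventions for interpreting the table's znode. A minimal plain-Java sketch of the two checks (illustrative method and variable names, not actual HBase code) shows how each side misreads the other's state:

```java
// Hypothetical sketch of the two "table enabled" conventions; these are
// illustrative methods, not actual HBase code.
class TableEnabledCheck {

    // Pre-HBASE-5155: a table is enabled iff its znode does NOT exist.
    static boolean isEnabledPre5155(byte[] znodeData) {
        return znodeData == null; // null stands for "znode absent"
    }

    // Post-HBASE-5155: enabled iff the znode exists and holds "ENABLED".
    static boolean isEnabledPost5155(byte[] znodeData) {
        return znodeData != null && "ENABLED".equals(new String(znodeData));
    }

    public static void main(String[] args) {
        byte[] post5155Server = "ENABLED".getBytes(); // a 5155 server writes this
        byte[] pre5155Server = null;                  // a pre-5155 server deletes the znode
        // Old client against new server: the table wrongly looks disabled.
        System.out.println(isEnabledPre5155(post5155Server));  // false
        // New client against old server: the table also wrongly looks disabled.
        System.out.println(isEnabledPost5155(pre5155Server));  // false
    }
}
```

Each check is self-consistent, but a client using one convention against a server using the other always concludes the enabled table is disabled, which matches the behavior reported above.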
[jira] [Created] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface
nkeywal created HBASE-5905: -- Summary: Protobuf interface for Admin: split between the internal and the external/customer interface Key: HBASE-5905 URL: https://issues.apache.org/jira/browse/HBASE-5905 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal After a short discussion with Stack, I created a jira. -- I'm a little bit confused by the protobuf interface for closeRegion. We have two types of closeRegion today: 1) the external ones, available in client.HBaseAdmin. They take the server and the region identifier as parameters and nothing else. 2) The internal ones, called for example by the master. They have more parameters (like versionOfClosingNode or transitionInZK). When I look at protobuf.ProtobufUtil, I see:

public static void closeRegion(final AdminProtocol admin,
    final byte[] regionName, final boolean transitionInZK) throws IOException {
  CloseRegionRequest closeRegionRequest =
      RequestConverter.buildCloseRegionRequest(regionName, transitionInZK);
  try {
    admin.closeRegion(null, closeRegionRequest);
  } catch (ServiceException se) {
    throw getRemoteException(se);
  }
}

In other words, it seems that we merged the two interfaces into a single one. Is that the intent? I checked, and the internal fields in closeRegionRequest are all optional (that's good). Still, it means that the end user could use them, or at least would need to distinguish between the fields that are optional for functional reasons and those that are optional but should not be used. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
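The merged request described above can be pictured with a plain-Java stand-in: one message type, with the internal-only fields optional, constructed two different ways. All names here are hypothetical placeholders for the generated protobuf classes:

```java
// Illustrative plain-Java model of the merged close-region request: the
// external caller sets only the region name, while internal callers (e.g.
// the master) also set the ZK-related optional fields. Hypothetical names.
class CloseRegionRequestSketch {
    final byte[] regionName;            // required in both uses
    final Boolean transitionInZK;       // optional: internal use only
    final Integer versionOfClosingNode; // optional: internal use only

    CloseRegionRequestSketch(byte[] regionName, Boolean transitionInZK,
                             Integer versionOfClosingNode) {
        this.regionName = regionName;
        this.transitionInZK = transitionInZK;
        this.versionOfClosingNode = versionOfClosingNode;
    }

    // External/client-facing construction: only the region identifier.
    static CloseRegionRequestSketch external(byte[] regionName) {
        return new CloseRegionRequestSketch(regionName, null, null);
    }

    // Internal construction, as the master would use it.
    static CloseRegionRequestSketch internal(byte[] regionName,
                                             boolean transitionInZK,
                                             int versionOfClosingNode) {
        return new CloseRegionRequestSketch(regionName, transitionInZK,
                                            versionOfClosingNode);
    }
}
```

The question in the issue is whether the server should document (or enforce) that the internal-only optionals are not for external callers, or whether the two construction paths should become two protocols.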
[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265055#comment-13265055 ] Nicolas Spiegelberg commented on HBASE-5886: I'm confused about why this metric is useful. This metric is never accurate and determining data loss because querying it is async from the Put path. If you are looking for a restart point, you should have another thread call HTable.flush() and checkpoint or add an API to query for the latest timestamp in a CF's storefile. Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, HBASE-5886-v2.patch Add a metrics to keep track of puts without WAL and possible data loss size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265055#comment-13265055 ] Nicolas Spiegelberg edited comment on HBASE-5886 at 4/30/12 5:55 PM: - I'm confused about why this metric is useful. This metric is never accurate at determining data loss because querying it is async from the Put path. If you are looking for a restart point, you should have another thread call HTable.flush() and checkpoint or add an API to query for the latest timestamp in a CF's storefile. Edit: s/and/at was (Author: nspiegelberg): I'm confused about why this metric is useful. This metric is never accurate and determining data loss because querying it is async from the Put path. If you are looking for a restart point, you should have another thread call HTable.flush() and checkpoint or add an API to query for the latest timestamp in a CF's storefile. Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, HBASE-5886-v2.patch Add a metrics to keep track of puts without WAL and possible data loss size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5895) Slow query log in trunk is too verbose
[ https://issues.apache.org/jira/browse/HBASE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265060#comment-13265060 ] Nicolas Spiegelberg commented on HBASE-5895: It should at least be optional to enable verbose logging. Another thought was rate limiting the number of times a region could log a slow query over a given time (to rate limit logging disk IO/sec) Slow query log in trunk is too verbose -- Key: HBASE-5895 URL: https://issues.apache.org/jira/browse/HBASE-5895 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Todd Lipcon Priority: Critical Running a YCSB workload against trunk, the slow query log ends up logging the entire contents of mutate RPCs (in PB-encoded binary). This then makes the logging back up, which makes more slow queries, which makes the whole thing spin out of control. We should only summarize the RPC, rather than printing the whole contents. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
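The rate-limiting idea Nicolas floats could look like the following sketch: a per-region budget of slow-query log lines per time window. This is illustrative code under assumed names, not part of any attached patch:

```java
// Minimal windowed rate limiter for slow-query logging: allow at most
// maxPerWindow log lines per windowMillis, dropping the rest. Not HBase code.
class SlowQueryLogLimiter {
    private final int maxPerWindow;
    private final long windowMillis;
    private long windowStart; // start of the current window
    private int emitted;      // lines emitted in the current window

    SlowQueryLogLimiter(int maxPerWindow, long windowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if the caller may log now; false once the budget is spent.
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis; // start a fresh window
            emitted = 0;
        }
        if (emitted < maxPerWindow) {
            emitted++;
            return true;
        }
        return false;
    }
}
```

A region server would hold one such limiter per region (or one global one) and call `tryAcquire` before formatting the summary line, so a burst of slow queries cannot itself saturate disk I/O with logging.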
[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor
[ https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265059#comment-13265059 ] Chris Waterson commented on HBASE-3691: --- What is the likelihood that this could be back-ported to the 0.90.x branch? Add compressor support for 'snappy', google's compressor Key: HBASE-3691 URL: https://issues.apache.org/jira/browse/HBASE-3691 Project: HBase Issue Type: Task Reporter: stack Priority: Critical Fix For: 0.92.0 Attachments: hbase-snappy-3691-trunk-002.patch, hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, hbase-snappy-3691-trunk.patch http://code.google.com/p/snappy/ is apache licensed. bq. Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. bq. Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as Zippy in some presentations and the likes.) Lets get it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage
[ https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265063#comment-13265063 ] stack commented on HBASE-5897: -- +1 on the more radical patch. prePut coprocessor hook causing substantial CPU usage - Key: HBASE-5897 URL: https://issues.apache.org/jira/browse/HBASE-5897 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5897-simple.txt, hbase-5897.txt I was running an insert workload against trunk under oprofile and saw that a significant portion of CPU usage was going to calling the prePut coprocessor hook inside doMiniBatchPut, even though I don't have any coprocessors installed. I ran a million-row insert and collected CPU time spent in the RS after commenting out the preput hook, and found CPU usage reduced by 33%. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
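The CPU saving Todd measured suggests short-circuiting the hook when no coprocessors are registered. A simplified stand-in for the coprocessor host (hypothetical types, not the actual patch) shows the kind of fast path involved:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a region server's coprocessor host, showing a
// fast path that avoids per-Put overhead when nothing is registered.
// These types are hypothetical, not the actual HBase classes.
class CoprocessorHostSketch {
    interface Observer {
        boolean prePut(byte[] row); // true = bypass the put
    }

    private final List<Observer> observers = new ArrayList<Observer>();

    void register(Observer o) {
        observers.add(o);
    }

    // Returns true if any observer asked to bypass the put.
    boolean prePut(byte[] row) {
        // Fast path: no coprocessors installed, so skip all per-call work.
        if (observers.isEmpty()) {
            return false;
        }
        for (Observer o : observers) {
            if (o.prePut(row)) {
                return true;
            }
        }
        return false;
    }
}
```

In doMiniBatchPut the per-row hook sits on the hot path, so even cheap per-call setup (argument wrapping, environment objects) multiplies across millions of puts; checking for an empty observer list first eliminates that cost for the common no-coprocessor case.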
[jira] [Created] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build
stack created HBASE-5906: Summary: TestChangingEncoding failing sporadically in 0.94 build Key: HBASE-5906 URL: https://issues.apache.org/jira/browse/HBASE-5906 Project: HBase Issue Type: Bug Reporter: stack Attachments: 5906.txt The test passes locally for me and Elliott but takes a long time to run. Timeout is only two minutes for the test though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build
[ https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5906: - Attachment: 5906.txt Patch I'm going to try. Doubles timeout from two minutes to four. TestChangingEncoding failing sporadically in 0.94 build --- Key: HBASE-5906 URL: https://issues.apache.org/jira/browse/HBASE-5906 Project: HBase Issue Type: Bug Reporter: stack Attachments: 5906.txt The test passes locally for me and Elliott but takes a long time to run. Timeout is only two minutes for the test though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build
[ https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265066#comment-13265066 ] stack commented on HBASE-5906: -- Applied to 0.94 and trunk. Lets see if it fails subsequently. TestChangingEncoding failing sporadically in 0.94 build --- Key: HBASE-5906 URL: https://issues.apache.org/jira/browse/HBASE-5906 Project: HBase Issue Type: Bug Reporter: stack Attachments: 5906.txt The test passes locally for me and Elliott but takes a long time to run. Timeout is only two minutes for the test though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5785) Adding unit tests for protbuf utils introduced for HRegionInterface pb conversion
[ https://issues.apache.org/jira/browse/HBASE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265069#comment-13265069 ] jirapos...@reviews.apache.org commented on HBASE-5785: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4936/ --- Review request for hbase and Michael Stack. Summary --- I added some tests for those conversion methods. The helper utilities are tested implicitly in other tests. We can add more later on if needed. This addresses bug HBASE-5785. https://issues.apache.org/jira/browse/HBASE-5785 Diffs - src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 994cb76 src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 9b594aa src/test/java/org/apache/hadoop/hbase/protobuf/TestProtobufUtil.java PRE-CREATION Diff: https://reviews.apache.org/r/4936/diff Testing --- The new tests are green. Thanks, Jimmy Adding unit tests for protbuf utils introduced for HRegionInterface pb conversion - Key: HBASE-5785 URL: https://issues.apache.org/jira/browse/HBASE-5785 Project: HBase Issue Type: Sub-task Components: ipc, master, migration, regionserver Affects Versions: 0.96.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Critical Labels: noob Fix For: 0.96.0 Attachments: hbase-5785.patch We need to add some unit tests for the protobuf utilities to catch issues earlier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5885) Invalid HFile block magic on Local file System
[ https://issues.apache.org/jira/browse/HBASE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265068#comment-13265068 ] stack commented on HBASE-5885: -- I don't think the TestChangingEncoding is related to this change. It enables verification of checksum in local filesystem. The TestChangingEncoding doesn't even use local filesystem. I opened HBASE-5906 to look into the TestChangingEncoding fails. Invalid HFile block magic on Local file System -- Key: HBASE-5885 URL: https://issues.apache.org/jira/browse/HBASE-5885 Project: HBase Issue Type: Bug Affects Versions: 0.94.0, 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Fix For: 0.94.0, 0.96.0 Attachments: 5885-trunk-v2.txt, HBASE-5885-94-0.patch, HBASE-5885-94-1.patch, HBASE-5885-trunk-0.patch, HBASE-5885-trunk-1.patch ERROR: java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions: Thu Apr 26 11:19:18 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=file:/tmp/hbase-eclark/hbase/TestTable/e2d1c846363c75262cbfd85ea278b342/info/bae2681d63734066957b58fe791a0268, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=01/info:data/1335463981520/Put, lastKey=0002588100/info:data/1335463902296/Put, avgKeyLen=30, avgValueLen=1000, entries=1215085, length=1264354417, cur=000248/info:data/1335463994457/Put/vlen=1000/ts=0] at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:135) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:95) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:368) at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3323) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3279) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3296) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376) Caused by: java.io.IOException: Invalid HFile block magic: \xEC\xD5\x9D\xB4\xC2bfo at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:153) at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:164) at org.apache.hadoop.hbase.io.hfile.HFileBlock.init(HFileBlock.java:254) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1779) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1637) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:327) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:555) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:651) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:130) ... 
12 more Thu Apr 26 11:19:19 PDT 2012, org.apache.hadoop.hbase.client.ScannerCallable@190a621a, java.io.IOException: java.io.IOException: java.lang.IllegalArgumentException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1132) at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1121) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2420) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at
[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor
[ https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265070#comment-13265070 ] stack commented on HBASE-3691: -- @Chris Have you tried the patch on 0.90? Does it work for you? Add compressor support for 'snappy', google's compressor Key: HBASE-3691 URL: https://issues.apache.org/jira/browse/HBASE-3691 Project: HBase Issue Type: Task Reporter: stack Priority: Critical Fix For: 0.92.0 Attachments: hbase-snappy-3691-trunk-002.patch, hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, hbase-snappy-3691-trunk.patch http://code.google.com/p/snappy/ is apache licensed. bq. Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. bq. Snappy is widely used inside Google, in everything from BigTable and MapReduce to our internal RPC systems. (Snappy has previously been referred to as Zippy in some presentations and the likes.) Lets get it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5886) Add new metric for possible data loss due to puts without WAL
[ https://issues.apache.org/jira/browse/HBASE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265071#comment-13265071 ] Matteo Bertozzi commented on HBASE-5886: @Nicolas the metric is not meant to be precise but just to give a hint about possible data loss. Add new metric for possible data loss due to puts without WAL -- Key: HBASE-5886 URL: https://issues.apache.org/jira/browse/HBASE-5886 Project: HBase Issue Type: New Feature Components: metrics, regionserver Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Labels: metrics Attachments: HBASE-5886-v0.patch, HBASE-5886-v1.patch, HBASE-5886-v2.patch Add a metrics to keep track of puts without WAL and possible data loss size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
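As a rough sketch of what such a best-effort metric could track (hypothetical names, not the attached patch): count the puts that skipped the WAL, and accumulate their payload size as an upper bound on what a crash before the next flush might lose:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the proposed metric: count puts written with the WAL
// disabled and accumulate their size as a hint at possible data loss.
// Names are illustrative; this is not the code in the attached patches.
class PutsWithoutWalMetric {
    final AtomicLong numPutsWithoutWal = new AtomicLong();
    final AtomicLong possibleDataLossBytes = new AtomicLong();

    // Called on the put path with whether the WAL was used and the put size.
    void onPut(boolean writeToWal, long putSizeBytes) {
        if (!writeToWal) {
            numPutsWithoutWal.incrementAndGet();
            possibleDataLossBytes.addAndGet(putSizeBytes);
        }
    }

    // A memstore flush persists the data, so the pending-loss estimate resets.
    void onFlush() {
        possibleDataLossBytes.set(0);
    }
}
```

As Nicolas points out, reading these counters is asynchronous from the put path, so the value is only a hint, which is exactly the stated intent.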
[jira] [Commented] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface
[ https://issues.apache.org/jira/browse/HBASE-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265072#comment-13265072 ] stack commented on HBASE-5905: -- Sorry N, should have read closer (was running out the door): bq. In other words, it seems that we merged the two interfaces into a single one. Is that the intend? Yes bq. I checked, the internal fields in closeRegionRequest are all optional (that's good). Still, it means that the end user could use them or at least would need to distinguish between the optional for functional reasons and the optional - do not use. Agree. I'd say this is a minor issue though, given pb classes do not come out through our admin public api, just the api on servers. Protobuf interface for Admin: split between the internal and the external/customer interface Key: HBASE-5905 URL: https://issues.apache.org/jira/browse/HBASE-5905 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal After a short discussion with Stack, I create a jira. -- I'am a little bit confused by the protobuf interface for closeRegion. We have two types of closeRegion today: 1) the external ones; available in client.HBaseAdmin. They take the server and the region identifier as a parameter and nothing else. 2) The internal ones, called for example by the master. They have more parameters (like versionOfClosingNode or transitionInZK). When I look at protobuf.ProtobufUtil, I see: public static void closeRegion(final AdminProtocol admin, final byte[] regionName, final boolean transitionInZK) throws IOException { CloseRegionRequest closeRegionRequest = RequestConverter.buildCloseRegionRequest(regionName, transitionInZK); try { admin.closeRegion(null, closeRegionRequest); } catch (ServiceException se) { throw getRemoteException(se); } } In other words, it seems that we merged the two interfaces into a single one. Is that the intend? 
I checked, the internal fields in closeRegionRequest are all optional (that's good). Still, it means that the end user could use them or at least would need to distinguish between the optional for functional reasons and the optional - do not use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265076#comment-13265076 ] Zhihong Yu commented on HBASE-5699: --- Playing with a prototype of this feature using ycsb (half insert, half update) on a 5-node cluster where usertable has 13 regions on each region server. Without this feature:
{code}
10 sec: 99965 operations; 9996.5 current ops/sec; [UPDATE AverageLatency(us)=258.68] [INSERT AverageLatency(us)=610.28]
20 sec: 99965 operations; 0 current ops/sec;
25 sec: 0 operations; 4.3 current ops/sec; [UPDATE AverageLatency(us)=2594303.62] [INSERT AverageLatency(us)=1240495.41]
[OVERALL], RunTime(ms), 25844.0
[OVERALL], Throughput(ops/sec), 3868.9831295465096
[UPDATE], Operations, 49935
[UPDATE], AverageLatency(us), 674.2635626314209
{code}
With this feature:
{code}
10 sec: 99952 operations; 9994.2 current ops/sec; [UPDATE AverageLatency(us)=178.7] [INSERT AverageLatency(us)=584.76]
20 sec: 0 operations; 3.8 current ops/sec; [UPDATE AverageLatency(us)=10.88] [INSERT AverageLatency(us)=679174.27]
20 sec: 0 operations; 0 current ops/sec;
[OVERALL], RunTime(ms), 20867.0
[OVERALL], Throughput(ops/sec), 4791.776489193463
[UPDATE], Operations, 49992
[UPDATE], AverageLatency(us), 178.6439030244839
{code}
Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: Li Pi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5904) is_enabled from shell returns differently from pre- and post- HBASE-5155
[ https://issues.apache.org/jira/browse/HBASE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265079#comment-13265079 ] David S. Wang commented on HBASE-5904: -- I have a patch to back it out and will post it once I have tested it more. The patch seems to make things compatible again, but I want to make sure it doesn't break anything else. Look for it in a day or two. is_enabled from shell returns differently from pre- and post- HBASE-5155 Key: HBASE-5904 URL: https://issues.apache.org/jira/browse/HBASE-5904 Project: HBase Issue Type: Bug Components: zookeeper Affects Versions: 0.90.6 Reporter: David S. Wang If I launch an hbase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers with HBASE-5155, then is_enabled for a table always returns false even if the table is considered enabled by the servers from the logs. If I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns as expected. If I launch an HBase shell that uses HBase and ZooKeeper without HBASE-5155, against HBase servers also without HBASE-5155, then is_enabled works as you'd expect. But if I then do the same thing but with an HBase shell and ZooKeeper with HBASE-5155, then is_enabled returns false even though the table is considered enabled by the servers from the logs. Additionally, if I then try to enable the table from the HBASE-5155-containing shell, it hangs because the ZooKeeper code waits for the ZNode to be updated with ENABLED in the data field, but what actually happens is that the ZNode gets deleted since the servers are running without HBASE-5155. I think the culprit is that the indication of how a table is considered enabled inside ZooKeeper has changed with HBASE-5155. Before HBASE-5155, a table was considered enabled if the ZNode for it did not exist. After HBASE-5155, a table is considered enabled if the ZNode for it exists and has ENABLED in its data. 
I think the current code is incompatible when running clients and servers where one side has HBASE-5155 and the other side does not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5905) Protobuf interface for Admin: split between the internal and the external/customer interface
[ https://issues.apache.org/jira/browse/HBASE-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265102#comment-13265102 ] Jimmy Xiang commented on HBASE-5905: Is there a way to specify a parameter private/internal in pb? Otherwise, we may end up with some private protocol for internal usage. Protobuf interface for Admin: split between the internal and the external/customer interface Key: HBASE-5905 URL: https://issues.apache.org/jira/browse/HBASE-5905 Project: HBase Issue Type: Improvement Components: client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal After a short discussion with Stack, I create a jira. -- I'am a little bit confused by the protobuf interface for closeRegion. We have two types of closeRegion today: 1) the external ones; available in client.HBaseAdmin. They take the server and the region identifier as a parameter and nothing else. 2) The internal ones, called for example by the master. They have more parameters (like versionOfClosingNode or transitionInZK). When I look at protobuf.ProtobufUtil, I see: public static void closeRegion(final AdminProtocol admin, final byte[] regionName, final boolean transitionInZK) throws IOException { CloseRegionRequest closeRegionRequest = RequestConverter.buildCloseRegionRequest(regionName, transitionInZK); try { admin.closeRegion(null, closeRegionRequest); } catch (ServiceException se) { throw getRemoteException(se); } } In other words, it seems that we merged the two interfaces into a single one. Is that the intend? I checked, the internal fields in closeRegionRequest are all optional (that's good). Still, it means that the end user could use them or at least would need to distinguish between the optional for functional reasons and the optional - do not use. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
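On Jimmy's question above: proto2 has no per-field visibility modifier, so an internal-only parameter can only be declared `optional` and documented as internal, or split out into a separate internal message/service. A hypothetical sketch of the merged request discussed in the comment (field names follow the discussion; field numbers and the default are illustrative, not the actual Admin.proto):

```proto
// Illustrative only -- not the real HBase Admin.proto definition.
message CloseRegionRequest {
  required RegionSpecifier region = 1;                // all an external caller supplies
  optional uint32 versionOfClosingNode = 2;           // internal: master bookkeeping
  optional bool transitionInZK = 3 [default = true];  // internal: ZK state transition
}
```

Since both kinds of callers share this one message, the "optional for functional reasons" vs. "optional - do not use" distinction can only live in comments or in a wrapper API, which is exactly the concern raised above.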
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265112#comment-13265112 ] Elliott Clark commented on HBASE-5699: -- Intuitively it seems like the number of WALs that are used should be related to the number of spindles available to HBase. So maybe this should be either a configurable number or something that is derived from the number of mount points hdfs is hosted on? Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: Li Pi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265114#comment-13265114 ] David S. Wang commented on HBASE-5155: -- Ram, If the HBase client does not have the changes for HBASE-5155 and the server has the changes for HBASE-5155, then if we try to Enable a table then the client will hang. Actually, I noticed that the hang happens in the opposite case: when the client has the changes for HBASE-5155, and the server does not. Otherwise the release note looks OK to me. ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted --- Key: HBASE-5155 URL: https://issues.apache.org/jira/browse/HBASE-5155 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Blocker Fix For: 0.90.6 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch ServerShutDownHandler and disable/delete table handler races. This is not an issue due to TM. - A regionserver goes down. In our cluster the regionserver holds lot of regions. - A region R1 has two daughters D1 and D2. - The ServerShutdownHandler gets called and scans the META and gets all the user regions - Parallely a table is disabled. (No problem in this step). - Delete table is done. - The tables and its regions are deleted including R1, D1 and D2.. 
(So META is cleaned.) - Now ServerShutdownHandler starts to processTheDeadRegion {code} if (hri.isOffline() && hri.isSplit()) { LOG.debug("Offlined and split region " + hri.getRegionNameAsString() + "; checking daughter presence"); fixupDaughters(result, assignmentManager, catalogTracker); {code} As part of fixupDaughters, since the daughters D1 and D2 are missing for R1 {code} if (isDaughterMissing(catalogTracker, daughter)) { LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString()); MetaEditor.addDaughter(catalogTracker, daughter, null); // TODO: Log WARN if the regiondir does not exist in the fs. If its not // there then something wonky about the split -- things will keep going // but could be missing references to parent region. // And assign it. assignmentManager.assign(daughter, true); {code} we call assign on the daughters. After this we again run the code below: {code} if (processDeadRegion(e.getKey(), e.getValue(), this.services.getAssignmentManager(), this.server.getCatalogTracker())) { this.services.getAssignmentManager().assign(e.getKey(), true); {code} Now when the SSH scanned META it had R1, D1 and D2. So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by {code} this.services.getAssignmentManager().assign(e.getKey(), true); {code} This leads to a ZooKeeper issue due to a bad version and kills the master. The important part here is that the regions that were deleted are recreated, which I think is more critical. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
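The double assignment described above can be pictured with a toy guard that remembers which regions were already assigned; class and method names here are hypothetical, not the real AssignmentManager API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the race: fixupDaughters assigns D1/D2, then the main
// ServerShutdownHandler loop assigns every region it scanned from META,
// including D1/D2 again. Remembering what has already been assigned
// skips the second, version-breaking assign.
class AssignOnceTracker {
    private final Set<String> assigned = new HashSet<String>();
    private final List<String> assignCalls = new ArrayList<String>();

    // Returns true only the first time a region is assigned.
    boolean assign(String region) {
        if (!assigned.add(region)) {
            return false; // already assigned (e.g. by fixupDaughters): skip
        }
        assignCalls.add(region);
        return true;
    }

    List<String> calls() {
        return assignCalls;
    }
}
```

With this guard, replaying the scenario (fixupDaughters assigns D1 and D2, then the main loop walks R1, D1, D2) issues each assign exactly once.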
[jira] [Updated] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5611: -- Attachment: 5611-94-v2.txt Patch for 0.94 branch which fixes FIXED_OVERHEAD. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it's still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
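The accounting leak above can be pictured with a toy global counter; the class and method names are illustrative, not the actual RegionServerAccounting API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the bug: replaying recovered edits bumps a global
// memstore counter, and if the region then fails to open, those bytes
// must be subtracted back -- otherwise the server believes it is
// permanently above the flush watermark while every region is empty.
class GlobalMemStoreAccounting {
    private final AtomicLong globalSize = new AtomicLong();

    // Called while replaying recovered edits into the region's MemStore.
    long addRegionReplayEdits(long bytes) {
        return globalSize.addAndGet(bytes);
    }

    // The fix discussed above: when the open fails and the HRegion is
    // cleaned up, roll the replayed bytes back out of the global size.
    long rollbackFailedOpen(long bytes) {
        return globalSize.addAndGet(-bytes);
    }

    long size() {
        return globalSize.get();
    }
}
```

Without the rollback call, the counter stays inflated forever, which is exactly the state the {{MemStoreFlusher}} log excerpt shows.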
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265127#comment-13265127 ] Zhihong Yu commented on HBASE-5611: --- Integrated 5611-94-v2.txt to 0.94 branch. Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it's still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265128#comment-13265128 ] Zhihong Yu commented on HBASE-5699: --- Currently I use the following knob for the maximum number of WALs on an individual region server: {code} +int totalInstances = conf.getInt("hbase.regionserver.hlog.total", DEFAULT_MAX_HLOG_INSTANCES); {code} Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: Li Pi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
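The knob pattern in the quoted patch line can be sketched with a minimal stand-in for Hadoop's `Configuration.getInt`; the key name comes from the patch above, while the stub class and the default value of 1 are assumptions for illustration, not the real Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for org.apache.hadoop.conf.Configuration, showing
// how a site-configurable WAL count with a conservative default works.
class MiniConf {
    static final int DEFAULT_MAX_HLOG_INSTANCES = 1; // assumed default

    private final Map<String, String> props = new HashMap<String, String>();

    void set(String key, String value) {
        props.put(key, value);
    }

    // Returns the configured integer, or the default when the key is unset.
    int getInt(String key, int defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }
}
```

An operator who wants more WALs per region server would set the key in the site configuration; everyone else silently gets the default.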
[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build
[ https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265135#comment-13265135 ] Hudson commented on HBASE-5906: --- Integrated in HBase-TRUNK #2827 (See [https://builds.apache.org/job/HBase-TRUNK/2827/]) HBASE-5906 TestChangingEncoding failing sporadically in 0.94 build (Revision 1332320) Result = SUCCESS stack : Files : * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java TestChangingEncoding failing sporadically in 0.94 build --- Key: HBASE-5906 URL: https://issues.apache.org/jira/browse/HBASE-5906 Project: HBase Issue Type: Bug Reporter: stack Attachments: 5906.txt The test passes locally for me and Elliott but takes a long time to run. Timeout is only two minutes for the test though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
[ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265162#comment-13265162 ] Nicolas Spiegelberg commented on HBASE-5860: Also, it looks like there is a race condition in CreateAsyncCallback.processResult. The code is roughly: {code} tot_mgr_node_create_result.incrementAndGet(); if (rc != KeeperException.Code.NODEEXISTS.intValue()) { if (retry_count > 0) { tot_mgr_node_create_retry.incrementAndGet(); createNode(path, retry_count - 1); } } {code} So, we should change this to: {code} try { if (rc != KeeperException.Code.NODEEXISTS.intValue()) { if (retry_count > 0) { tot_mgr_node_create_retry.incrementAndGet(); createNode(path, retry_count - 1); } } } finally { tot_mgr_node_create_result.incrementAndGet(); } {code} so we don't mark the znode as responding until we decide if it's a failure and we need to reenqueue. Maybe the repercussions of creating an extra RESCAN node aren't worth finding and fixing all these subtle race conditions? splitlogmanager should not unnecessarily resubmit tasks when zk unavailable --- Key: HBASE-5860 URL: https://issues.apache.org/jira/browse/HBASE-5860 Project: HBase Issue Type: Improvement Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch (Doesn't really impact the run time or correctness of log splitting) Say the master has lost connection to zk. SplitLogManager's timeout manager will realize that all the tasks that were submitted are still unassigned. It will resubmit those tasks (i.e. create dummy znodes). SplitLogManager should realize that the tasks are unassigned but their znodes have not been created. 
2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting] 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f489, likely server has closed socket, closing socket connection and attempting reconnect 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3 2012-04-20 13:11:21,786 WARN 
org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2214) Do HBASE-1996 -- setting size to return in scan rather than count of rows -- properly
[ https://issues.apache.org/jira/browse/HBASE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265164#comment-13265164 ] jirapos...@reviews.apache.org commented on HBASE-2214: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4726/#review7383 --- Where are we checking the size of the result made so far? I don't see it in the below. I'd expect it inside in the RegionScanner. Any chance of a test? Otherwise, patch looks great. /src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java https://reviews.apache.org/r/4726/#comment16293 Is this going to be annoying? If a high-traffic server, won't this get logged once per request? Perhaps thousands a second? /src/main/java/org/apache/hadoop/hbase/client/Scan.java https://reviews.apache.org/r/4726/#comment16294 Is this needed? Is this set on Scan creation? When would it change after Scan construction? Or, are we using builder pattern here and its set after construction but before use? /src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java https://reviews.apache.org/r/4726/#comment16295 oh, I see how its used now. ignore above comment. - Michael On 2012-04-26 08:18:40, ferdy wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4726/ bq. --- bq. bq. (Updated 2012-04-26 08:18:40) bq. bq. bq. Review request for hbase and Ted Yu. bq. bq. bq. Summary bq. --- bq. bq. HBASE-2214 per scan max buffersize. bq. bq. bq. This addresses bug HBASE-2214. bq. https://issues.apache.org/jira/browse/HBASE-2214 bq. bq. bq. Diffs bq. - bq. bq./src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/client/Scan.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java 1330680 bq. 
/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AdminProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java 1330680 bq. /src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/RegionScanner.java 1330680 bq./src/main/java/org/apache/hadoop/hbase/regionserver/RegionServer.java 1330680 bq./src/main/protobuf/Client.proto 1330680 bq. /src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 1330680 bq. bq. Diff: https://reviews.apache.org/r/4726/diff bq. bq. bq. Testing bq. --- bq. bq. It works when running this test: bq. bq. bq. new HBaseTestingUtility(conf).startMiniCluster(); bq. bq. HBaseAdmin admin = new HBaseAdmin(conf); bq. if (!admin.tableExists("test")) { bq. HTableDescriptor tableDesc = new HTableDescriptor("test"); bq. tableDesc.addFamily(new HColumnDescriptor("fam")); bq. admin.createTable(tableDesc); bq. } bq. bq. bq. HTable table = new HTable(conf, "test"); bq. Put put; bq. bq. put = new Put(Bytes.toBytes("row1")); bq. put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual1"), Bytes.toBytes("val1")); bq. table.put(put); bq. bq. put = new Put(Bytes.toBytes("row2")); bq. put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual2"), Bytes.toBytes("val2")); bq. table.put(put); bq. bq. put = new Put(Bytes.toBytes("row3")); bq. put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual3"), Bytes.toBytes("val3")); bq. table.put(put); bq. bq. table.flushCommits(); bq. 
{ bq. System.out.println("returns all rows at once because of the caching"); bq. Scan scan = new Scan(); bq. scan.setCaching(100); bq. ResultScanner scanner = table.getScanner(scan); bq. scanner.next(100); bq. } bq. { bq. System.out.println("returns one row at a time because of the maxResultSize"); bq. Scan scan = new Scan(); bq. scan.setCaching(100); bq. scan.setMaxResultSize(1); bq. ResultScanner scanner = table.getScanner(scan); bq. scanner.next(100); bq. } bq. bq. bq. See
[jira] [Commented] (HBASE-5699) Run with 1 WAL in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265167#comment-13265167 ] Jean-Daniel Cryans commented on HBASE-5699: --- bq. Intuitively it seems like the number of WALs that are used should be related to the number of spindles available to HBase. I disagree: considering that most deployments have rep=3, you're using three spindles, not one. The multiplying effect could generate a lot of disk seeks since the WALs are competing like that (plus flushing, compacting, etc). Run with 1 WAL in HRegionServer - Key: HBASE-5699 URL: https://issues.apache.org/jira/browse/HBASE-5699 Project: HBase Issue Type: Improvement Reporter: binlijin Assignee: Li Pi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-5611. -- Resolution: Fixed Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size Key: HBASE-5611 URL: https://issues.apache.org/jira/browse/HBASE-5611 Project: HBase Issue Type: Bug Affects Versions: 0.90.6 Reporter: Jean-Daniel Cryans Assignee: Jieshan Bean Priority: Critical Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch This bug is rather easy to get if the {{TimeoutMonitor}} is on, else I think it's still possible to hit it if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and that's now being opened by a new RS. The first thing it does is to read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size but they are dropped when the {{HRegion}} gets cleaned up. 
It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits: {noformat} 2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g 2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223) at java.lang.Thread.run(Thread.java:662) {noformat} The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier although in fact I'm at 0. It happened yesterday and it's still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table
[ https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5385: --- Attachment: HBASE-5385-v1.patch Perform a Scan with QualifierFilter to remove a column from the _acl_ table. Delete table/column should delete stored permissions on -acl- table - Key: HBASE-5385 URL: https://issues.apache.org/jira/browse/HBASE-5385 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Matteo Bertozzi Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch Deleting the table or a column does not cascade to the stored permissions at the -acl- table. We should also remove those permissions, otherwise, it can be a security leak, where freshly created tables contain permissions from previous same-named tables. We might also want to ensure, upon table creation, that no entries are already stored at the -acl- table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table
[ https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5385: --- Status: Patch Available (was: Open) Delete table/column should delete stored permissions on -acl- table - Key: HBASE-5385 URL: https://issues.apache.org/jira/browse/HBASE-5385 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Matteo Bertozzi Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch Deleting the table or a column does not cascade to the stored permissions at the -acl- table. We should also remove those permissions, otherwise, it can be a security leak, where freshly created tables contain permissions from previous same-named tables. We might also want to ensure, upon table creation, that no entries are already stored at the -acl- table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5897) prePut coprocessor hook causing substantial CPU usage
[ https://issues.apache.org/jira/browse/HBASE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265197#comment-13265197 ] Lars Hofhansl commented on HBASE-5897: -- Looked over Todd's patch. The only difference is that previously the prePut's edits ended up in WALEdit before the family edits; now that is reversed. Not sure if that even makes a difference. +1 otherwise prePut coprocessor hook causing substantial CPU usage - Key: HBASE-5897 URL: https://issues.apache.org/jira/browse/HBASE-5897 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5897-simple.txt, hbase-5897.txt I was running an insert workload against trunk under oprofile and saw that a significant portion of CPU usage was going to calling the prePut coprocessor hook inside doMiniBatchPut, even though I don't have any coprocessors installed. I ran a million-row insert and collected CPU time spent in the RS after commenting out the prePut hook, and found CPU usage reduced by 33%. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
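One common way to avoid paying per-put hook cost when no coprocessors are installed is to short-circuit the observer chain on an empty check; this toy host is a sketch of that idea, not the actual RegionCoprocessorHost implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy coprocessor host: invoking the prePut chain has per-call setup
// cost, so checking for an empty observer list first skips all of that
// work on the common path where no coprocessors are loaded.
class ToyCoprocessorHost {
    interface Observer {
        boolean prePut(String row); // returns true to bypass the default put
    }

    private final List<Observer> observers = new ArrayList<Observer>();
    int invocations = 0; // counts how often the (expensive) chain actually ran

    void register(Observer o) {
        observers.add(o);
    }

    boolean prePut(String row) {
        if (observers.isEmpty()) {
            return false; // fast path: no coprocessors, no per-put overhead
        }
        invocations++;
        boolean bypass = false;
        for (Observer o : observers) {
            bypass |= o.prePut(row);
        }
        return bypass;
    }
}
```

With no observers registered, `prePut` returns immediately; only after a coprocessor is registered does the chain run.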
[jira] [Commented] (HBASE-5385) Delete table/column should delete stored permissions on -acl- table
[ https://issues.apache.org/jira/browse/HBASE-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13265212#comment-13265212 ] Enis Soztutar commented on HBASE-5385: -- Looks good. Can we add: 1. Audit logging (AccessController.AUDITLOG) 2. On preCreateTable and preAddColumn, ensure that the acl table is empty for the table/column. We might still have residual acl entries if something goes wrong. If so, we should refuse to create the table by throwing a kind of access control exception. Andrew, any comments? Delete table/column should delete stored permissions on -acl- table - Key: HBASE-5385 URL: https://issues.apache.org/jira/browse/HBASE-5385 Project: HBase Issue Type: Sub-task Components: security Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Matteo Bertozzi Attachments: HBASE-5385-v0.patch, HBASE-5385-v1.patch Deleting the table or a column does not cascade to the stored permissions at the -acl- table. We should also remove those permissions, otherwise, it can be a security leak, where freshly created tables contain permissions from previous same-named tables. We might also want to ensure, upon table creation, that no entries are already stored at the -acl- table. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
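The cascade-delete semantics under discussion can be modeled with a plain in-memory map; this is an illustration of the cleanup logic only (permissions keyed by table and qualifier), not the real AccessController or the QualifierFilter scan in Matteo's patch:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the _acl_ cleanup: permissions are keyed by table and
// column qualifier, and deleting a table (or a single column) must
// cascade and drop the matching entries, or a freshly created
// same-named table inherits stale grants -- the leak described above.
class AclTable {
    // key: "table:qualifier" -> set of granted users
    private final Map<String, Set<String>> perms = new HashMap<String, Set<String>>();

    void grant(String table, String qualifier, String user) {
        String key = table + ":" + qualifier;
        if (!perms.containsKey(key)) {
            perms.put(key, new HashSet<String>());
        }
        perms.get(key).add(user);
    }

    // Cascade for a full table delete: drop every entry of that table.
    void deleteTable(String table) {
        perms.keySet().removeIf(k -> k.startsWith(table + ":"));
    }

    // Cascade for a single column delete (the QualifierFilter-style case).
    void deleteColumn(String table, String qualifier) {
        perms.remove(table + ":" + qualifier);
    }

    // The preCreateTable check proposed above: refuse creation if
    // residual entries are still present for this table name.
    boolean hasResidualEntries(String table) {
        for (String k : perms.keySet()) {
            if (k.startsWith(table + ":")) return true;
        }
        return false;
    }
}
```

`hasResidualEntries` is the check Enis suggests running in preCreateTable/preAddColumn before allowing the new table through.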
[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5869: - Status: Patch Available (was: Open) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, secondcut.txt, v4.txt, v5.txt, v6.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5869: - Attachment: 5869v9.txt I was returning early in AssignmentManager if null data inside isCarryingRegion when I should have carried on to trip over the get of region location from the AM memory. Seems to fix some of the failing tests. Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb - Key: HBASE-5869 URL: https://issues.apache.org/jira/browse/HBASE-5869 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Attachments: 5869v7.txt, 5869v8.txt, 5869v9.txt, firstcut.txt, secondcut.txt, v4.txt, v5.txt, v6.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5869) Move SplitLogManager splitlog taskstate and AssignmentManager RegionTransitionData znode datas to pb
[ https://issues.apache.org/jira/browse/HBASE-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5869:
    Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-5906) TestChangingEncoding failing sporadically in 0.94 build
[ https://issues.apache.org/jira/browse/HBASE-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265384#comment-13265384 ]
Hudson commented on HBASE-5906:
Integrated in HBase-0.94 #161 (See [https://builds.apache.org/job/HBase-0.94/161/])
HBASE-5906 TestChangingEncoding failing sporadically in 0.94 build (Revision 1332319)
Result = FAILURE
stack :
Files :
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java

TestChangingEncoding failing sporadically in 0.94 build
    Key: HBASE-5906
    URL: https://issues.apache.org/jira/browse/HBASE-5906
    Project: HBase
    Issue Type: Bug
    Reporter: stack
    Attachments: 5906.txt

The test passes locally for me and Elliott but takes a long time to run. The timeout for the test, though, is only two minutes.
[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
[ https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265383#comment-13265383 ]
Hudson commented on HBASE-5611:
Integrated in HBase-0.94 #161 (See [https://builds.apache.org/job/HBase-0.94/161/])
HBASE-5611 Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size - v2 (Jieshan) (Revision 1332344)
Result = FAILURE
tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerAccounting.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java

Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size
    Key: HBASE-5611
    URL: https://issues.apache.org/jira/browse/HBASE-5611
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.90.6
    Reporter: Jean-Daniel Cryans
    Assignee: Jieshan Bean
    Priority: Critical
    Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
    Attachments: 5611-94-v2.txt, 5611-94.addendum, HBASE-5611-92.patch, HBASE-5611-94-minorchange.patch, HBASE-5611-trunk-v2-minorchange.patch

This bug is rather easy to hit if the {{TimeoutMonitor}} is on; I think it's still possible to hit it otherwise, if a region fails to open for more obscure reasons like HDFS errors. Consider a region that just went through distributed splitting and is now being opened by a new RS. The first thing it does is read the recovery files and put the edits in the {{MemStores}}. If this process takes a long time, the master will move that region away. At that point the edits are still accounted for in the global {{MemStore}} size, but they are dropped when the {{HRegion}} gets cleaned up.

It's completely invisible until the {{MemStoreFlusher}} needs to force flush a region and none of them have edits:

{noformat}
2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
java.lang.IllegalStateException
	at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

The {{null}} here is a region. In my case I had so many edits in the {{MemStore}} during recovery that I'm over the low barrier, although in fact I'm at 0. It happened yesterday and it is still printing this out. To fix this we need to be able to decrease the global {{MemStore}} size when the region can't open.
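The fix asked for above amounts to making the global accounting reversible. A minimal sketch of that idea, with hypothetical names (`GlobalMemStoreAccounting` and its methods are illustrative stand-ins, not the actual `RegionServerAccounting` API): bytes added to the global counter while replaying edits must be subtracted again if the region ultimately fails to open, otherwise the flusher sees phantom memory pressure with no region to flush.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of reversible global MemStore accounting (not real HBase code).
class GlobalMemStoreAccounting {
    private final AtomicLong globalMemStoreSize = new AtomicLong();

    // Replayed recovery edits are counted against the global size as they land.
    long addRegionReplayedEdits(long bytes) {
        return globalMemStoreSize.addAndGet(bytes);
    }

    // Called when a region fails to open after replaying edits: give the bytes back,
    // since the HRegion's MemStores are about to be dropped without a flush.
    long rollbackRegionReplayedEdits(long bytes) {
        return globalMemStoreSize.addAndGet(-bytes);
    }

    long getGlobalMemStoreSize() {
        return globalMemStoreSize.get();
    }
}
```

Without the rollback call, the counter stays inflated forever, which is exactly the "over the low barrier although in fact I'm at 0" state described in the report.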
[jira] [Updated] (HBASE-5890) SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS
[ https://issues.apache.org/jira/browse/HBASE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-5890:
    Fix Version/s: (was: 0.94.0)
                   0.94.1

Moving out for now.

SplitLog Rescan BusyWaits upon Zk.CONNECTIONLOSS
    Key: HBASE-5890
    URL: https://issues.apache.org/jira/browse/HBASE-5890
    Project: HBase
    Issue Type: Bug
    Reporter: Nicolas Spiegelberg
    Priority: Minor
    Fix For: 0.96.0, 0.89-fb, 0.94.1
    Attachments: HBASE-5890.patch

We ran into a production issue yesterday where the SplitLogManager tried to create a Rescan node in ZK. The createAsync() generated a KeeperException.CONNECTIONLOSS that was immediately sent to processResult(), createRescan node with --retry_count was called, and this created a CPU busywait that also clogged up the logs. We should handle this better.
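One conventional way to "handle this better" is to put a bounded exponential backoff between retries instead of resubmitting the async create immediately on connection loss. This is a hedged sketch under that assumption; `RetryBackoff` and its constants are hypothetical and not taken from the SplitLogManager code or the attached patch.

```java
// Hypothetical bounded exponential backoff for async ZK retries (illustrative only).
class RetryBackoff {
    static final long BASE_MS = 100;     // assumed first-retry delay
    static final long MAX_MS = 60_000;   // assumed cap on the delay

    // Delay before the attempt'th retry (attempt starts at 0): BASE_MS * 2^attempt,
    // capped at MAX_MS so repeated CONNECTIONLOSS cannot spin the CPU or flood logs.
    static long delayMs(int attempt) {
        long d = BASE_MS << Math.min(attempt, 30); // cap the shift to avoid overflow
        return Math.min(d, MAX_MS);
    }
}
```

The retry callback would sleep (or schedule itself) for `delayMs(retryCount)` before calling the async create again, turning the tight processResult loop into a paced one.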
[jira] [Updated] (HBASE-5888) Clover profile in build
[ https://issues.apache.org/jira/browse/HBASE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enis Soztutar updated HBASE-5888:
    Attachment: HBASE-5358_v2.patch

Updated the patch to ignore generated packages (thrift.generated, protobuf.generated), since they are skewing coverage results. I uploaded a sample report for 0.92 here: http://people.apache.org/~enis/hbase-clover/

Clover profile in build
    Key: HBASE-5888
    URL: https://issues.apache.org/jira/browse/HBASE-5888
    Project: HBase
    Issue Type: Improvement
    Components: build, test
    Affects Versions: 0.92.2, 0.96.0, 0.94.1
    Reporter: Enis Soztutar
    Assignee: Enis Soztutar
    Attachments: HBASE-5358_v2.patch, hbase-clover_v1.patch

Clover is disabled right now. I would like to add a profile that enables Clover reports. We can also backport this to 0.92 and 0.94, since we are also interested in test coverage for those branches.