[jira] [Commented] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451774#comment-13451774 ]

Hadoop QA commented on HBASE-6592:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12544431/hbase-6592-v2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause the mvn compile goal to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
      org.apache.hadoop.hbase.client.TestShell

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2835//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2835//console

This message is automatically generated.

[shell] Add means of custom formatting output by column
-------------------------------------------------------

         Key: HBASE-6592
         URL: https://issues.apache.org/jira/browse/HBASE-6592
     Project: HBase
  Issue Type: New Feature
  Components: shell
    Reporter: stack
    Priority: Minor
      Labels: noob
 Attachments: hbase-6592.patch, hbase-6592-v2.patch, hbase-6952-v1.patch

See Jacques' suggestion toward the end of this thread for how we should allow adding a custom formatter per column, to be used when outputting column content in the shell: http://search-hadoop.com/m/2WxUB1fuxL11/Printing+integers+in+the+Hbase+shell&subj=Printing+integers+in+the+Hbase+shell

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
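The per-column formatter idea from the linked thread could be sketched as a registry that maps a formatter name to a function rendering a cell's raw bytes. This is a hypothetical illustration in plain Java, not the shell's JRuby code or any HBase API; the `Formatter` interface and the names `toString`/`toLong` are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ColumnFormatters {
    // Hypothetical per-column formatter: raw cell bytes in, display string out.
    interface Formatter {
        String format(byte[] value);
    }

    private static final Map<String, Formatter> FORMATTERS = new HashMap<>();
    static {
        // Default: render the bytes as a UTF-8 string.
        FORMATTERS.put("toString", v -> new String(v, StandardCharsets.UTF_8));
        // Render an 8-byte big-endian value as a long (what Bytes.toLong does).
        FORMATTERS.put("toLong", v -> Long.toString(ByteBuffer.wrap(v).getLong()));
    }

    // Look up the named formatter, falling back to the string renderer.
    public static String format(String formatterName, byte[] value) {
        return FORMATTERS.getOrDefault(formatterName, FORMATTERS.get("toString")).format(value);
    }
}
```

A shell integration would then only need to carry a formatter name per column spec and call into such a registry when printing each cell.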
[jira] [Updated] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Huang updated HBASE-6592:
-----------------------------
    Attachment:     (was: hbase-6592-v2.patch)
[jira] [Updated] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Huang updated HBASE-6592:
-----------------------------
    Attachment: hbase-6592-v2.patch
[jira] [Commented] (HBASE-6592) [shell] Add means of custom formatting output by column
[ https://issues.apache.org/jira/browse/HBASE-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451794#comment-13451794 ]

Hadoop QA commented on HBASE-6592:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12544435/hbase-6592-v2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause the mvn compile goal to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2836//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2836//console

This message is automatically generated.
[jira] [Commented] (HBASE-6746) Impacts of HBASE-6435 vs. HDFS 2.0 trunk
[ https://issues.apache.org/jira/browse/HBASE-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451856#comment-13451856 ]

nkeywal commented on HBASE-6746:
--------------------------------

Committed revision 1382723.

Impacts of HBASE-6435 vs. HDFS 2.0 trunk
----------------------------------------

             Key: HBASE-6746
             URL: https://issues.apache.org/jira/browse/HBASE-6746
         Project: HBase
      Issue Type: Bug
      Components: master, regionserver, test
Affects Versions: 0.96.0
        Reporter: nkeywal
        Assignee: nkeywal
         Fix For: 0.96.0
     Attachments: 6746.v1.patch

When using the trunk of HDFS branch 2, I had two errors linked to HBASE-6435:
- a missing test for null
- a removed method.

This fixes it:
- add the test
- make the test case less dependent on HDFS internals.
[jira] [Resolved] (HBASE-6742) Change default test parallelisation level to 5
[ https://issues.apache.org/jira/browse/HBASE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal resolved HBASE-6742.
----------------------------
    Resolution: Fixed

Change default test parallelisation level to 5
----------------------------------------------

             Key: HBASE-6742
             URL: https://issues.apache.org/jira/browse/HBASE-6742
         Project: HBase
      Issue Type: Improvement
      Components: test
Affects Versions: 0.96.0
        Reporter: nkeywal
        Assignee: nkeywal
        Priority: Minor
         Fix For: 0.96.0
     Attachments: hbase-6742.v1.patch

Tests will be faster. The gain is not visible if a test hangs for 15 minutes, but tests should not hang.
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Priyadarshini updated HBASE-6698:
---------------------------------
    Attachment: HBASE-6698_6.patch

Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
------------------------------------------------------------------

         Key: HBASE-6698
         URL: https://issues.apache.org/jira/browse/HBASE-6698
     Project: HBase
  Issue Type: Improvement
    Reporter: ramkrishna.s.vasudevan
     Fix For: 0.96.0
 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698.patch

Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation instead. This will help in the future: if we have hooks, and a coprocessor handles certain cases in doMiniBatchMutation, the same handling applies when doing a put through checkAndPut or a delete through checkAndDelete.
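The refactor described above funnels both check-and-put and check-and-delete through a single batch-mutation path, so any hook placed there fires for both operations. A toy sketch of that shape (not HBase internals; a string map stands in for the region, and all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MiniBatchSketch {
    private final Map<String, String> store = new HashMap<>();

    // The single mutation path: a coprocessor-style hook placed here
    // would cover puts and deletes alike, which is the point of the refactor.
    private void doMiniBatchMutation(String row, String value) {
        if (value == null) {
            store.remove(row);     // acts as a delete
        } else {
            store.put(row, value); // acts as a put
        }
    }

    // One check-and-mutate entry point: newValue == null models checkAndDelete,
    // non-null models checkAndPut. Returns whether the mutation was applied.
    public boolean checkAndMutate(String row, String expected, String newValue) {
        if (!Objects.equals(store.get(row), expected)) {
            return false; // check failed; no mutation happens
        }
        doMiniBatchMutation(row, newValue);
        return true;
    }
}
```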
[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451876#comment-13451876 ]

rajeshbabu commented on HBASE-6438:
-----------------------------------

@Ted
When I ran the test suite locally, the test cases below always fail (with or without this patch) because of environment problems. I ran the failed tests individually on our Jenkins multiple times; they always pass.
{code}
Failed tests:
  testPermMask(org.apache.hadoop.hbase.util.TestFSUtils): expected:rwx-- but was:rwxrwxrwx

Tests in error:
  testCacheOnWriteInSchema[1](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema): Target HLog directory already exists: /mnt/F/hbase94Com/target/test-data/8a5bb561-edfc-4fab-9358-7ab726cb44fc/TestCacheOnWriteInSchema/logs
  testCacheOnWriteInSchema[2](org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema): Target HLog directory already exists: /mnt/F/hbase94Com/target/test-data/8a5bb561-edfc-4fab-9358-7ab726cb44fc/TestCacheOnWriteInSchema/logs
  testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Failed delete of /mnt/F/hbase94Com/target/test-data/9d7234b4-1f6a-42a7-bbb1-641eb464b7e6/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/4bbe087ebab2243b8b9633bb3d870f4c
  testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction): Failed delete of /mnt/F/hbase94Com/target/test-data/4afca7c8-ee29-47fb-b660-f2ee661bced7/org.apache.hadoop.hbase.regionserver.TestSplitTransaction/table/ad08ee3070175df954844582816d5927
  testOffPeakCompactionRatio(org.apache.hadoop.hbase.regionserver.TestCompactSelection): Target HLog directory already exists: /mnt/F/hbase94Com/target/test-data/dd6ca8f4-4321-42d8-825b-fc6a42ab84c0/TestCompactSelection/logs

Tests run: 1590, Failures: 1, Errors: 5, Skipped: 12

Running org.apache.hadoop.hbase.regionserver.TestSplitTransaction
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.383 sec
Results: Tests run: 7, Failures: 0, Errors: 0, Skipped: 0

Running org.apache.hadoop.hbase.regionserver.TestCompactSelection
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.264 sec
Results: Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

Running org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.564 sec
Results: Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

Running org.apache.hadoop.hbase.util.TestFSUtils
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.43 sec
Results: Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
{code}

RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
----------------------------------------------------------------------------------------------

         Key: HBASE-6438
         URL: https://issues.apache.org/jira/browse/HBASE-6438
     Project: HBase
  Issue Type: Bug
    Reporter: ramkrishna.s.vasudevan
    Assignee: rajeshbabu
     Fix For: 0.96.0, 0.92.3, 0.94.2
 Attachments: HBASE-6438_2.patch, HBASE-6438_94.patch, HBASE-6438_trunk.patch

Looking at some of the recent region assignment issues, RegionAlreadyInTransitionException (RAITE) is one error after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign). In HBASE-6317 we hit one such problem due to RegionAlreadyInTransitionException on master restart.

Consider the following case: for some reason, such as a master restart or an external assign call, we try to assign a region that is already being opened on a RS. The next call to assign has already changed the state of the znode, so the open currently in progress on the RS is affected and fails. The second assignment also fails, getting a RAITE. In the end, neither assignment carries on.

The idea is to find out whether a given RAITE can be retried or not. We have the following cases:
- The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
- The RS may be in the openRegion step.
- The RS may be trying to transition OPENING to OPENED.
- The RS is yet to add the region to its online regions.

In openRegion() and updateMeta(), any failure moves the znode to FAILED_OPEN, so in those cases retrying after a RAITE should be OK. In the other cases the assignment is stopped. The idea is to add the current state of the region assignment to the RIT map on the RS side, and use that info to determine whether the assignment can be retried on getting a RAITE. Considering the current work going on in the AM, please do share whether this is needed at least in the 0.92/0.94 versions.
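The retry decision sketched in the description could be modeled as a lookup keyed on the phase the RS-side open had reached when the RAITE was raised. The phase names below are illustrative, not actual HBase states; this is a sketch of the idea, not the patch.

```java
public class RaiteRetryPolicy {
    // Hypothetical phases of a region open on the RS side, mirroring the
    // four cases listed in the issue description.
    enum OpenPhase {
        OFFLINE_TO_OPENING,  // znode not yet moved from OFFLINE to OPENING
        OPENING_REGION,      // RS inside openRegion()
        UPDATING_META,       // RS inside updateMeta()
        OPENING_TO_OPENED,   // znode transitioning OPENING -> OPENED
        ADDING_ONLINE        // region not yet in the RS's online set
    }

    // Per the description: openRegion()/updateMeta() failures move the znode
    // to FAILED_OPEN, so a retry after a RAITE in those phases is safe;
    // in the other phases the assignment should not be retried blindly.
    public static boolean canRetry(OpenPhase phase) {
        switch (phase) {
            case OPENING_REGION:
            case UPDATING_META:
                return true;
            default:
                return false;
        }
    }
}
```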
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451881#comment-13451881 ]

Priyadarshini commented on HBASE-6698:
--------------------------------------

The latest patch renames putsAndLocks to mutateWithLocks.
{code}
Delete delete = new Delete(new byte[0]);
{code}
Now the delete tries to obtain the row lock through mutation.getRow(). This change is required for test-case purposes, to get the row lock; this Delete API is used only in test cases.
[jira] [Commented] (HBASE-6746) Impacts of HBASE-6435 vs. HDFS 2.0 trunk
[ https://issues.apache.org/jira/browse/HBASE-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451886#comment-13451886 ]

Hudson commented on HBASE-6746:
-------------------------------

Integrated in HBase-TRUNK #3320 (See [https://builds.apache.org/job/HBase-TRUNK/3320/])
HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk (Revision 1382723)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451887#comment-13451887 ]

Hudson commented on HBASE-6435:
-------------------------------

Integrated in HBase-TRUNK #3320 (See [https://builds.apache.org/job/HBase-TRUNK/3320/])
HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk (Revision 1382723)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java

Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
------------------------------------------------------------------------------------------------

             Key: HBASE-6435
             URL: https://issues.apache.org/jira/browse/HBASE-6435
         Project: HBase
      Issue Type: Improvement
      Components: master, regionserver
Affects Versions: 0.96.0
        Reporter: nkeywal
        Assignee: nkeywal
         Fix For: 0.96.0
     Attachments: 6435.unfinished.patch, 6435.v10.patch, 6435.v10.patch, 6435.v12.patch, 6435.v12.patch, 6435.v12.patch, 6435-v12.txt, 6435.v13.patch, 6435.v14.patch, 6435.v2.patch, 6435.v7.patch, 6435.v8.patch, 6435.v9.patch, 6435.v9.patch, 6535.v11.patch

HBase writes a Write-Ahead-Log to recover from hardware failure. This log is written on HDFS. Through ZooKeeper, HBase usually gets informed within 30s that it should start the recovery process, which means reading the Write-Ahead-Log to replay the edits on the other servers.

In standard deployments, HBase processes (regionservers) are deployed on the same boxes as the datanodes. That means that when a box stops, we have actually lost both the regionserver and the datanode, and with it one copy of the edits. As HDFS only marks a node as dead after ~10 minutes, the dead node still appears available when we try to read the blocks to recover. As such, we delay the recovery process by 60 seconds, as the read will usually fail with a socket timeout. If the file is still open for writing, it adds an extra 20s, plus a risk of losing edits if we connect over IPC to the dead DN.

Possible solutions are:
- Shorter dead-datanode detection by the NN. Requires an NN code change.
- Better dead-datanode management in the DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files to another DN instead of the local one.
- Reordering the blocks returned by the NN on the client side, putting the blocks on the same DN as the dead RS at the end of the priority queue. Requires a DFS code change or some kind of workaround.

The solution retained is the last one. Compared to what was discussed on the mailing list, the proposed patch does not modify HDFS source code but adds a proxy. This is for two reasons:
- Some HDFS functions managing block order are static (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require implementing the fix partially, changing the DFS interface to make this function non-static, or making the hook static. None of these solutions is very clean.
- Adding a proxy allows putting all the code in HBase, simplifying dependency management.

Nevertheless, it would be better to have this in HDFS. But that solution would only target the latest version, and it could allow minimal interface changes such as non-static methods. Moreover, writing the blocks to a non-local DN would be an even better long-term solution.
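The retained solution (client-side reordering of the block locations returned by the NN) can be sketched, under assumptions, as a stable sort that pushes replicas hosted on the dead regionserver's box to the end of the candidate list. The real patch does this inside a proxy around the HDFS client (see HFileSystem and TestBlockReorder above); the list-of-hostnames representation here is a simplification.

```java
import java.util.Comparator;
import java.util.List;

public class BlockReorderSketch {
    // Move every replica located on the dead RS's host to the end of the
    // location list, so the DFS client tries healthy datanodes first.
    // List.sort is stable, so the relative order of healthy replicas
    // (the NN's original preference) is preserved.
    public static void deprioritizeDeadHost(List<String> locations, String deadHost) {
        locations.sort(Comparator.comparingInt(
                (String loc) -> loc.equals(deadHost) ? 1 : 0));
    }
}
```

With this ordering, the socket-timeout penalty of contacting the dead colocated datanode is only paid if every other replica fails first.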
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451909#comment-13451909 ]

Hadoop QA commented on HBASE-6698:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/1258/HBASE-6698_6.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause the mvn compile goal to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
      org.apache.hadoop.hbase.client.TestFromClientSide
      org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
      org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
      org.apache.hadoop.hbase.master.TestAssignmentManager
      org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2837//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2837//console

This message is automatically generated.
[jira] [Commented] (HBASE-6746) Impacts of HBASE-6435 vs. HDFS 2.0 trunk
[ https://issues.apache.org/jira/browse/HBASE-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451912#comment-13451912 ]

Hudson commented on HBASE-6746:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #168 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/168/])
HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk (Revision 1382723)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
[jira] [Commented] (HBASE-5631) hbck should handle case where .tableinfo file is missing.
[ https://issues.apache.org/jira/browse/HBASE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451913#comment-13451913 ]

Hudson commented on HBASE-5631:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #168 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/168/])
HBASE-5631 ADDENDUM (extra comments) (Revision 1382627)

Result = FAILURE
jmhsieh :
Files :
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java

hbck should handle case where .tableinfo file is missing.
---------------------------------------------------------

             Key: HBASE-5631
             URL: https://issues.apache.org/jira/browse/HBASE-5631
         Project: HBase
      Issue Type: Improvement
      Components: hbck
Affects Versions: 0.92.2, 0.94.0, 0.96.0
        Reporter: Jonathan Hsieh
        Assignee: Jie Huang
         Fix For: 0.96.0, 0.92.3, 0.94.2
     Attachments: hbase-5631-addendum.patch, hbase-5631.patch, hbase-5631-v1.patch, hbase-5631-v2.patch

0.92+ branches have a .tableinfo file which could be missing from HDFS. hbck should be able to detect and repair this properly.
[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes
[ https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451914#comment-13451914 ]

Hudson commented on HBASE-6435:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #168 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/168/])
HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk (Revision 1382723)

Result = FAILURE
nkeywal :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
[jira] [Created] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
Jieshan Bean created HBASE-6748: --- Summary: Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback Key: HBASE-6748 URL: https://issues.apache.org/jira/browse/HBASE-6748 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.94.1, 0.96.0 Reporter: Jieshan Bean You can ealily understand the problem from the below logs: [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=3 [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=2 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=1 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=0 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task 
node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775807 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775806 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775805 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775804 [2012-09-01 11:41:02,065] [WARN ] 
[MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775803 ... [2012-09-01 11:41:03,307] [ERROR] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.zookeeper.ClientCnxn 623] Caught unexpected throwable java.lang.StackOverflowError -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
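The runaway counter in the logs above (retries counting down from 9223372036854775807, i.e. Long.MAX_VALUE, until a StackOverflowError) comes from the delete callback re-invoking the delete on the same stack frame while the ZooKeeper session is already expired. A minimal self-contained sketch of that shape (names and structure are illustrative stand-ins, not HBase's actual SplitLogManager code):

```java
import java.util.concurrent.atomic.AtomicLong;

public class RetryRecursionSketch {
    static final int SESSIONEXPIRED = -112; // ZooKeeper's error code for an expired session

    final AtomicLong attempts = new AtomicLong();

    /**
     * Mimics deleteNode(path, retries): the "async" callback fires with
     * SESSIONEXPIRED immediately, i.e. on the SAME stack frame. With
     * retries == Long.MAX_VALUE the recursion is effectively unbounded
     * and ends in StackOverflowError, as in the HBASE-6748 logs.
     */
    void deleteNode(String path, long retries) {
        attempts.incrementAndGet();
        // Simulated ZK client: session already expired, so the callback
        // is invoked synchronously with an error code.
        onDeleteResult(SESSIONEXPIRED, path, retries);
    }

    void onDeleteResult(int rc, String path, long retries) {
        if (rc == SESSIONEXPIRED && retries > 0) {
            // Recursive retry with no backoff and no stack unwinding:
            deleteNode(path, retries - 1); // <-- the endless recursion
        }
    }

    public static void main(String[] args) {
        RetryRecursionSketch s = new RetryRecursionSketch();
        try {
            s.deleteNode("/hbase/splitlog/task", Long.MAX_VALUE);
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError after " + s.attempts.get() + " attempts");
        }
    }
}
```

Because the simulated client fails synchronously, each retry adds stack frames instead of rescheduling, so even a huge retry budget ends in a stack overflow rather than a bounded retry loop; retrying from an executor or a plain loop would avoid the recursion.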
[jira] [Commented] (HBASE-6408) Naming and documenting of the hadoop-metrics2.properties file
[ https://issues.apache.org/jira/browse/HBASE-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451941#comment-13451941 ] Michael Drzal commented on HBASE-6408: -- +1 Naming and documenting of the hadoop-metrics2.properties file - Key: HBASE-6408 URL: https://issues.apache.org/jira/browse/HBASE-6408 Project: HBase Issue Type: Sub-task Affects Versions: 0.96.0 Reporter: Elliott Clark Assignee: Elliott Clark Priority: Blocker Attachments: HBASE-6408-0.patch hadoop-metrics2.properties is currently where metrics2 loads its sinks. This file could be better named: hadoop-hbase-metrics2.properties. In addition, it needs examples like the current hadoop-metrics.properties has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
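For reference, a sketch of the kind of examples the issue asks for (FileSink is the stock Hadoop metrics2 file sink; the "hbase" prefix and the output filename below are assumptions, not settled naming):

```properties
# Sketch only: the sort of commented example hadoop-metrics.properties ships with.
# Route HBase metrics to a file sink named "file".
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# Sample all metrics sources every 10 seconds.
*.period=10
# Output file for the sink (assumed name for illustration).
hbase.sink.file.filename=hbase-metrics.out
```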
[jira] [Commented] (HBASE-6419) PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220)
[ https://issues.apache.org/jira/browse/HBASE-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451947#comment-13451947 ] Michael Drzal commented on HBASE-6419: -- [~ted_yu] can we close this out? PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220) --- Key: HBASE-6419 URL: https://issues.apache.org/jira/browse/HBASE-6419 Project: HBase Issue Type: Improvement Reporter: stack Assignee: Paul Cavallaro Attachments: ServerMetrics_HBASE_6220_Flush_Metrics.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6749) Compact one expired HFile all the time
Jieshan Bean created HBASE-6749: --- Summary: Compact one expired HFile all the time Key: HBASE-6749 URL: https://issues.apache.org/jira/browse/HBASE-6749 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.1 Reporter: Jieshan Bean Assignee: Jieshan Bean It's an interesting issue. We found there was 1 HFile that kept changing its name all the time. After digging in more, we found one strange behavior in the compaction flow. Here's the problem (we set the TTL property in our table): there were 10 HFiles and only 1 expired HFile when this problem occurred: 2012-09-07 02:21:05,298 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/221f56905cbd4bf09bd4d5d9dceb113a.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=118730, majorCompaction=false 2012-09-07 02:21:05,309 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/297b45a6c5f541dca05105ab098dab8d, isReference=false, isBulkLoadResult=false, seqid=122018, majorCompaction=false 2012-09-07 02:21:05,326 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/4a4a4598bc0443c9be087812052d6796.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119850, majorCompaction=false 2012-09-07 02:21:05,348 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/8c6d56c9bafb4b0eb0dd6e04e41ca5b7, isReference=false, isBulkLoadResult=false, seqid=123135, majorCompaction=false 2012-09-07 02:21:05,357 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a6c2873a646c425f87a6a8a271a9904e, isReference=false, isBulkLoadResult=false, seqid=76561, majorCompaction=false 2012-09-07 02:21:05,370 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a712a2f79bd247f48405d2c6a91757ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120951, majorCompaction=false 2012-09-07 02:21:05,381 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a717c63214534cb0aaf8e695147fde46, isReference=false, isBulkLoadResult=false, seqid=122763, majorCompaction=false 2012-09-07 02:21:05,431 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/c456f3831b094ac3a0590678acbf27a5.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120579, majorCompaction=false 2012-09-07 02:21:05,518 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/dde4e56b131a4ffdaec8f9574bffa5ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=121651, majorCompaction=false 2012-09-07 02:21:05,593 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/ee62844594f2474e88186dbde673c802.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119478, majorCompaction=false Compaction was triggered during region opening. As we know, compaction should choose the expired HFiles to compact first, so the 1 expired HFile was chosen. Since no KeyValue was there, compaction deleted the old HFile and created a new one with minimumTimestamp = -1 and maximumTimestamp = -1. So after the first compaction, there were still 10 HFiles, and compaction was triggered again and again. 
2012-09-07 02:21:06,079 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7 whose maxTimeStamp is -1 while the max expired timestamp is 1344824466079 2012-09-07 02:21:06,080 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7, keycount=0, bloomtype=NONE, size=558, encoding=NONE 2012-09-07 02:21:06,082 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file:hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/.tmp/8239019ff92f49bfab26b02ca43bc26a with permission:rwxrwxrwx -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
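The loop described above reduces to the expiry predicate: an empty HFile written with maximumTimestamp = -1 always looks "older than the TTL cutoff", so it is re-selected on every pass. A minimal sketch of that predicate (an assumption-laden simplification, not HBase's actual CompactSelection code):

```java
public class ExpiredFileSketch {

    /** True if every cell in a store file is older than the TTL cutoff. */
    static boolean isExpired(long maxTimestamp, long now, long ttlMs) {
        long oldestAllowed = now - ttlMs;
        // maxTimestamp == -1 (an HFile containing zero KeyValues) is always
        // below oldestAllowed, so the freshly written empty file is selected
        // for compaction again on the next pass -- the loop in HBASE-6749.
        return maxTimestamp < oldestAllowed;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttl = 3_600_000L; // 1 hour TTL

        // The empty output file (maxTimeStamp -1, as in the log) re-qualifies:
        System.out.println(isExpired(-1L, now, ttl));       // true
        // A file holding fresh data does not:
        System.out.println(isExpired(now - 10, now, ttl));  // false
    }
}
```

A fix along these lines would stop rewriting a store file that is already empty (keycount=0), or special-case the -1 sentinel, so the selection terminates after the first pass.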
[jira] [Updated] (HBASE-6749) Compact one expired HFile all the time
[ https://issues.apache.org/jira/browse/HBASE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-6749: Description: It's an interesting issue. We found there was 1 HFile that kept changing its name all the time. After digging in more, we found one strange behavior in the compaction flow. Here's the problem (we set the TTL property in our table): there were 10 HFiles and only 1 expired HFile when this problem occurred: {noformat} 2012-09-07 02:21:05,298 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/221f56905cbd4bf09bd4d5d9dceb113a.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=118730, majorCompaction=false 2012-09-07 02:21:05,309 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/297b45a6c5f541dca05105ab098dab8d, isReference=false, isBulkLoadResult=false, seqid=122018, majorCompaction=false 2012-09-07 02:21:05,326 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/4a4a4598bc0443c9be087812052d6796.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119850, majorCompaction=false 2012-09-07 02:21:05,348 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/8c6d56c9bafb4b0eb0dd6e04e41ca5b7, isReference=false, isBulkLoadResult=false, seqid=123135, majorCompaction=false 2012-09-07 02:21:05,357 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a6c2873a646c425f87a6a8a271a9904e, isReference=false, isBulkLoadResult=false, seqid=76561, majorCompaction=false 2012-09-07 02:21:05,370 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded 
hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a712a2f79bd247f48405d2c6a91757ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120951, majorCompaction=false 2012-09-07 02:21:05,381 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a717c63214534cb0aaf8e695147fde46, isReference=false, isBulkLoadResult=false, seqid=122763, majorCompaction=false 2012-09-07 02:21:05,431 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/c456f3831b094ac3a0590678acbf27a5.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120579, majorCompaction=false 2012-09-07 02:21:05,518 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/dde4e56b131a4ffdaec8f9574bffa5ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=121651, majorCompaction=false 2012-09-07 02:21:05,593 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/ee62844594f2474e88186dbde673c802.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119478, majorCompaction=false {noformat} Compaction was triggered during region opening. As we know, compaction should choose the expired HFiles to compact first, so the 1 expired HFile was chosen. Since no KeyValue was there, compaction deleted the old HFile and created a new one with minimumTimestamp = -1 and maximumTimestamp = -1. So after the first compaction, there were still 10 HFiles, and compaction was triggered again and again. 
{noformat} 2012-09-07 02:21:06,079 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7 whose maxTimeStamp is -1 while the max expired timestamp is 1344824466079 2012-09-07 02:21:06,080 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7, keycount=0, bloomtype=NONE, size=558, encoding=NONE 2012-09-07 02:21:06,082 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file:hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/.tmp/8239019ff92f49bfab26b02ca43bc26a with permission:rwxrwxrwx {noformat} was: It's a interesting issue. We found there's 1 HFile keeped changing its name all the time. After dig in more, we found one strange behavior in compaction flow. Here's the problem(We set the TTL property in our table): There were 10 HFiles and only 1 expired HFile when this problem occured: {noformat} 2012-09-07 02:21:05,298 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded
[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451950#comment-13451950 ] Michael Drzal commented on HBASE-6429: -- [~zhi...@ebaysf.com] can we close this out? Filter with filterRow() returning true is incompatible with scan with limit --- Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Assignee: Jie Huang Fix For: 0.96.0 Attachments: hbase-6429_0_94_0.patch, hbase-6429-trunk.patch, hbase-6429-trunk-v2.patch, hbase-6429-trunk-v3.patch, hbase-6429-trunk-v4.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
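The check the issue asks to extend can be sketched as follows (a simplified stand-in: the real check lives in the region scanner and guards Scan.setBatch() against Filter#hasFilterRow(); the report is that only filterRow(List<KeyValue>) was covered, not filterRow()):

```java
public class FilterBatchCheckSketch {

    /**
     * @param batch        max KeyValues returned per next() call (0 = unlimited)
     * @param hasFilterRow true if the filter overrides filterRow() or
     *                     filterRow(List<KeyValue>)
     */
    static void validate(int batch, boolean hasFilterRow) {
        // With a batch limit, a row can reach the filter in pieces, so a
        // row-level filter would decide on partial rows. The combination is
        // rejected up front rather than silently misfiltering.
        if (batch > 0 && hasFilterRow) {
            throw new IllegalArgumentException(
                "IncompatibleFilterException: cannot combine a row-level "
                    + "filter with a scan batch limit");
        }
    }

    public static void main(String[] args) {
        validate(0, true);    // unlimited batch: fine
        validate(100, false); // batch limit but no row filter: fine
        try {
            validate(100, true); // the incompatible combination
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```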
[jira] [Updated] (HBASE-6430) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide
[ https://issues.apache.org/jira/browse/HBASE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Drzal updated HBASE-6430: - Labels: noob (was: ) Few modifications in section 2.4.2.1 of Apache HBase Reference Guide Key: HBASE-6430 URL: https://issues.apache.org/jira/browse/HBASE-6430 Project: HBase Issue Type: Improvement Reporter: Mohammad Tariq Iqbal Priority: Minor Labels: noob Attachments: HBASE-6430.txt Quite often, newbies face some issues while configuring HBase in pseudo-distributed mode. I was no exception. I would like to propose some solutions for these problems which worked for me. If the community finds it appropriate, I would like to apply the patch for the same. This is the first time I am trying to do something like this, so please pardon me if I have put it in an inappropriate manner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6750) Provide a variant of ValueFilter that only accepts the latest value (like SingleColumnValueFilter.setLatestVersionOnly)
David Witten created HBASE-6750: --- Summary: Provide a variant of ValueFilter that only accepts the latest value (like SingleColumnValueFilter.setLatestVersionOnly) Key: HBASE-6750 URL: https://issues.apache.org/jira/browse/HBASE-6750 Project: HBase Issue Type: New Feature Components: filters Affects Versions: 0.90.5 Environment: All Reporter: David Witten Currently ValueFilter will return an old value that matches if the latest value does not. I recommend providing an option on ValueFilter, like setLatestVersionOnly, or creating a subclass of ValueFilter that always has this behavior. Below is a custom filter that seems to work, though you may want to copy and frob ValueFilter to just return NEXT_COL where it returns SKIP:
{code}
package dummy.hbasesvr;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;

/**
 * The same as {@link ValueFilter} except it will only look at the latest value
 * for a given column.
 */
public class LatestValueFilter extends ValueFilter {

  /**
   * Writable constructor, do not use.
   */
  public LatestValueFilter() {
  }

  /**
   * Constructor.
   * @param valueCompareOp the compare op for value matching
   * @param valueComparator the comparator for value matching
   */
  public LatestValueFilter(CompareOp valueCompareOp,
      WritableByteArrayComparable valueComparator) {
    super(valueCompareOp, valueComparator);
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue v) {
    // This assumes that given several KeyValues with the same row+fam+qual+val
    // the one with the latest value will be given first.
    ReturnCode superReturnCode = super.filterKeyValue(v);
    if (superReturnCode == ReturnCode.SKIP) {
      return ReturnCode.NEXT_COL;
    }
    return superReturnCode;
  }
}
{code}
Note I am a novice HBase user. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
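Why mapping SKIP to NEXT_COL yields "latest version only" semantics can be shown with a self-contained toy (toy return codes and a newest-first version list, mirroring HBase's KeyValue ordering; not HBase code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LatestOnlySketch {
    enum ReturnCode { INCLUDE, SKIP, NEXT_COL }

    /**
     * Keeps versions equal to {@code wanted}. SKIP merely moves to the next
     * (older) version; NEXT_COL abandons the rest of the column entirely,
     * which is what the proposed LatestValueFilter substitutes.
     */
    static List<String> scanColumn(List<String> versionsNewestFirst,
                                   String wanted, boolean latestOnly) {
        List<String> out = new ArrayList<>();
        for (String v : versionsNewestFirst) {
            ReturnCode rc = v.equals(wanted) ? ReturnCode.INCLUDE
                          : latestOnly ? ReturnCode.NEXT_COL : ReturnCode.SKIP;
            if (rc == ReturnCode.INCLUDE) {
                out.add(v);
            } else if (rc == ReturnCode.NEXT_COL) {
                break; // first miss on the latest version drops the column
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList("new", "old-match", "older");
        // Plain ValueFilter behavior: an OLD matching version leaks through.
        System.out.println(scanColumn(versions, "old-match", false)); // [old-match]
        // LatestValueFilter behavior: latest didn't match, column is dropped.
        System.out.println(scanColumn(versions, "old-match", true));  // []
    }
}
```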
[jira] [Commented] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451956#comment-13451956 ] Michael Drzal commented on HBASE-6431: -- [~zhi...@ebaysf.com] can we close this out? Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Fix For: 0.96.0 Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch, 6431-v2.txt Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result, FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters), and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
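The bug can be sketched with strings standing in for Filter objects (an assumption-laden simplification; the defensive-copy "fix" shown is the obvious remedy, not necessarily the committed patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterListSketch {

    /**
     * Buggy shape: the varargs constructor stored Arrays.asList(rowFilters)
     * directly. That list is fixed-size, so a later addFilter() throws
     * UnsupportedOperationException.
     */
    static List<String> buggyInternalList(String... rowFilters) {
        return Arrays.asList(rowFilters);
    }

    /** Fixed shape: copy the caller's filters into a mutable ArrayList. */
    static List<String> fixedInternalList(String... rowFilters) {
        return new ArrayList<>(Arrays.asList(rowFilters));
    }

    public static void main(String[] args) {
        List<String> good = fixedInternalList("rowFilter");
        good.add("columnFilter"); // works: mutable copy
        System.out.println(good);

        List<String> bad = buggyInternalList("rowFilter");
        try {
            bad.add("columnFilter"); // addFilter() would blow up here
        } catch (UnsupportedOperationException e) {
            System.out.println("addFilter broken: " + e);
        }
    }
}
```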
[jira] [Commented] (HBASE-6441) MasterFS doesn't set scheme for internal FileSystem
[ https://issues.apache.org/jira/browse/HBASE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451964#comment-13451964 ] Michael Drzal commented on HBASE-6441: -- [~apurtell] and [~jesse_yates] any consensus on 0.94? MasterFS doesn't set scheme for internal FileSystem --- Key: HBASE-6441 URL: https://issues.apache.org/jira/browse/HBASE-6441 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0, 0.94.2 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: java_HBASE-6441_v0.patch FSUtils.getRootDir() just takes a configuration object, which is used to: 1) Get the name of the root directory 2) Create a filesystem (based on the configured scheme) 3) Qualify the root onto the filesystem However, the FileSystem from the master filesystem won't generate the correctly qualified root directory under hadoop-2.0 (though it works fine on hadoop-1.0). Seems to be an issue with the configuration parameters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6431: -- Resolution: Fixed Status: Resolved (was: Patch Available) Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Fix For: 0.96.0 Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch, 6431-v2.txt Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result, FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters), and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6431: -- Hadoop Flags: Reviewed Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Fix For: 0.96.0 Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch, 6431-v2.txt Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result, FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters), and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit
[ https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6429: -- Resolution: Fixed Status: Resolved (was: Patch Available) Filter with filterRow() returning true is incompatible with scan with limit --- Key: HBASE-6429 URL: https://issues.apache.org/jira/browse/HBASE-6429 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.96.0 Reporter: Jason Dai Assignee: Jie Huang Fix For: 0.96.0 Attachments: hbase-6429_0_94_0.patch, hbase-6429-trunk.patch, hbase-6429-trunk-v2.patch, hbase-6429-trunk-v3.patch, hbase-6429-trunk-v4.patch Currently, if we scan with both a limit and a Filter with filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will be thrown. The same exception should also be thrown if the filter has its filterRow() implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6419) PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220)
[ https://issues.apache.org/jira/browse/HBASE-6419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6419: -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) PersistentMetricsTimeVaryingRate gets used for non-time-based metrics (part2 of HBASE-6220) --- Key: HBASE-6419 URL: https://issues.apache.org/jira/browse/HBASE-6419 Project: HBase Issue Type: Improvement Reporter: stack Assignee: Paul Cavallaro Fix For: 0.96.0 Attachments: ServerMetrics_HBASE_6220_Flush_Metrics.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451996#comment-13451996 ] Ted Yu commented on HBASE-6698: --- From https://builds.apache.org/job/PreCommit-HBASE-Build/2837//testReport/org.apache.hadoop.hbase.master/TestAssignmentManager/testShutdownHandler/: {code} org.mockito.exceptions.misusing.WrongTypeOfReturnValue: CatalogTracker$$EnhancerByMockitoWithCGLIB$$5da9aeb6 cannot be returned by isStopped() isStopped() should return boolean {code} Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation instead. This will help in the future: if we have some hooks and the coprocessor handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
[ https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6748: -- Description: You can easily understand the problem from the below logs: {code} [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=3 [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=2 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=1 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=0 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 [2012-09-01 11:41:02,063] [WARN ] 
[MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775807 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775806 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775805 [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775804 [2012-09-01 11:41:02,065] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for 
/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775803 ... [2012-09-01 11:41:03,307] [ERROR] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.zookeeper.ClientCnxn 623] Caught unexpected throwable java.lang.StackOverflowError {code}
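The endless recursion in the logs above can be reproduced in miniature. The following is a toy sketch, not the actual SplitLogManager code; the class, method names, and depth limit are invented. It shows how an "async" retry whose failure callback is delivered synchronously on an expired session turns a retry budget of Long.MAX_VALUE into unbounded recursion, ending in the StackOverflowError seen in the logs:

```java
public class RecursiveRetrySketch {
    static long calls = 0;
    static final long MAX_DEPTH = 5_000; // stand-in for the real stack limit

    // Stand-in for the asynchronous ZK delete: with an expired session the
    // failure callback runs before the call returns, i.e. on the same stack.
    static void deleteAsync(long remainingRetries) {
        onDeleteFailed(remainingRetries); // rc=SESSIONEXPIRED delivered inline
    }

    static void onDeleteFailed(long remainingRetries) {
        calls++;
        if (calls > MAX_DEPTH) {
            // the real logs end here with java.lang.StackOverflowError
            throw new IllegalStateException("unbounded recursion detected");
        }
        if (remainingRetries > 0) {
            deleteAsync(remainingRetries - 1); // recurses, never unwinds the stack
        }
    }

    public static void main(String[] args) {
        try {
            deleteAsync(Long.MAX_VALUE); // retries=9223372036854775807, as logged
        } catch (IllegalStateException e) {
            System.out.println("overflowed after " + calls + " nested calls");
        }
    }
}
```

The decrementing retries=922337203685477580x counters in the log are the Long.MAX_VALUE budget going down one stack frame at a time.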
[jira] [Created] (HBASE-6751) Too many retries, leading to a delay to read the HLog after a datanode failure
nkeywal created HBASE-6751: -- Summary: Too many retries, leading to a delay to read the HLog after a datanode failure Key: HBASE-6751 URL: https://issues.apache.org/jira/browse/HBASE-6751 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.94.0, 0.96.0 Reporter: nkeywal When reading an HLog, we need to go to the last block to get the file size. In HDFS 1.0.3, this leads to HDFS-3701 / HBASE-6401. In HDFS branch-2, this bug is fixed, but we have two other issues. 1) For simple cases such as a single dead node, we don't have the effect of HDFS-3703, and the default location order leads us to try to connect to a dead datanode when we should not. This is not analysed yet. A specific JIRA will be created later. 2) If we are redirected to a wrong node, we experience a huge delay. The pseudo code in DFSInputStream#readBlockLength is: {noformat} for (DatanodeInfo datanode : locatedblock.getLocations()) { try { ClientDatanodeProtocol cdp = DFSUtil.createClientDatanodeProtocolProxy( datanode, dfsClient.conf, dfsClient.getConf().socketTimeout, dfsClient.getConf().connectToDnViaHostname, locatedblock); return cdp.getReplicaVisibleLength(locatedblock.getBlock()); } catch (IOException ioe) { // retry } } {noformat} However, with this code, the connection is created with a null RetryPolicy. It is then defaulted to 10 retries, with: {noformat} public static final String IPC_CLIENT_CONNECT_MAX_RETRIES_KEY = "ipc.client.connect.max.retries"; public static final int IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT = 10; {noformat} So if the first datanode is bad, we will try it 10 times before trying the second. In the context of HBASE-6738, the split task is cancelled before we have opened the file to split. By nature, it's likely to be a pure HDFS issue, but maybe it can be solved in HBase with the right configuration of ipc.client.connect.max.retries. The ideal fix (in HDFS) would be to try the datanodes once each, and then loop 10 times. 
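The proposed ordering fix can be sketched as follows. This is a hypothetical illustration, not the actual DFSInputStream code; the Datanode interface and the method names are invented stand-ins for DatanodeInfo and the ClientDatanodeProtocol call:

```java
import java.util.Arrays;
import java.util.List;

public class ReadBlockLengthSketch {

    // Stand-in for DatanodeInfo plus the ClientDatanodeProtocol proxy call.
    interface Datanode {
        long getReplicaVisibleLength() throws Exception;
    }

    // One quick attempt per datanode per pass; only then retry the whole list,
    // instead of letting the IPC layer retry the first datanode 10 times.
    static long readBlockLength(List<Datanode> locations, int passes) throws Exception {
        Exception last = null;
        for (int pass = 0; pass < passes; pass++) {
            for (Datanode dn : locations) {
                try {
                    return dn.getReplicaVisibleLength();
                } catch (Exception e) {
                    last = e; // dead or unreachable node: move to the next one
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        List<Datanode> nodes = Arrays.asList(
            () -> { throw new Exception("dead datanode"); }, // first node is down
            () -> 42L);                                      // second one answers
        System.out.println("visible length: " + readBlockLength(nodes, 3));
    }
}
```

With this shape, a dead first datanode costs one failed attempt per pass rather than ipc.client.connect.max.retries (10) consecutive connection attempts before the second datanode is ever tried.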
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-6752) On region server failure, serves writes and timeranged reads during the log split.
nkeywal created HBASE-6752: -- Summary: On region server failure, serves writes and timeranged reads during the log split. Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Priority: Minor Opening for write on failure would mean: - Assign the region to a new regionserver. It marks the region as recovering -- a specific exception is returned to the client when we cannot serve. -- allows clients to know where they stand. The exception can include some time information (failure started at: ...) -- allows them to go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address = saves network calls, lowers the load on meta. - Do the split as today. Priority is given to the region server holding the new regions -- helps to share the load balancing code: the split is done by a region server considered available for new regions -- helps locality (the recovered edits are available on the region server) = lowers the network usage - When the split is finished, we're done as of today - While the split is progressing, the region server can -- serve writes --- that's useful for all applications that need to write but not read immediately: --- anything that logs events to analyze them later --- opentsdb is a perfect example. -- serve reads if they have a compatible time range. For heavily used tables, it could help, because: --- we can expect to have a few minutes of data only (as it's loaded) --- the heaviest queries often accept a few (or more) minutes of delay. Some what-ifs: 1) the split fails = Retry until it works. As today. Just that we serve writes. We need to know (as today) that the region has not recovered if we fail again. 2) the regionserver fails during the split = As 1, and as of today. 3) the regionserver fails after the split but before the state change to fully available. = New assign. 
More logs to split (the ones already done and the new ones). 4) the assignment fails = Retry until it works. As today. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
[ https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452053#comment-13452053 ] Ted Yu commented on HBASE-6748: --- @Jieshan: Can you tell us under what circumstance the above recursion happened? Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback --- Key: HBASE-6748 URL: https://issues.apache.org/jira/browse/HBASE-6748 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0, 0.94.1 Reporter: Jieshan Bean Priority: Critical
[jira] [Created] (HBASE-6753) Potential bug in Put object toString()
Walter Tietze created HBASE-6753: Summary: Potential bug in Put object toString() Key: HBASE-6753 URL: https://issues.apache.org/jira/browse/HBASE-6753 Project: HBase Issue Type: Bug Components: coprocessors Environment: Cloudera CDH 4.0.1 with hbase 0.92.1-cdh4.0.1 Reporter: Walter Tietze Priority: Minor I'm a newbie to HBase. I implemented a coprocessor, which works nicely with the Cloudera 4.0.1 version. Testing my coprocessor revealed a problem: every time I inserted logging into my prePut method, the Put object was no longer stored into HBase. I analyzed the code and reduced the problem to the fact that calling the toString method on the Put object alone is the reason for this behaviour. There seems to be a problem with the serialization of the object: serialization seems to modify the object, with the result that it is no longer inserted into HBase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6752: -- Summary: On region server failure, serve writes and timeranged reads during the log split (was: On region server failure, serves writes and timeranged reads during the log split.) On region server failure, serve writes and timeranged reads during the log split Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6753) Potential bug in Put object toString()
[ https://issues.apache.org/jira/browse/HBASE-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452114#comment-13452114 ] Ted Yu commented on HBASE-6753: --- In Operation.toString(), I see: {code} try { return toJSON(maxCols); } catch (IOException ioe) { return toMap(maxCols).toString(); } {code} Is the jersey-json library on the classpath? If you can provide more logs, that would help us analyze the problem. Potential bug in Put object toString() -- Key: HBASE-6753 URL: https://issues.apache.org/jira/browse/HBASE-6753 Project: HBase Issue Type: Bug Components: coprocessors Environment: Cloudera CDH 4.0.1 with hbase 0.92.1-cdh4.0.1 Reporter: Walter Tietze Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
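Whatever the root cause of HBASE-6753 turns out to be, the invariant at stake is that toString() must be side-effect free: rendering an object for a log line should never change whether the object can still be written. A minimal sketch of that pattern (illustrative only; SafePut is an invented class, not part of HBase):

```java
public class SafePut {
    private final byte[] row;
    private int size; // example of mutable state that toString() must not touch

    public SafePut(byte[] row) { this.row = row; }

    public void add(byte[] value) { size += value.length; }

    @Override
    public String toString() {
        // reads only; no serialization and no mutation, so logging a SafePut
        // cannot affect a later write of the same object
        return "SafePut{rowLen=" + row.length + ", size=" + size + "}";
    }
}
```

Calling toString() any number of times yields the same string and leaves the object unchanged.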
[jira] [Updated] (HBASE-6630) Port HBASE-6590 to trunk : Assign sequence number to bulk loaded files
[ https://issues.apache.org/jira/browse/HBASE-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-6630: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Port HBASE-6590 to trunk : Assign sequence number to bulk loaded files -- Key: HBASE-6630 URL: https://issues.apache.org/jira/browse/HBASE-6630 Project: HBase Issue Type: Sub-task Affects Versions: 0.94.1 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.96.0 Attachments: 6590-seq-id-bulk-load.txt, 6630-v2.txt Currently bulk loaded files are not assigned a sequence number. Thus, they can only be used to import historical data dating to the past. There are cases where we want to bulk load current data, but the bulk load mechanism does not support this, as the bulk loaded files are always sorted behind the non-bulkloaded hfiles. Assigning a sequence id to bulk loaded files should solve this issue. StoreFiles within a store are sorted based on the sequenceId. SequenceId is a monotonically increasing number that accompanies every edit written to the WAL. For entries that update the same cell, we would like the latter edit to win. This comparison is accomplished using memstoreTS at the KV level, and sequenceId at the StoreFile level (to order scanners in the KeyValueHeap). BulkLoaded files are generated outside of HBase/RegionServer, so they do not have a sequenceId written in the file. This causes HBase to lose track of the point in time when the BulkLoaded file was imported to HBase, resulting in a behavior that *only* supports viewing bulkLoaded files as files back-filling data from the beginning of time. By assigning a sequence number to the file, we can allow the bulk loaded file to fit in where we want: either at the current time or the beginning of time. The latter is the default, to maintain backward compatibility. Design approach: Store files keep track of the sequence id in the trailer. 
Since we do not wish to edit/rewrite the bulk loaded file upon import, we will encode the assigned sequenceId into the fileName. The filename regex is updated for this purpose. If the sequenceId is encoded in the filename, it will be used as the sequenceId for the file. If none is found, the sequenceId will be considered 0 (as per the default, backward-compatible behavior). To enable clients to request the pre-existing behavior, the command line utility allows for 2 ways to import bulk loaded files: to assign or not assign a sequence number. If a sequence number is assigned, the imported file will be imported with the current sequence id. If the sequence number is not assigned, it will be as if it was backfilling old data from the beginning of time. Compaction behavior: With the current compaction algorithm, bulk loaded files that backfill data to the beginning of time can cause a compaction storm, converting every minor compaction to a major compaction. To address this, these files are excluded from minor compaction, based on a config param (enabled for the messages use case). Since bulk loaded files that are not back-filling data do not cause this issue, they will not be ignored during minor compactions based on the config parameter. This is also required to ensure that there are no holes in the set of files selected for compaction; this is necessary to preserve the order of KV comparison before and after compaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
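The filename-encoding idea can be sketched as follows. The _SeqId_ suffix pattern below is an assumption for illustration, not necessarily the exact regex HBase uses; the point is that encode and decode round-trip, and a file with no encoded id falls back to 0, the backward-compatible back-fill behavior:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulkLoadSeqIdSketch {
    // Hypothetical suffix format; illustrative, not HBase's actual regex.
    private static final Pattern SEQ_ID = Pattern.compile("^(.+)_SeqId_(\\d+)_$");

    // Append the assigned sequence id to the file name instead of
    // rewriting the HFile trailer of an already-written bulk load file.
    static String encode(String fileName, long seqId) {
        return fileName + "_SeqId_" + seqId + "_";
    }

    // Parse the id back on open; no encoded id means 0, i.e. the file
    // sorts behind everything, back-filling from the beginning of time.
    static long decode(String fileName) {
        Matcher m = SEQ_ID.matcher(fileName);
        return m.matches() ? Long.parseLong(m.group(2)) : 0L;
    }

    public static void main(String[] args) {
        String named = encode("hfile-abc", 57);
        System.out.println(named + " -> seqId " + decode(named));
        System.out.println("hfile-abc -> seqId " + decode("hfile-abc"));
    }
}
```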
[jira] [Assigned] (HBASE-6745) Always flush region based on memstore size could hold hlog files from archiving
[ https://issues.apache.org/jira/browse/HBASE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HBASE-6745: -- Assignee: Jimmy Xiang Always flush region based on memstore size could hold hlog files from archiving --- Key: HBASE-6745 URL: https://issues.apache.org/jira/browse/HBASE-6745 Project: HBase Issue Type: Bug Components: replication, wal Reporter: Jimmy Xiang Assignee: Jimmy Xiang Currently, the memstore flusher always chooses the region with the biggest memstore to flush. Suppose I have two tables: one is very actively updated, while the other is periodically updated. The active one has the biggest memstore all the time and is flushed all the time, but the inactive one never gets a chance to flush. Since it is not flushed, the hlog file can't be archived, although there are lots of hlog files. If the active table happens to have big updates all the time, the hlog files could cause huge disk space pressure. Besides memstore size, periodically flushing regions based on hlog roll time is helpful for hlog archiving/replication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
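The starvation concern as reported can be illustrated with a toy model (invented names and fields, not HBase's actual flush policy). A size-based choice always picks the active region, while choosing by oldest unflushed sequence id picks the region whose old edits pin the oldest hlog:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FlushChoiceSketch {
    static class Region {
        final String name;
        final long memstoreSize;
        final long oldestUnflushedSeqId;
        Region(String name, long memstoreSize, long oldestUnflushedSeqId) {
            this.name = name;
            this.memstoreSize = memstoreSize;
            this.oldestUnflushedSeqId = oldestUnflushedSeqId;
        }
    }

    // "biggest memstore" policy: the actively updated region always wins
    static Region bySize(List<Region> regions) {
        return regions.stream()
            .max(Comparator.comparingLong(r -> r.memstoreSize)).get();
    }

    // archiving-friendly policy: flush whatever pins the oldest WAL entry
    static Region byOldestEdit(List<Region> regions) {
        return regions.stream()
            .min(Comparator.comparingLong(r -> r.oldestUnflushedSeqId)).get();
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("active", 100_000_000L, 900L), // big memstore, recent edits
            new Region("inactive", 1_000L, 10L));     // tiny memstore, old edits
        System.out.println(bySize(regions).name);       // the inactive region starves
        System.out.println(byOldestEdit(regions).name); // the pinned hlog gets released
    }
}
```

As the resolution below notes, HBase already schedules the proper regions to flush after each log roll; the sketch only illustrates why a size-only policy would be a problem.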
[jira] [Resolved] (HBASE-6745) Always flush region based on memstore size could hold hlog files from archiving
[ https://issues.apache.org/jira/browse/HBASE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-6745. Resolution: Invalid Looked into the code, which looks fine to me. After each log roll, proper regions are scheduled to flush automatically. Biggest region is flushed only under global memory pressure. Always flush region based on memstore size could hold hlog files from archiving --- Key: HBASE-6745 URL: https://issues.apache.org/jira/browse/HBASE-6745 Project: HBase Issue Type: Bug Components: replication, wal Reporter: Jimmy Xiang Assignee: Jimmy Xiang -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452148#comment-13452148 ] ramkrishna.s.vasudevan commented on HBASE-6698: --- I tried running all the failed testcases. All of them passed. {code} 2012-09-10 22:38:21,013 INFO [main] hbase.HBaseTestingUtility(288): Created new mini-cluster data directory: D:\HBase\Trunk\hbase-server\target\test-data\1b34c2bc-9ae2-4c2c-a17e-a7a7ef73b1b6\dfscluster_1f11df5c-39e2-47e7-bf7b-634dbebf785a 2012-09-10 22:38:21,540 INFO [main] zookeeper.MiniZooKeeperCluster(196): Started MiniZK Cluster and connect 1 ZK server on client port: 50620 2012-09-10 22:38:21,549 ERROR [main] hbase.ResourceChecker(129): Bad configuration: the operating systems file handles maximum is 0 our is 1024 2012-09-10 22:38:21,971 INFO [main] hbase.ResourceChecker(144): before master.TestAssignmentManager#testShutdownHandler: 10 threads, 0 file descriptors 0 connections, 2012-09-10 22:38:22,371 DEBUG [main] zookeeper.ZKUtil(102): mockedServer opening connection to ZooKeeper with ensemble (localhost:50620) 2012-09-10 22:38:22,435 INFO [main] zookeeper.RecoverableZooKeeper(101): The identifier of this process is 4624@Ram 2012-09-10 22:38:22,569 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(261): mockedServer Received ZooKeeper Event, type=None, state=SyncConnected, path=null 2012-09-10 22:38:22,580 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(338): mockedServer-0x139b127390f connected 2012-09-10 22:38:24,517 DEBUG [main] executor.ExecutorService(132): Starting executor service name=MASTER_OPEN_REGION-testShutdownHandler, corePoolSize=3, maxPoolSize=3 2012-09-10 22:38:24,518 DEBUG [main] executor.ExecutorService(132): Starting executor service name=MASTER_CLOSE_REGION-testShutdownHandler, corePoolSize=3, maxPoolSize=3 2012-09-10 22:38:24,518 DEBUG [main] executor.ExecutorService(132): Starting executor service name=MASTER_SERVER_OPERATIONS-testShutdownHandler, 
corePoolSize=3, maxPoolSize=3 2012-09-10 22:38:24,518 DEBUG [main] executor.ExecutorService(132): Starting executor service name=MASTER_META_SERVER_OPERATIONS-testShutdownHandler, corePoolSize=3, maxPoolSize=3 2012-09-10 22:38:26,200 INFO [main] handler.ServerShutdownHandler(181): Skipping log splitting for example.org,1234,5678 2012-09-10 22:38:26,461 DEBUG [main] client.ClientScanner(94): Creating scanner over .META. starting at key '' 2012-09-10 22:38:26,461 DEBUG [main] client.ClientScanner(205): Advancing internal scanner to startKey at '' 2012-09-10 22:38:26,781 DEBUG [main] client.ClientScanner(192): Finished with scanning at {NAME = 't,,1347296900866.db9424ce7e14acb58b9420b098b996ea.', STARTKEY = '', ENDKEY = '', ENCODED = db9424ce7e14acb58b9420b098b996ea,} 2012-09-10 22:38:26,801 INFO [main] handler.ServerShutdownHandler(282): Reassigning 1 region(s) that example.org,1234,5678 was carrying (skipping 0 regions(s) that are already in transition) 2012-09-10 22:38:26,801 INFO [main] handler.ServerShutdownHandler(378): The table t was deleted. Hence not proceeding. 
2012-09-10 22:38:26,801 INFO [main] master.AssignmentManager(1372): Quickly assigning 0 region(s) across 2 server(s) 2012-09-10 22:38:26,801 INFO [main] master.AssignmentManager(1377): Failed getting bulk plan, assigning region singly 2012-09-10 22:38:26,801 INFO [main] handler.ServerShutdownHandler(359): Finished processing of shutdown of example.org,1234,5678 2012-09-10 22:38:26,801 DEBUG [main] zookeeper.ZKAssign(538): mockedServer-0x139b127390f Deleting any existing unassigned nodes 2012-09-10 22:38:26,832 DEBUG [main] zookeeper.ZKAssign(538): mockedServer-0x139b127390f Deleting any existing unassigned nodes 2012-09-10 22:38:26,853 INFO [main] hbase.ResourceChecker(144): after master.TestAssignmentManager#testShutdownHandler: 12 threads (was 10), 0 file descriptors 1 connections, -thread leak?- 2012-09-10 22:38:27,863 INFO [main] zookeeper.MiniZooKeeperCluster(238): Shutdown MiniZK cluster with all ZK servers {code} Also, I could not find the error that occurred while running testShutDownHandler. Is the code the updated one? Am I missing something here? Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698.patch
[jira] [Commented] (HBASE-6725) Check and Put can fail when using locks
[ https://issues.apache.org/jira/browse/HBASE-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452235#comment-13452235 ] Nicolas Thiébaud commented on HBASE-6725: - The bug comes from the fact that checkAndMutate uses getLock(lockId, get.getRow(), true) and reuses the provided lock. Since several client threads use the same lock, many are allowed to mutate. I believe there are 2 solutions: a. Use some sort of internal lock, which requires a specific lock set for check-and-mutate. b. Consider CAP with locks as bad usage of locks and deprecate the feature. I'm willing to contribute a patch, but I'm not sure how to go on from here. Check and Put can fail when using locks --- Key: HBASE-6725 URL: https://issues.apache.org/jira/browse/HBASE-6725 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Environment: tested on mac os x Reporter: Nicolas Thiébaud Attachments: CAPwithLocks.zip, TestCase_HBASE_6725.patch When multiple threads race using CAP with a lock on the same row, several instances may be allowed to update the cell with the new value (although the expected value is different). If all threads race with a wrong expected value and a lock, none will be able to update. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
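For option (a), the essential property is that the check and the mutation execute as one atomic step guarded internally, not by a lock the caller already holds and shares with other threads. A minimal sketch of that property (illustrative only, using an AtomicReference in place of HBase's row locks and cells):

```java
import java.util.concurrent.atomic.AtomicReference;

public class CheckAndPutSketch {
    private final AtomicReference<String> cell = new AtomicReference<>("old");

    // The check and the write happen as a single atomic step, guarded
    // internally, so no caller-provided lock can let two threads through.
    boolean checkAndPut(String expected, String update) {
        return cell.compareAndSet(expected, update);
    }

    String get() { return cell.get(); }

    public static void main(String[] args) {
        CheckAndPutSketch row = new CheckAndPutSketch();
        System.out.println(row.checkAndPut("old", "a")); // true: first CAS wins
        System.out.println(row.checkAndPut("old", "b")); // false: value is now "a"
        System.out.println(row.get());
    }
}
```

However many threads race on checkAndPut("old", ...), exactly one succeeds, which is the behavior the bug report says is violated when the shared row lock is reused.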
[jira] [Commented] (HBASE-6725) Check and Put can fail when using locks
[ https://issues.apache.org/jira/browse/HBASE-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452246#comment-13452246 ] Ted Yu commented on HBASE-6725: --- @Nicolas: Thanks for the finding. For point b, can you write a summary on the dev@ list to poll people's opinion? Check and Put can fail when using locks --- Key: HBASE-6725 URL: https://issues.apache.org/jira/browse/HBASE-6725 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Environment: tested on mac os x Reporter: Nicolas Thiébaud Attachments: CAPwithLocks.zip, TestCase_HBASE_6725.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452248#comment-13452248 ] stack commented on HBASE-6698: -- I ran the tests individually and they passed for me. Let me retry the patch. Rather than '+Delete delete = new Delete(new byte[0]);', you could pass HConstants.EMPTY_BYTE_ARRAY (I could add that on commit)... that's minor. Let me do a more extensive review. This patch is great. It could be too good to be true. I just want to check Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation only. This will help in the future: if we have some hooks and the CP handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Attachment: HBASE-6698_6.patch Retry
[jira] [Commented] (HBASE-6441) MasterFS doesn't set scheme for internal FileSystem
[ https://issues.apache.org/jira/browse/HBASE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452259#comment-13452259 ] Jesse Yates commented on HBASE-6441: I'm good with it going into 0.94.3 - I don't think this has been fixed yet (though there was some work Hsieh was doing around this, IIRC) MasterFS doesn't set scheme for internal FileSystem --- Key: HBASE-6441 URL: https://issues.apache.org/jira/browse/HBASE-6441 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0, 0.94.2 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.96.0 Attachments: java_HBASE-6441_v0.patch FSUtils.getRootDir() just takes a configuration object, which is used to: 1) Get the name of the root directory 2) Create a filesystem (based on the configured scheme) 3) Qualify the root onto the filesystem However, the FileSystem from the master filesystem won't generate the correctly qualified root directory under hadoop-2.0 (though it works fine on hadoop-1.0). Seems to be an issue with the configuration parameters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
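The three steps the description lists (read the root directory name, derive the filesystem from the configured scheme, qualify the root against it) can be sketched with plain `java.net.URI` handling; this is an illustrative stand-in, not the Hadoop `FileSystem` API.

```java
// Sketch of "qualifying" a root path: if the path already carries a scheme,
// keep it; otherwise borrow scheme and authority from the configured default
// filesystem. Plain URI arithmetic, not FSUtils/Hadoop code.
import java.net.URI;

public class QualifyRoot {
    static URI qualify(String rootDir, String defaultFsUri) {
        URI root = URI.create(rootDir);
        if (root.getScheme() != null) {
            return root; // already fully qualified
        }
        URI fs = URI.create(defaultFsUri);
        // missing scheme/authority come from the configured filesystem
        return URI.create(fs.getScheme() + "://" + fs.getAuthority() + root.getPath());
    }
}
```

The bug class described above amounts to step 2 or 3 silently producing an unqualified (scheme-less) root, which later comparisons against qualified paths then fail.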
[jira] [Commented] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452273#comment-13452273 ] Alex Newman commented on HBASE-6431: Fine by me Some FilterList Constructors break addFilter Key: HBASE-6431 URL: https://issues.apache.org/jira/browse/HBASE-6431 Project: HBase Issue Type: Bug Components: filters Affects Versions: 0.92.1, 0.94.0 Reporter: Alex Newman Assignee: Alex Newman Priority: Minor Fix For: 0.96.0 Attachments: 0001-HBASE-6431.-Some-FilterList-Constructors-break-addFi.patch, 6431-v2.txt Some of the constructors for FilterList set the internal list of filters to list types which don't support the add operation. As a result, FilterList(final List<Filter> rowFilters), FilterList(final Filter... rowFilters), FilterList(final Operator operator, final List<Filter> rowFilters), and FilterList(final Operator operator, final Filter... rowFilters) may init private List<Filter> filters = new ArrayList<Filter>(); incorrectly.
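The failure mode described here is a common Java pitfall and can be reproduced outside HBase: `Arrays.asList` returns a fixed-size list, so storing it directly makes a later `add()` throw `UnsupportedOperationException`; copying into an `ArrayList` is the usual fix. The sketch below is illustrative, not the FilterList code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demonstrates the bug class: a constructor that stores Arrays.asList(...)
// directly leaves the internal list fixed-size, so addFilter-style methods
// fail at runtime. Copying into a growable ArrayList restores add().
public class FixedSizeListBug {
    static List<String> broken(String... items) {
        return Arrays.asList(items); // fixed-size view: add() will throw
    }

    static List<String> fixed(String... items) {
        return new ArrayList<>(Arrays.asList(items)); // growable copy
    }
}
```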
[jira] [Updated] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-6658: -- Attachment: HBASE-6658-v4.patch Hmm...builds for me with -Dhadoop.profile=2.0. Attaching same patch rebased against trunk. Rename WritableByteArrayComparable to something not mentioning Writable --- Key: HBASE-6658 URL: https://issues.apache.org/jira/browse/HBASE-6658 Project: HBase Issue Type: Bug Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.96.0 Attachments: HBASE-6658.patch, HBASE-6658-v3.patch, HBASE-6658-v4.patch After HBASE-6477, WritableByteArrayComparable will no longer be Writable, so should be renamed. Current idea is ByteArrayComparator (since all the derived classes are *Comparator not *Comparable), but I'm open to suggestions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452325#comment-13452325 ] Hadoop QA commented on HBASE-6698: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544495/HBASE-6698_6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2838//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2838//console This message is automatically generated.
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Attachment: HBASE-6698_6.patch
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Patch Available (was: Open) try again...
[jira] [Closed] (HBASE-6431) Some FilterList Constructors break addFilter
[ https://issues.apache.org/jira/browse/HBASE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack closed HBASE-6431. Committed a good while back. Resolving.
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452359#comment-13452359 ] Gregory Chanan commented on HBASE-6658: --- https://builds.apache.org/job/PreCommit-HBASE-Build/2839/ Failed again, I'll take a closer look.
[jira] [Updated] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-6658: -- Attachment: HBASE-6658-v5.patch Forgot to add a file, this should work.
[jira] [Updated] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-6658: -- Attachment: HBASE-6658-v6.patch Looks like the previous patch had slightly old generated proto files, reattaching.
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452405#comment-13452405 ] Hadoop QA commented on HBASE-6698: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544509/HBASE-6698_6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2840//console This message is automatically generated.
[jira] [Updated] (HBASE-6747) Enable server side limit default on the size of results returned (to prevent OOMs)
[ https://issues.apache.org/jira/browse/HBASE-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kannan Muthukkaruppan updated HBASE-6747: - Summary: Enable server side limit default on the size of results returned (to prevent OOMs) (was: Enable server side limit on the size of results returned.) Enable server side limit default on the size of results returned (to prevent OOMs) -- Key: HBASE-6747 URL: https://issues.apache.org/jira/browse/HBASE-6747 Project: HBase Issue Type: Improvement Reporter: Amitanand Aiyer Priority: Minor We have seen a couple of situations where clients fetching a large row can cause the whole server to go down, due to large GC pauses or out-of-memory errors. This should be easily avoidable if the client can use a Scan instead of a Get, and/or use batching to reduce the size, but it seems difficult to enforce this. Moreover, once in a while there may be genuine outliers/bad clients that cause such large requests. We need to handle such situations gracefully, and not have the RS reboot for things that can be prevented. The proposal here is to enforce a maximum response size at the server end, so we are not at the mercy of the client's good behavior to keep the server running. We already log large responses. But if the response is too large, it just kills the server. We don't have it logged, and the only way to find out is to go through the heap dump. More importantly, our availability/reliability numbers will go down because the whole region/regionserver fails instead of just the single bad request. I think it will be useful for the server to maintain a maximum request size that it will serve. Something large, like 2-3 GB, so normal operations do not need to be bothered. If a single get/scan operation exceeds the size, we will just throw an exception for the request.
This will a) stop the RS from going on until it hits out of memory, and b) give the clients, and us, a cleaner way to see what the problem is.
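The proposal above can be sketched as follows: accumulate the response size while collecting cells and fail only the offending request once a configured cap is crossed, instead of letting the whole server run out of memory. The names here (`maxResponseSize`, `ResponseTooLargeException`, `collect`) are hypothetical, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: enforce a per-request response-size cap server-side. One oversized
// get/scan fails with a clear exception; the server itself keeps running.
public class ResponseSizeLimit {
    static class ResponseTooLargeException extends RuntimeException {
        ResponseTooLargeException(long size, long max) {
            super("response " + size + " bytes exceeds cap " + max);
        }
    }

    static List<byte[]> collect(List<byte[]> cells, long maxResponseSize) {
        long total = 0;
        List<byte[]> out = new ArrayList<>();
        for (byte[] cell : cells) {
            total += cell.length;
            if (total > maxResponseSize) {
                // fail only this request, with a loggable reason
                throw new ResponseTooLargeException(total, maxResponseSize);
            }
            out.add(cell);
        }
        return out;
    }
}
```

This matches the two benefits the description names: the server never accumulates an unbounded result, and the failure is an explicit, loggable exception rather than a heap dump.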
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452423#comment-13452423 ] stack commented on HBASE-6698: -- The first rerun above OOME'd. The second shows no obvious hang. Retry again.
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Attachment: HBASE-6698_6.patch
[jira] [Updated] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-6698: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452432#comment-13452432 ] stack commented on HBASE-6698: -- I took a look at the first change: {code} -prepareDelete(delete); -internalDelete(delete, delete.getClusterId(), writeToWAL); +doBatchMutate(delete, lid); {code} If I look at doBatchMutate, it is missing special handling that prepareDelete does: e.g. the piece in prepareDelete where, if no column family is specified, we set a special cell with the current timestamp for each column family in the HTableDescriptor. My worry is that corner cases are not covered by this mass replace. Please convince me it's just my bad review not catching them. Thanks Priyadarshini.
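The corner case stack points at can be sketched abstractly: a Delete that names no column family should expand to a per-family delete marker at the current timestamp for every family in the table descriptor. This is a plain-Java stand-in, not the HRegion implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the prepareDelete corner case: when the caller specified no
// families, expand the delete to cover every family in the table
// descriptor, each marked at the current timestamp.
public class PrepareDeleteSketch {
    static Map<String, Long> expandFamilies(Set<String> requested,
                                            Set<String> tableFamilies,
                                            long now) {
        Map<String, Long> markers = new HashMap<>();
        Set<String> targets = requested.isEmpty() ? tableFamilies : requested;
        for (String family : targets) {
            markers.put(family, now); // delete-family marker at current ts
        }
        return markers;
    }
}
```

The review concern, restated: if doBatchMutate is called without this expansion step, a family-less Delete silently deletes nothing instead of the whole row.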
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452433#comment-13452433 ] Hadoop QA commented on HBASE-6658: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544520/HBASE-6658-v6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests. Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2841//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2841//console This message is automatically generated.
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452440#comment-13452440 ] stack commented on HBASE-6658: -- +1 on patch. Thanks for the cleanup.
[jira] [Commented] (HBASE-6658) Rename WritableByteArrayComparable to something not mentioning Writable
[ https://issues.apache.org/jira/browse/HBASE-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452458#comment-13452458 ] Hadoop QA commented on HBASE-6658: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544520/HBASE-6658-v6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2842//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2842//console This message is automatically generated.
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452467#comment-13452467 ] stack commented on HBASE-6752: --
bq. specific exception returned to the client when we cannot serve
Who would return this? Not the server that just failed? Or is it during recovery? The region will be assigned its new location and meta gets updated w/ the new location, only the region is not fully online because it's still recovering? Or is this when the region is moved?
bq. Priority is given to region server holding the new regions
What does this mean? What kind of priority? I like being able to take writes sooner.

On region server failure, serve writes and timeranged reads during the log split
Key: HBASE-6752 URL: https://issues.apache.org/jira/browse/HBASE-6752 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: nkeywal Priority: Minor

Opening for write on failure would mean:
- Assign the region to a new regionserver. It marks the region as recovering
-- a specific exception is returned to the client when we cannot serve
-- lets clients know where they stand; the exception can include some time information (failure started on: ...)
-- lets them go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address = saves network calls and lowers the load on meta
- Do the split as today. Priority is given to the region server holding the new regions
-- helps share the load balancing code: the split is done by a region server considered available for new regions
-- helps locality (the recovered edits are available on the region server) = lower network usage
- When the split is finished, we're done, as of today
- While the split is progressing, the region server can
-- serve writes
--- useful for all applications that need to write but not read immediately: anything that logs events to analyze them later; opentsdb is a perfect example
-- serve reads if they have a compatible time range; for heavily used tables this could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few (or more) minutes of delay

Some what-ifs:
1) The split fails = retry until it works, as today; just that we serve writes. We need to know (as today) that the region has not recovered if we fail again.
2) The regionserver fails during the split = as 1, and as of today.
3) The regionserver fails after the split but before the state change to fully available = new assign, more logs to split (the ones already done and the new ones).
4) The assignment fails = retry until it works, as today.
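The recovering state with a client-facing exception carrying timing hints could look roughly like the sketch below. All names here are hypothetical, invented for illustration; this is not the HBase API:

```java
import java.io.IOException;

// Hypothetical sketch of the proposal, not HBase API: a region marked as
// recovering rejects incompatible reads with an exception that carries
// timing hints, so the client can back off intelligently.
public class RegionRecoveringSketch {
    static class RegionRecoveringException extends IOException {
        final long failureStartedMs;  // the "failure started on: ..." info
        final long retryAfterMs;      // e.g. "come back in 20 seconds"
        RegionRecoveringException(long failureStartedMs, long retryAfterMs) {
            super("region recovering; retry in " + retryAfterMs + " ms");
            this.failureStartedMs = failureStartedMs;
            this.retryAfterMs = retryAfterMs;
        }
    }

    // Client side: sleep for the server-supplied hint instead of following
    // a blind retry schedule, saving calls to meta and to the dead server.
    static long backoffMs(RegionRecoveringException e) {
        return Math.max(e.retryAfterMs, 0L);
    }

    public static void main(String[] args) {
        RegionRecoveringException e =
            new RegionRecoveringException(System.currentTimeMillis(), 20_000L);
        System.out.println(backoffMs(e)); // prints 20000
    }
}
```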
[jira] [Commented] (HBASE-6427) Pluggable compaction and scan policies via coprocessors
[ https://issues.apache.org/jira/browse/HBASE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452487#comment-13452487 ] Lars Hofhansl commented on HBASE-6427: -- Not sure. BaseRegionObserver is public, whereas KeyValueScanner and InternalScanner are not. The fact that they are in the same class file does not matter for the annotation. Right? Pluggable compaction and scan policies via coprocessors --- Key: HBASE-6427 URL: https://issues.apache.org/jira/browse/HBASE-6427 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0, 0.94.2 Attachments: 6427-0.94-addendum.txt, 6427-0.94.txt, 6427-notReady.txt, 6427-v10.txt, 6427-v1.txt, 6427-v2.txt, 6427-v3.txt, 6427-v4.txt, 6427-v5.txt, 6427-v7.txt When implementing higher level stores on top of HBase it is necessary to allow dynamic control over how long KVs must be kept around. Semi-static config options for ColumnFamilies (# of versions or TTL) are not sufficient. This can be done with a few additional coprocessor hooks, or by making Store.ScanInfo pluggable. Was: The simplest way to achieve this is to have a pluggable class to determine the smallestReadpoint for Region. That way outside code can control what KVs to retain. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
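The idea of making Store.ScanInfo pluggable can be sketched as follows. This illustrates only the pattern, not the HBase coprocessor API; the interface and method names are invented:

```java
// Illustrative sketch only, not the HBase API: the per-family retention
// settings become an object a hook can replace per operation, instead of
// semi-static configuration (# of versions, TTL).
public class ScanPolicySketch {
    static final class ScanInfo {
        final long ttlMs;
        final int maxVersions;
        ScanInfo(long ttlMs, int maxVersions) {
            this.ttlMs = ttlMs;
            this.maxVersions = maxVersions;
        }
    }

    // A pluggable policy: given the configured ScanInfo, return the one to
    // use for this compaction (a coprocessor hook could expose this).
    interface CompactionScanPolicy {
        ScanInfo scanInfoForCompaction(ScanInfo configured);
    }

    public static void main(String[] args) {
        ScanInfo configured = new ScanInfo(86_400_000L, 1); // 1 day, 1 version
        // Example policy: a higher-level store keeps everything during
        // compaction and applies its own retention logic later.
        CompactionScanPolicy keepAll =
            cfg -> new ScanInfo(Long.MAX_VALUE, Integer.MAX_VALUE);
        System.out.println(keepAll.scanInfoForCompaction(configured).maxVersions);
    }
}
```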
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452486#comment-13452486 ] nkeywal commented on HBASE-6752: --
bq. Who would return this? Not the server that just failed?
If we reassign immediately, the client will go to the new regionserver. So the region server will be able to tell it a real status (for example, on reads, we can estimate the recovery time left and the regionserver can say: come back in 20 seconds for this region).
bq. What does this mean? What kinda of priority?
Today, the split is performed by any available RS. If we preassign the regions, the split can be done by the regionserver which owns some of the data we're expecting to find in the hlog file...
[jira] [Created] (HBASE-6754) Import .META. table exported from 0.94
stack created HBASE-6754: Summary: Import .META. table exported from 0.94 Key: HBASE-6754 URL: https://issues.apache.org/jira/browse/HBASE-6754 Project: HBase Issue Type: New Feature Reporter: stack This issue is a copy of HBASE-6650 except it is not a subtask of HBASE-5305, the protobuffing issue. The below description is copied from HBASE-6650. HBASE-6052 converts .META. and ROOT table content to protobuf. This JIRA allows .META. table exported from 0.94 (see HBASE-3271) to be imported into live cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6650) Import .META. table exported from 0.94
[ https://issues.apache.org/jira/browse/HBASE-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-6650. -- Resolution: Duplicate Resolving as duplicate of a new issue HBASE-6754 (I have to do this to get this issue out from under hbase-5305 as a subtask) Import .META. table exported from 0.94 -- Key: HBASE-6650 URL: https://issues.apache.org/jira/browse/HBASE-6650 Project: HBase Issue Type: Sub-task Reporter: Ted Yu HBASE-6052 converts .META. and ROOT table content to protobuf. This JIRA allows .META. table exported from 0.94 (see HBASE-3271) to be imported into live cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6713) Stopping META/ROOT RS may take 50mins when some region is splitting
[ https://issues.apache.org/jira/browse/HBASE-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6713: - Fix Version/s: (was: 0.94.3) 0.94.2
Stopping META/ROOT RS may take 50mins when some region is splitting
Key: HBASE-6713 URL: https://issues.apache.org/jira/browse/HBASE-6713 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.1 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.2 Attachments: 6713.92-94, 6713v3.patch, HBASE-6713.patch, HBASE-6713v2.patch
When we stop the RS carrying ROOT/META while it is splitting some region, the whole stopping process may take 50 mins. The reason is:
1. The ROOT/META region is closed when stopping the regionserver.
2. The split transaction fails updating META and retries.
3. The retry num is 100, and the total time is about 50 mins by default; this configuration is set by HConnectionManager#setServerSideHConnectionRetries.
I think 50 mins is too long to be acceptable. My suggested solution is closing the meta-table regions after the compact/split thread is closed.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
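The 50-minute figure follows from the retry count. A back-of-envelope check, assuming an average pause of about 30 seconds per server-side retry (the 30 s is an assumption for illustration; the real schedule follows HBase's retry backoff table):

```java
// Back-of-envelope arithmetic for the ~50 min stall: 100 retries at an
// assumed average pause of ~30 s each. The 30 s figure is an assumption
// for illustration, not read from the HBase source.
public class RetryMath {
    static long totalMinutes(int retries, long avgPauseSeconds) {
        return retries * avgPauseSeconds / 60;
    }

    public static void main(String[] args) {
        System.out.println(totalMinutes(100, 30)); // prints 50
    }
}
```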
[jira] [Resolved] (HBASE-6713) Stopping META/ROOT RS may take 50mins when some region is splitting
[ https://issues.apache.org/jira/browse/HBASE-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6713. -- Resolution: Fixed
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452491#comment-13452491 ] Jean-Daniel Cryans commented on HBASE-6165: --- [~whitingj], originally replication was using the normal handlers and was just deadlocking the clusters in a different way. ReplicationSink uses the HBase client, which can block for ungodly amounts of time, so it would fill up the handlers and the RS would stop serving requests. HBASE-6550 changed that a bit by setting low timeouts via replication-specific client-side configuration parameters (if it was using the normal client configurations it would also affect all the other clients). With HBASE-6165 it's even safer since replication is sandboxed. Replication can overrun .META. scans on cluster re-start Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.2 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch When restarting a large set of regions on a reasonably small cluster, the replication from another cluster tied up every xceiver, meaning nothing could be onlined. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6398) Print a warning if there is no local datanode
[ https://issues.apache.org/jira/browse/HBASE-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452494#comment-13452494 ] Lars Hofhansl commented on HBASE-6398: -- Yeah, I added all of our folks a while back :) Print a warning if there is no local datanode - Key: HBASE-6398 URL: https://issues.apache.org/jira/browse/HBASE-6398 Project: HBase Issue Type: Improvement Reporter: Elliott Clark Assignee: Sameer Vaishampayan Labels: noob When starting up a RS HBase should print out a warning if there is no datanode locally. Lots of optimizations are only available if the data is machine local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452496#comment-13452496 ] stack commented on HBASE-6752: -- Makes sense. Sounds great. How do we know what regionserver to give a log split to when the log has edits for all regions that were on a regionserver? You thinking we could give all regions on the crashed regionserver to a particular regionserver?
[jira] [Commented] (HBASE-6698) Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation
[ https://issues.apache.org/jira/browse/HBASE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452495#comment-13452495 ] Hadoop QA commented on HBASE-6698: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544525/HBASE-6698_6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2843//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2843//console This message is automatically generated. Refactor checkAndPut and checkAndDelete to use doMiniBatchMutation -- Key: HBASE-6698 URL: https://issues.apache.org/jira/browse/HBASE-6698 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-6698_1.patch, HBASE-6698_2.patch, HBASE-6698_3.patch, HBASE-6698_5.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698_6.patch, HBASE-6698.patch Currently the checkAndPut and checkAndDelete APIs internally call internalPut and internalDelete. Maybe we can just call doMiniBatchMutation only. This will help in the future: if we have some hooks and the CP handles certain cases in doMiniBatchMutation, the same can be done while doing a put through checkAndPut or a delete through checkAndDelete. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas
[ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452502#comment-13452502 ] Lars Hofhansl commented on HBASE-6066: -- Had a super brief look at the patch in phabricator. Looks fine; the issue I raised above is addressed by checking whether outResults is empty in RegionScannerImpl.next(...) and creating a temporary array if not. The change might introduce new performance issues, though. I'll comment on the diff.
some low hanging read path improvement ideas
Key: HBASE-6066 URL: https://issues.apache.org/jira/browse/HBASE-6066 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Michal Gregorczyk Priority: Critical Labels: noob Attachments: metric-stringbuilder-fix.patch
I was running some single threaded scan performance tests for a table with small sized rows that is fully cached. Some observations... We seem to be doing several wasteful iterations over and/or building of temporary lists.
1) One such is the following code in HRegionServer.next():
{code}
boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
if (!values.isEmpty()) {
  for (KeyValue kv : values) { // wasteful in most cases
    currentScanResultSize += kv.heapSize();
  }
  results.add(new Result(values));
{code}
By default the maxScannerResultSize is Long.MAX_VALUE. In those cases, we can avoid the unnecessary iteration to compute currentScanResultSize.
2) An example of a wasteful temporary array is results in RegionScanner.next():
{code}
results.clear();
boolean returnResult = nextInternal(limit, metric);
outResults.addAll(results);
{code}
results then gets copied over to outResults via an addAll(). Not sure why we can not directly collect the results in outResults.
3) Another almost similar example of a wasteful array is results in StoreScanner.next(), which eventually also copies its results into outResults.
4) Reduce the overhead of the size metric maintained in StoreScanner.next():
{code}
if (metric != null) {
  HRegion.incrNumericMetric(this.metricNamePrefix + metric, copyKv.getLength());
}
results.add(copyKv);
{code}
A single call to next() might fetch a lot of KVs. We can first add up the size of those KVs in a local variable and then, in a finally clause, increment the metric in one shot, rather than updating AtomicLongs for each KV.
5) RegionScanner.next() calls a helper RegionScanner.next() on the same object. Both are synchronized methods. Synchronized methods calling nested synchronized methods on the same object are probably adding some small overhead. The inner next() calls isFilterDone(), which is also a synchronized method. We should factor the code to avoid these nested synchronized methods.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
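Item (4) can be sketched as follows. This is a minimal, self-contained illustration of batching the metric update, not the actual StoreScanner code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustration of item (4), not the actual StoreScanner code: accumulate
// KV sizes in a local variable and publish one atomic update per next()
// call, instead of hitting the shared AtomicLong once per KeyValue.
public class MetricBatchSketch {
    static final AtomicLong readSizeMetric = new AtomicLong();

    static void copyBatch(List<byte[]> kvs, List<byte[]> results) {
        long batchBytes = 0; // local accumulator, no contention
        try {
            for (byte[] kv : kvs) {
                batchBytes += kv.length;
                results.add(kv);
            }
        } finally {
            // one shared-counter update per batch, even on early exit
            readSizeMetric.addAndGet(batchBytes);
        }
    }

    public static void main(String[] args) {
        copyBatch(List.of(new byte[10], new byte[20]), new ArrayList<>());
        System.out.println(readSizeMetric.get()); // prints 30
    }
}
```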
[jira] [Commented] (HBASE-6725) Check and Put can fail when using locks
[ https://issues.apache.org/jira/browse/HBASE-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452513#comment-13452513 ] Nicolas ThiƩbaud commented on HBASE-6725: - Done. Check and Put can fail when using locks --- Key: HBASE-6725 URL: https://issues.apache.org/jira/browse/HBASE-6725 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Environment: tested on mac os x Reporter: Nicolas ThiƩbaud Attachments: CAPwithLocks.zip, TestCase_HBASE_6725.patch When multiple threads race using checkAndPut (CAP) with a lock on the same row, several instances may be allowed to update the cell with the new value (although the expected value is different). If all threads race with a wrong expected value and a lock, none will be able to update. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6476) Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent
[ https://issues.apache.org/jira/browse/HBASE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452516#comment-13452516 ] Lars Hofhansl commented on HBASE-6476: -- We can also add a new method to EnvEdge called WallClockTime (or something). That way all code will only refer to EnvEdge and will also be very explicit about whether wall clock time is needed (to time tests) or this is a controllable time. Replace all occurrances of System.currentTimeMillis() with EnvironmentEdge equivalent - Key: HBASE-6476 URL: https://issues.apache.org/jira/browse/HBASE-6476 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.96.0 Attachments: 6476.txt, 6476-v2.txt, 6476-v2.txt There are still some areas where System.currentTimeMillis() is used in HBase. In order to make all parts of the code base testable and (potentially) to be able to configure HBase's notion of time, this should generally be replaced with EnvironmentEdgeManager.currentTimeMillis(). How hard would it be to add a maven task that checks for that, so we do not introduce System.currentTimeMillis back in the future? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
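A minimal sketch of the pattern being discussed; the names are modeled on, but not copied from, the real EnvironmentEdge classes. Production code asks the injected edge for the time, and tests swap in a controllable edge:

```java
// Sketch of the EnvironmentEdge idea; the interface and method names are
// illustrative, modeled on the HBase classes but not copied from them.
public class EdgeSketch {
    interface EnvironmentEdge {
        long currentTimeMillis();
    }

    // The default edge delegates to the system clock.
    static class DefaultEdge implements EnvironmentEdge {
        @Override public long currentTimeMillis() { return System.currentTimeMillis(); }
    }

    // A test edge makes time fully controllable and deterministic.
    static class ManualEdge implements EnvironmentEdge {
        long now = 0;
        @Override public long currentTimeMillis() { return now; }
        void advance(long ms) { now += ms; }
    }

    public static void main(String[] args) {
        ManualEdge edge = new ManualEdge();
        edge.advance(42); // a test can move the clock forward at will
        System.out.println(edge.currentTimeMillis()); // prints 42
    }
}
```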
[jira] [Commented] (HBASE-3869) RegionServer metrics - add read and write byte-transfer statistics
[ https://issues.apache.org/jira/browse/HBASE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452518#comment-13452518 ] Lars Hofhansl commented on HBASE-3869: -- There is a global metric that accumulates the size of all KVs read by a region server, so it's not what you meant, I guess. RegionServer metrics - add read and write byte-transfer statistics -- Key: HBASE-3869 URL: https://issues.apache.org/jira/browse/HBASE-3869 Project: HBase Issue Type: Improvement Reporter: Doug Meil Priority: Minor It would be beneficial to have the data transfer weight of reads and writes per region server. HBASE-3647 split out the read/write metric requests from the uber-request metric - which is great. But there isn't a notion of data transfer weight and this is why it's important: the read metrics are effectively RPC-based. Thus, with a scan caching of 500, there is 1 RPC call every 500 rows read (and 1 'read' metric increment). And this metric doesn't indicate how much data is being transferred (e.g., a read with 50 attributes will probably cost a lot more than a read with 5 attributes). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6533) [replication] replication will block if WAL compress set differently in master and slave configuration
[ https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452519#comment-13452519 ] Jean-Daniel Cryans commented on HBASE-6533: --- [~terry_zhang] your solution is not complete, see https://issues.apache.org/jira/browse/HBASE-5778?focusedCommentId=13253995&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13253995
[replication] replication will block if WAL compress set differently in master and slave configuration
Key: HBASE-6533 URL: https://issues.apache.org/jira/browse/HBASE-6533 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.94.0 Reporter: terry zhang Assignee: terry zhang Priority: Critical Fix For: 0.94.3 Attachments: hbase-6533.patch
As we know, in hbase 0.94.0 we have the configuration below:
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
If we enable it in the master cluster and disable it in the slave cluster, replication will not work. It will throw unwrapRemoteException again and again in the master cluster.
2012-08-09 12:49:55,892 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of an error on the remote cluster:
java.io.IOException: IPC server unable to read call parameters: Error in readFields
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read call parameters: Error in readFields
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
    at $Proxy13.replicateLogEntries(Unknown Source)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
    ... 1 more
This is because the slave cluster can not parse the hlog entry.
2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client 10.232.98.89
java.io.IOException: Error in readFields
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
    at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
    at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
    at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
    at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
    ... 11 more
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see:
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452520#comment-13452520 ] Lars Hofhansl commented on HBASE-6720:
--
Hmm... Maybe this is a non-issue now. I'm happy to close as Won't Fix.

Optionally limit number of regions balanced in each balancer run
Key: HBASE-6720
URL: https://issues.apache.org/jira/browse/HBASE-6720
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Fix For: 0.96.0, 0.94.3
Attachments: 6720-0.96-v1.txt

See discussion on HBASE-3866
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452523#comment-13452523 ] Ted Yu commented on HBASE-6720:
--
That's fine.
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452525#comment-13452525 ] Elliott Clark commented on HBASE-6720:
--
The above code won't fix the described issue: if the region moves happen quickly, the load balancer can still move a lot of regions.
[jira] [Updated] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-6720:
--
Resolution: Won't Fix
Status: Resolved (was: Patch Available)

My bad. I thought this still needed fixing.
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452544#comment-13452544 ] Elliott Clark commented on HBASE-6720:
--
[~lhofhansl] Sorry, what I wrote was unclear. I meant to say: the code Ted posted won't stop the load balancer from moving lots of regions if the region moves happen quickly. cutoffTime is some time far in the future assuming that things are going normally, and applying the plans is usually very quick, so all of the plans will be applied. If anything, I would say stopping the LoadBalancer after a certain number of regions is more useful than stopping it after a number of milliseconds: it's easier for users to know a good starting value that applies to their cluster. There's no way I could tell you right now how long a balance on our cluster takes when a RegionServer goes down, but I could tell you the max number of regions that I would want to move at a time.
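Elliott's suggestion above, capping the number of regions moved per balancer run instead of capping elapsed time, could be sketched as follows. This is an illustration only, not the attached patch; the names `capPlans` and `maxRegionsPerRun` are hypothetical:

```java
import java.util.List;

public class BalancerCap {
    // Hypothetical per-run cap on region moves: apply at most
    // maxRegionsPerRun plans now and leave the rest for the next run.
    public static <P> List<P> capPlans(List<P> plans, int maxRegionsPerRun) {
        if (plans.size() <= maxRegionsPerRun) {
            return plans;
        }
        // subList gives a view of the first maxRegionsPerRun plans.
        return plans.subList(0, maxRegionsPerRun);
    }
}
```

A region count is easier to tune than a time budget because it is independent of how fast individual moves complete.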
[jira] [Commented] (HBASE-6752) On region server failure, serve writes and timeranged reads during the log split
[ https://issues.apache.org/jira/browse/HBASE-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452549#comment-13452549 ] Kannan Muthukkaruppan commented on HBASE-6752:
--
There might be a bunch of nitty-gritties to be ironed out, but being able to take writes nearly all the time would be a very nice win. So a big +1 for exploring this effort. A few things that come to mind:
* We do want the old edits to come in the correct order of sequence ids (i.e. be considered older than the newer puts that arrive while the region is in recovery mode), correct? So we somehow need to cheaply find the correct sequence id to use for the new puts. It needs to be bigger than the sequence ids of all the edits for that region in the log files. So maybe all that's needed here is to open/recover the latest log file and scan it to find the last sequence id?
* Picking a winner among duplicates in two files relies on using the sequence id of the HFile as a tie-breaker. Therefore, today, compactions always pick a dense subrange of files ordered by sequence id. That is, if we have HFiles a, b, c, d, e sorted by sequence id, we might compact a,b,c or c,d,e but never, say, a,d,e. With this new scheme, we should take care that we don't violate this property. The old data should correctly be recovered into HFiles with the correct sequence ids, and even if newer data has been flushed before the recovery is complete, we shouldn't compact those newer files with older HFiles, given that some new files are supposed to come in between (after recovery).

On region server failure, serve writes and timeranged reads during the log split
Key: HBASE-6752
URL: https://issues.apache.org/jira/browse/HBASE-6752
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Priority: Minor

Opening for write on failure would mean:
- Assign the region to a new regionserver.
It marks the region as recovering:
-- a specific exception is returned to the client when we cannot serve
-- this allows clients to know where they stand; the exception can include some time information (failure started on: ...)
-- it allows them to go immediately to the right regionserver, instead of retrying or calling the region holding meta to get the new address => saves network calls, lowers the load on meta
- Do the split as today. Priority is given to the region server holding the new regions:
-- helps to share the load-balancing code: the split is done by a region server considered as available for new regions
-- helps locality (the recovered edits are available on the region server) => lowers network usage
- When the split is finished, we're done as of today.
- While the split is progressing, the region server can:
-- serve writes
--- useful for all applications that need to write but not read immediately: whatever logs events to analyze them later; opentsdb is a perfect example
-- serve reads if they have a compatible time range. For heavily used tables this could help, because:
--- we can expect to have only a few minutes of data (as it's loaded)
--- the heaviest queries often accept a few (or more) minutes of delay

Some what-ifs:
1) The split fails => retry until it works, as today. Just that we serve writes. We need to know (as today) that the region has not recovered if we fail again.
2) The regionserver fails during the split => as 1, and as of today.
3) The regionserver fails after the split but before the state change to fully available => new assign; more logs to split (the ones already done and the new ones).
4) The assignment fails => retry until it works, as today.
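Kannan's point about compactions — selections must form a dense subrange of the store files ordered by sequence id — can be expressed as a standalone validity check. The sketch below is illustrative only (the class and method names are made up, and the real logic lives in the store's compaction selection code):

```java
import java.util.List;

public class CompactionSelectionCheck {
    // Illustrative: given all store files' sequence ids sorted ascending,
    // verify a candidate selection is a contiguous run of that list
    // (e.g. {c,d,e} out of {a,b,c,d,e} is allowed; {a,d,e} is not).
    public static boolean isDenseSubrange(List<Long> allSeqIds, List<Long> selected) {
        if (selected.isEmpty()) return true;
        int start = allSeqIds.indexOf(selected.get(0));
        if (start < 0) return false;
        for (int i = 0; i < selected.size(); i++) {
            if (start + i >= allSeqIds.size()
                || !allSeqIds.get(start + i).equals(selected.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Violating this invariant would let the sequence-id tie-break pick the wrong winner among duplicate cells, which is exactly the hazard the comment warns about when recovered HFiles arrive after newer flushes.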
[jira] [Commented] (HBASE-6720) Optionally limit number of regions balanced in each balancer run
[ https://issues.apache.org/jira/browse/HBASE-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452551#comment-13452551 ] Lars Hofhansl commented on HBASE-6720:
--
Hey Elliott... We had comment crossing. :) OK, so I take it the patch is good after all, and we should apply it (with the move of the constant as Stack suggests). (I didn't spend much time actually looking at the code... I should probably do so.)
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452552#comment-13452552 ] Jeff Whiting commented on HBASE-6165: - @stack and @jdcryans Thanks for the explanation. I can see how it would deadlock on itself. I also found HBASE-3401 which talks about the deadlock. We patched our cdh4 cluster with HBASE-6724 and it has been running much smoother. Replication can overrun .META. scans on cluster re-start Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.2 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452557#comment-13452557 ] Himanshu Vashishtha commented on HBASE-6165: [~whitingj] Specifically, the replication-specific jira about deadlocking on normal handlers is HBASE-4280.
[jira] [Commented] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
[ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452581#comment-13452581 ] Jean-Daniel Cryans commented on HBASE-6649:
--
bq. This is because of multiple calls to reader.next within readAllEntriesToReplicateOrNextFile. If the second call (within the while loop) throws an exception (like EOFException), it basically destroys the work done up until then. Therefore, some rows would never be replicated.

The position in the log is updated in ZK only once the edits are replicated; hence, even if you fail on the second or hundredth edit, the next region server that will be in charge of that log will pick up where the previous RS was (even if that means re-reading some edits).

[0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-1]
Key: HBASE-6649
URL: https://issues.apache.org/jira/browse/HBASE-6649
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Fix For: 0.96.0, 0.92.3, 0.94.2
Attachments: 6649-0.92.patch, 6649-1.patch, 6649-2.txt, 6649-trunk.patch, 6649-trunk.patch, 6649.txt, HBase-0.92 #495 test - queueFailover [Jenkins].html, HBase-0.92 #502 test - queueFailover [Jenkins].html

Have seen it twice in the recent past: http://bit.ly/MPCykB http://bit.ly/O79Dq7 .. Looking briefly at the logs hints at a pattern - in both the failed test instances, there was an RS crash while the test was running.
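The at-least-once behaviour J-D describes — the log position stored in ZK advances only after edits are shipped — can be modelled in a few lines. This is a toy sketch of the semantics, not the actual ReplicationSource code; all names are illustrative:

```java
public class ReplicationPositionSketch {
    // Illustrative model: the committed position stands in for the
    // znode where ReplicationSource records its progress.
    private long committedPosition = 0;

    // Ship a batch of edits ending at batchEndPosition; the position
    // is committed only on success, so a failure mid-batch means the
    // next reader re-reads (but never skips) those edits.
    public long shipBatch(long batchEndPosition, boolean shipSucceeded) {
        if (shipSucceeded) {
            committedPosition = batchEndPosition;
        }
        return committedPosition;
    }
}
```

Re-reading some edits after a failure is harmless here because replication applies edits idempotently at the same timestamps; losing edits would not be.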
[jira] [Updated] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6733:
--
Attachment: 6733-1.patch

Attached is a simple patch (for trunk) to take care of the first problem (described in the description of the jira). The patch resets the sleepMultiplier in every place where currentPath gets (re)assigned. Could we please commit this one? I am looking at the other issue, but it seems like it will take more time.

[0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
Key: HBASE-6733
URL: https://issues.apache.org/jira/browse/HBASE-6733
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Fix For: 0.92.3
Attachments: 6733-1.patch

The failure is in TestReplication.queueFailover (fails due to unreplicated rows). I have come across two problems:
1. The sleepMultiplier is not properly reset when the currentPath is changed (in ReplicationSource.java).
2. The ReplicationExecutor sometimes removes files to replicate from the queue too early, resulting in the corresponding edits going missing. Here the problem is that the log-file length the replication executor finds is not the most up-to-date one, so it doesn't read anything from there; ultimately, when there is a log roll, the replication queue gets a new entry and the executor drops the old entry out of the queue.
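The first fix — reset the backoff multiplier whenever the source switches to a new log — can be sketched as below. This is a simplified model of the idea, not the 6733-1.patch code; the class and method names are made up:

```java
public class BackoffSketch {
    private String currentPath = null;
    private int sleepMultiplier = 1;

    // Called wherever the source (re)assigns the log it reads from;
    // a new path means stale backoff state must not carry over.
    public void setCurrentPath(String path) {
        if (!path.equals(currentPath)) {
            currentPath = path;
            sleepMultiplier = 1; // start backing off from scratch
        }
    }

    // Called when a read attempt finds nothing to replicate.
    public void onEmptyRead(int maxMultiplier) {
        if (sleepMultiplier < maxMultiplier) {
            sleepMultiplier++;
        }
    }

    public int getSleepMultiplier() {
        return sleepMultiplier;
    }
}
```

Without the reset, a multiplier inflated by a quiet old log makes the source sleep far too long on the fresh log, which is one way replicated rows arrive late enough to fail the test.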
[jira] [Commented] (HBASE-6241) HBaseCluster interface for interacting with the cluster from system tests
[ https://issues.apache.org/jira/browse/HBASE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452610#comment-13452610 ] Enis Soztutar commented on HBASE-6241: -- Thanks Stack for the review. I'll put together smt to go over for tomorrow. Let's finalize the patch, and continue building upon it. HBaseCluster interface for interacting with the cluster from system tests -- Key: HBASE-6241 URL: https://issues.apache.org/jira/browse/HBASE-6241 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: HBASE-6241_v0.2.patch, HBASE-6241_v1.patch We need to abstract away the cluster interactions for system tests running on actual clusters. MiniHBaseCluster and RealHBaseCluster should both implement this interface, and system tests should work with both. I'll split Devaraj's patch in HBASE-6053 for the initial version. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
[ https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452618#comment-13452618 ] Jieshan Bean commented on HBASE-6748:
--
From the logs, we can see a session-expired error (SESSIONEXPIRED) happened, so each deleteNode request was rejected. DeleteAsyncCallback does not handle the exceptions correctly.

Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
Key: HBASE-6748
URL: https://issues.apache.org/jira/browse/HBASE-6748
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.96.0, 0.94.1
Reporter: Jieshan Bean
Priority: Critical

You can easily understand the problem from the below logs:
{code}
[2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=3
[2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=2
[2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=1
[2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=0
[2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task node /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
[2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
[2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775807
[2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775806
[2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775805
[2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775804
[2012-09-01 11:41:02,065] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining
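The logs above show the delete callback decrementing from Long.MAX_VALUE (9223372036854775807) — effectively retrying forever on an error that can never clear. A hedged sketch of the kind of handling the comment implies: treat a session-expired result as terminal instead of retryable. The class and method are hypothetical; only the SESSIONEXPIRED code value follows ZooKeeper's KeeperException codes:

```java
public class DeleteRetrySketch {
    // ZooKeeper KeeperException.Code.SESSIONEXPIRED integer value.
    public static final int SESSIONEXPIRED = -112;

    // Illustrative: return the remaining retry budget, or -1 to signal
    // "stop retrying, the session is gone" so the caller can
    // re-establish the ZK session rather than loop endlessly.
    public static long handleDeleteResult(int rc, long remainingRetries) {
        if (rc == SESSIONEXPIRED) {
            return -1;
        }
        return remainingRetries - 1;
    }
}
```

Distinguishing recoverable result codes (e.g. connection loss) from fatal ones (session expiration) is the general fix: only the former deserve a decrement-and-retry.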
[jira] [Updated] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
[ https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-6733:
--
Assignee: Devaraj Das
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-6260) balancer state should be stored in ZK
[ https://issues.apache.org/jira/browse/HBASE-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-6260:
--
Attachment: HBASE-6260.patch

* Attached HBASE-6260.patch
* I haven't run this through the test suite yet; let's see what HadoopQA says.
- Master reads balanceSwitch state from ZK
- Adds two test cases:
1. If the balancer is on and the master dies, and a new master takes over, the balancer is still running
2. If the balancer is off and the master dies, and a new master takes over, the balancer is still not running

balancer state should be stored in ZK
Key: HBASE-6260
URL: https://issues.apache.org/jira/browse/HBASE-6260
Project: HBase
Issue Type: Task
Components: master, zookeeper
Affects Versions: 0.96.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Blocker
Attachments: HBASE-6260.patch

See: https://issues.apache.org/jira/browse/HBASE-5953?focusedCommentId=13270200page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13270200
And: https://issues.apache.org/jira/browse/HBASE-5630?focusedCommentId=13399225page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13399225

In short, we need to move the balancer state to ZK so that it won't have to be restarted if the master dies.
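The behaviour the two test cases check — the balance switch surviving a master failover because it lives in ZK rather than in master memory — can be modelled with a toy stand-in for the znode. This is not the HBASE-6260.patch code; the map below merely simulates state that outlives any one master:

```java
import java.util.HashMap;
import java.util.Map;

public class BalancerStateSketch {
    // Toy stand-in for the ZK ensemble: survives master restarts.
    static final Map<String, Boolean> zk = new HashMap<>();
    static final String ZNODE = "/hbase/balancer"; // hypothetical path

    static class Master {
        boolean balancerOn;

        Master() {
            // On startup, read the persisted switch state instead of
            // defaulting to "on" in local memory.
            balancerOn = zk.getOrDefault(ZNODE, true);
        }

        void balanceSwitch(boolean on) {
            balancerOn = on;
            zk.put(ZNODE, on); // persist so a successor master sees it
        }
    }
}
```

Constructing a second Master after flipping the switch simulates failover: the new master inherits the persisted state rather than silently re-enabling the balancer.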