[jira] [Updated] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers
[ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-4335:
Attachment: 4335-v3.txt
New patch. It breaks SplitTransaction.execute into three parts, partly to make the phases clear and partly so that a test can exercise each phase independently. Also added a test. The test uses phaseI and phaseIII directly and mocks a bit of phaseII (the phase that brings the daughters online and updates .META.). I could validate that if I change the order back to what it was before this patch, the client does indeed reach the wrong region when querying past the split key and would (before HBASE-4334) silently return an empty result set. Let me know what you think about this change. TestSplitTransaction and the new TestEndToEndSplitTransaction pass.
Splits can create temporary holes in .META. that confuse clients and regionservers
Key: HBASE-4335 URL: https://issues.apache.org/jira/browse/HBASE-4335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: Joe Pallas Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0 Attachments: 4335-v2.txt, 4335-v3.txt, 4335.txt
When a SplitTransaction is performed, three updates are done to .META.:
1. The parent region is marked as splitting (and hence offline)
2. The first daughter region is added (same start key as parent)
3. The second daughter region is added (split key is start key)
(Later, the original parent region is deleted, but that's not important to this discussion.)
Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener threads. While the master is notified when a split is complete, the only visibility that clients have is whether the daughter regions have appeared in .META. If the second daughter is added to .META. first, then .META. will contain the (offline) parent region followed by the second daughter region.
If the client looks up a key that is greater than (or equal to) the split, the client will find the second daughter region and use it. If the key is less than the split key, the client will find the parent region and see that it is offline, triggering a retry. If the first daughter is added to .META. before the second daughter, there is a window during which .META. has a hole: the first daughter effectively hides the parent region (same start key), but there is no entry for the second daughter. A region lookup will find the first daughter for all keys in the parent's range, but the first daughter does not include keys at or beyond the split key. See HBASE-4333 and HBASE-4334 for details on how this causes problems and suggestions for mitigating this in the client and regionserver. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
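The two orderings described in this report can be illustrated with a toy model. This is only a sketch under stated assumptions: .META. is reduced to a sorted map keyed by region start key, the class and method names (MetaHoleDemo, add, lookup) are invented for illustration, and none of it is HBase's actual catalog or client code.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of .META.: region start key -> region entry.
// Every name here is illustrative; this is not HBase's catalog code.
final class MetaHoleDemo {

    // A region covers [startKey, endKey); an empty endKey means "up to the end of the table".
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0
                && (endKey.isEmpty() || key.compareTo(endKey) < 0);
        }
    }

    // Adding an entry with an existing start key hides the older entry,
    // mimicking how the first daughter hides the parent.
    static void add(TreeMap<String, Region> meta, Region r) {
        meta.put(r.startKey, r);
    }

    // Lookup picks the entry with the greatest start key <= the row key.
    static String lookup(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> e = meta.floorEntry(key);
        if (e == null) {
            return "no region";
        }
        Region r = e.getValue();
        if (r.offline) {
            return r.name + ": offline, retry";
        }
        return r.contains(key) ? r.name + ": ok" : r.name + ": wrong region";
    }

    public static void main(String[] args) {
        Region parent = new Region("parent", "", "", true); // marked splitting, hence offline
        Region first = new Region("daughterA", "", "m", false);
        Region second = new Region("daughterB", "m", "", false);

        // Second daughter lands first: every lookup either succeeds or retries.
        TreeMap<String, Region> safe = new TreeMap<>();
        add(safe, parent);
        add(safe, second);
        System.out.println(lookup(safe, "c")); // parent: offline, retry
        System.out.println(lookup(safe, "t")); // daughterB: ok

        // First daughter lands first: keys at or beyond "m" resolve to the wrong region.
        TreeMap<String, Region> hole = new TreeMap<>();
        add(hole, parent);
        add(hole, first);
        System.out.println(lookup(hole, "c")); // daughterA: ok
        System.out.println(lookup(hole, "t")); // daughterA: wrong region
    }
}
```

In this model the hole closes as soon as the second daughter is added, which matches the report's point that the problem is a temporary window rather than a permanent inconsistency.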
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122574#comment-13122574 ] ramkrishna.s.vasudevan commented on HBASE-4480: --- @Scott Nice one. Thanks a lot. Testing script to simplfy local testing --- Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: runtest.sh, runtest2.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122585#comment-13122585 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2425 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java https://reviews.apache.org/r/2251/#comment5519 Yes, Ted. I too was thinking of unifying both deleteNode() APIs. I was wondering what expectedVersion could be passed when we do not need to check it. Can we pass -1, and skip the check whenever -1 is passed for expectedVersion? - ramkrishna
On 2011-10-06 17:55:05, ramkrishna vasudevan wrote:
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2251/
bq. (Updated 2011-10-06 17:55:05)
bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq. Summary
bq. Fix for handling HBASE-4539 and HBASE-4540.
bq. Ran all the testcases. Added one new testcase to verify OpenedRegionHandler scenarios.
bq. Also addresses Ted's comments.
bq. This addresses bug HBASE-4540.
bq. https://issues.apache.org/jira/browse/HBASE-4540
bq. Diffs
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179238
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179238
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179238
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179238
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION
bq. Diff: https://reviews.apache.org/r/2251/diff
bq. Testing
bq. Yes
bq. Thanks,
bq. ramkrishna
OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
- OpenedRegionHandler has not yet deleted the znode of the region R1 opened by RS1.
- RS1 goes down.
- ServerShutdownHandler assigns the region R1 to RS2.
- The znode of R1 is moved to OFFLINE state by the master, or to OPENING state by RS2 if RS2 has started opening the region.
- Now the first OpenedRegionHandler tries to delete the znode, assuming it is in OPENED state, but fails.
- Although the delete fails, the handler still removes the region from RIT and records RS1 as the owner of R1 in the master's memory.
- Now when RS2 completes opening the region, the master cannot complete the open because the region has already been removed from RIT.
{code} Master ==
2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647
2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9.
state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847} 2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state After the region is opened in RS2 = 2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
[jira] [Updated] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang
[ https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wanbin updated HBASE-4550: -- Fix Version/s: (was: 0.90.4) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang --- Key: HBASE-4550 URL: https://issues.apache.org/jira/browse/HBASE-4550 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: wanbin Original Estimate: 2h Remaining Estimate: 2h When the master passes the regionserver a different address, the regionserver does not create a new ZooKeeper znode, while the master stores the new address in ServerManager. When stop-hbase.sh is called, the path received by RegionServerTracker.nodeDeleted carries the old address, so serverManager.expireServer is not called and stop-hbase.sh hangs.
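The mismatch described in this report can be sketched with a toy model. The assumptions are loud ones: the class StaleZnodeDemo, its fields, and the sample server names are invented, and the methods only borrow the names nodeDeleted and expireServer from the report; the real RegionServerTracker and ServerManager are not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the master's bookkeeping; method names echo the report, bodies are invented.
final class StaleZnodeDemo {
    private final Set<String> onlineServers = new HashSet<>();
    private int expiredCount = 0;

    // The master records the server under the address it handed back to the region server.
    void recordServer(String addressChosenByMaster) {
        onlineServers.add(addressChosenByMaster);
    }

    // Stand-in for ServerManager.expireServer.
    private void expireServer(String address) {
        onlineServers.remove(address);
        expiredCount++;
    }

    // Stand-in for RegionServerTracker.nodeDeleted: only a name the master knows gets expired.
    boolean nodeDeleted(String znodeName) {
        if (!onlineServers.contains(znodeName)) {
            return false; // unknown name: nothing is expired
        }
        expireServer(znodeName);
        return true;
    }

    boolean allServersExpired() {
        return onlineServers.isEmpty();
    }

    int expiredCount() {
        return expiredCount;
    }

    public static void main(String[] args) {
        StaleZnodeDemo master = new StaleZnodeDemo();
        String znodeName = "rs-host,60020,1317827727647";   // hypothetical name registered in ZooKeeper
        String masterName = "10.0.0.5,60020,1317827727647"; // hypothetical name the master stored instead
        master.recordServer(masterName);

        System.out.println(master.nodeDeleted(znodeName)); // false: the old name matches nothing
        System.out.println(master.allServersExpired());    // false: a shutdown waiting on this never finishes
    }
}
```

Under this reading, either side can close the gap: the region server can re-register its znode under the address the master chose, or the master can key its map by the name that actually appears in ZooKeeper. Which of these the attached patch does is not stated in the message.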
[jira] [Updated] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang
[ https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wanbin updated HBASE-4550: -- Attachment: patch When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang --- Key: HBASE-4550 URL: https://issues.apache.org/jira/browse/HBASE-4550 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: wanbin Attachments: patch Original Estimate: 2h Remaining Estimate: 2h
[jira] [Commented] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang
[ https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122715#comment-13122715 ] wanbin commented on HBASE-4550: --- I fixed this problem, somebody can check it. thanks. When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang --- Key: HBASE-4550 URL: https://issues.apache.org/jira/browse/HBASE-4550 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: wanbin Attachments: patch Original Estimate: 2h Remaining Estimate: 2h
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122829#comment-13122829 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/ --- (Updated 2011-10-07 14:27:20.231903) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes --- LOG.debug(zkw.prefix("Successfully deleted unassigned node for region " + regionName + " in expected state " + expectedState)); @Ted - I have not removed this log so that it can be used for debugging. Refactored the testcase and made it much simpler so that it doesn't take much time. Summary --- Fix for handling HBASE-4539 and HBASE-4540. Ran all the testcases. Added one new testcase to verify OpenedRegionHandler scenarios. Also addresses Ted's comments. This addresses bug HBASE-4540. https://issues.apache.org/jira/browse/HBASE-4540 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179945 Diff: https://reviews.apache.org/r/2251/diff Testing --- Yes Thanks, ramkrishna OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch - OpenedRegionHandler has not yet
deleted the znode of the region R1 opened by RS1. - RS1 goes down. - Servershutdownhandler assigns the region R1 to RS2. - The znode of R1 is moved to OFFLINE state by master or OPENING state by RS2 if RS2 has started opening the region. - Now the first OpenedRegionHandler tries to delete the znode thinking its in OPENED state but fails. - Though it fails it removes the node from RIT and adds RS1 as the owner of R1 in master's memory. - Now when RS2 completes opening the region the master is not able to open the region as already the reigon has been deleted from RIT. {code} Master == 2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9. state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847} 2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state After the region is opened in RS2 = 2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late 2011-10-05 20:50:48,290 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states 2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122842#comment-13122842 ] Ted Yu commented on HBASE-4540: --- For Ram's comment @ 07/Oct/11 07:22 Since -1 is a possible return value from ZKAssign methods, I think we should use other values such as -2.
OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
- OpenedRegionHandler has not yet deleted the znode of the region R1 opened by RS1.
- RS1 goes down.
- ServerShutdownHandler assigns the region R1 to RS2.
- The znode of R1 is moved to OFFLINE state by the master, or to OPENING state by RS2 if RS2 has started opening the region.
- Now the first OpenedRegionHandler tries to delete the znode, assuming it is in OPENED state, but fails.
- Although the delete fails, the handler still removes the region from RIT and records RS1 as the owner of R1 in the master's memory.
- Now when RS2 completes opening the region, the master cannot complete the open because the region has already been removed from RIT.
{code} Master ==
2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647
2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9.
state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847} 2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state After the region is opened in RS2 = 2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late 2011-10-05 20:50:48,290 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states 2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s) 2011-10-05 20:50:54,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
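The failure sequence in the issue description can be reproduced with a small in-memory model. This is a sketch under assumptions, not the attached patch: OpenedHandlerDemo and its maps are invented stand-ins for ZooKeeper, the regions-in-transition (RIT) map, and the master's assignment table, and the "guarded" variant is just one way to read the atomicity requirement in the issue title.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-ins for ZooKeeper and the master's state; all names are illustrative.
final class OpenedHandlerDemo {
    final Map<String, String> znodeState = new HashMap<>();          // region -> znode state
    final Map<String, String> regionsInTransition = new HashMap<>(); // region -> RIT state
    final Map<String, String> assignments = new HashMap<>();         // region -> owning server

    // Deletes the znode only when it is in the expected state, like the
    // "in expected state" check visible in the ZKAssign log lines.
    boolean deleteNode(String region, String expectedState) {
        return znodeState.remove(region, expectedState);
    }

    // Behaviour described in the report: the delete result is ignored.
    void handleOpenedUnguarded(String region, String server) {
        deleteNode(region, "RS_ZK_REGION_OPENED");
        regionsInTransition.remove(region);
        assignments.put(region, server);
    }

    // Guarded variant: master state changes only if the znode delete succeeded.
    boolean handleOpenedGuarded(String region, String server) {
        if (!deleteNode(region, "RS_ZK_REGION_OPENED")) {
            return false; // the znode moved on; leave RIT and assignments untouched
        }
        regionsInTransition.remove(region);
        assignments.put(region, server);
        return true;
    }

    public static void main(String[] args) {
        OpenedHandlerDemo master = new OpenedHandlerDemo();
        // RS1 died and RS2 has already started reopening R1.
        master.regionsInTransition.put("R1", "PENDING_OPEN");
        master.znodeState.put("R1", "RS_ZK_REGION_OPENING");

        // The late handler for RS1 is rejected and R1 stays in transition.
        System.out.println(master.handleOpenedGuarded("R1", "RS1"));        // false
        System.out.println(master.regionsInTransition.containsKey("R1"));   // true

        // The unguarded handler drops R1 from RIT and credits the dead server.
        master.handleOpenedUnguarded("R1", "RS1");
        System.out.println(master.regionsInTransition.containsKey("R1"));   // false
        System.out.println(master.assignments.get("R1"));                   // RS1
    }
}
```

After the unguarded call, a later OPENING or OPENED event from RS2 finds no RIT entry for R1, which corresponds to the "region was in the state null" warnings in the log above.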
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122860#comment-13122860 ] ramkrishna.s.vasudevan commented on HBASE-4540: --- @Ted I just uploaded the patch before you had commented this. In that patch i had used -1. So if we are going to use -2 or some negative value is it ok to add in javadoc something like * @param expectedVersion of the znode that is to be deleted. *If expectedVersion need not be compared while deleting the znode *pass -2(NEGATIVE_VERSION) Is it ok Ted? OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122862#comment-13122862 ] ramkrishna.s.vasudevan commented on HBASE-4540: --- Can we better document like anything less than some value. may be either 0 or -1? Instead of going with one value. OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122884#comment-13122884 ] Ted Yu commented on HBASE-4540: --- We may designate some negative value for other purpose in the future. I think using one known value is recommended. The Javadoc addition above is nice. OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch - OpenedRegionHandler has not yet deleted the znode of the region R1 opened by RS1. - RS1 goes down. - Servershutdownhandler assigns the region R1 to RS2. - The znode of R1 is moved to OFFLINE state by master or OPENING state by RS2 if RS2 has started opening the region. - Now the first OpenedRegionHandler tries to delete the znode thinking its in OPENED state but fails. - Though it fails it removes the node from RIT and adds RS1 as the owner of R1 in master's memory. - Now when RS2 completes opening the region the master is not able to open the region as already the reigon has been deleted from RIT. {code} Master == 2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9. 
state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847} 2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state After the region is opened in RS2 = 2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late 2011-10-05 20:50:48,290 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states 2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s) 2011-10-05 20:50:54,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
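The sequence in the HBASE-4540 description comes down to the handler updating the master's in-memory state even though the znode delete failed. A minimal sketch of the intended ordering follows, with in-memory maps standing in for ZooKeeper and for the master's bookkeeping. `OpenedRegionSketch`, `deleteIfInState` and `handleOpened` are hypothetical names for illustration only; this is not the ZKAssign or OpenedRegionHandler code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names are real HBase classes or methods.
class OpenedRegionSketch {
    final Map<String, String> znodeState = new HashMap<>();          // region -> state of its znode
    final Map<String, String> regionsInTransition = new HashMap<>(); // region -> server opening it
    final Map<String, String> assignments = new HashMap<>();         // region -> owning server

    // Delete the znode only if it is still in the expected state.
    boolean deleteIfInState(String region, String expectedState) {
        return znodeState.remove(region, expectedState);
    }

    // Handle "region opened on server": touch in-memory state only if the delete succeeded.
    boolean handleOpened(String region, String server) {
        if (!deleteIfInState(region, "OPENED")) {
            return false; // znode has moved on (e.g. to OPENING by another server); leave RIT alone
        }
        regionsInTransition.remove(region);
        assignments.put(region, server);
        return true;
    }

    public static void main(String[] args) {
        OpenedRegionSketch master = new OpenedRegionSketch();
        master.znodeState.put("R1", "OPENING");       // RS2 has started opening R1
        master.regionsInTransition.put("R1", "RS2");
        System.out.println(master.handleOpened("R1", "RS1"));               // false: stale handler for RS1
        System.out.println(master.regionsInTransition.containsKey("R1"));   // true: RIT entry survives
    }
}
```

With the delete and the in-memory update tied together like this, the later OPENING and OPENED events from RS2 still find the region in RIT.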
[jira] [Updated] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4547: - Attachment: 4547-part2.txt Need this piece where we test existence before doing delete when updating a file. TestFSTableDescriptors was failing. TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547-part2.txt, 4547.txt I've been running tests before commit and found the following happens with some regularity, sporadic of course, but they fail fairly frequently: {code} Failed tests: testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin) testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1> testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1> {code} Looking, it seems like we fail to find .tableinfo in the tests that modify table schema while table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file or we get EOFE reading it.
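The part-2 attachment is described as testing for existence before deleting when a file is updated. A rough sketch of that pattern is below. It uses `java.nio.file` rather than the Hadoop `FileSystem` API that HBase works with, and `TableInfoUpdateSketch` and `update` are made-up names. Writing to a sibling temporary file first is an assumption of this sketch; the message does not say how the patch orders its steps.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: not the FSTableDescriptors code.
class TableInfoUpdateSketch {
    // Replace `target` with `content`: write a sibling temp file, delete the old
    // copy only if one exists, then move the temp file into place.
    static void update(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        if (Files.exists(target)) {   // test existence before doing the delete
            Files.delete(target);
        }
        Files.move(tmp, target);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tableinfo-sketch");
        Path info = dir.resolve(".tableinfo");
        update(info, "v1");   // first write: nothing to delete
        update(info, "v2");   // second write: old copy is removed first
        System.out.println(new String(Files.readAllBytes(info), StandardCharsets.UTF_8)); // v2
    }
}
```

Note that in this sketch a concurrent reader can still see a moment with no file between the delete and the move.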
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122920#comment-13122920 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/ --- (Updated 2011-10-07 16:13:33.022073) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes --- If we do not want to compare the version of znode while deleting we can pass -2 to the deleteNode api. Uploaded the patch with the change. Summary --- Fix for handling HBASE-4539 and HBASE-4540. Ran all the testcases. Added one new testcase to verify OpenedRegionHandler scenarios. Also addresses Ted's comments. This addresses bug HBASE-4540. https://issues.apache.org/jira/browse/HBASE-4540 Diffs (updated) - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179945 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION Diff: https://reviews.apache.org/r/2251/diff Testing --- Yes Thanks, ramkrishna
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122938#comment-13122938 ] Tim Sell commented on HBASE-1744: - Is run the HBase Thrift2 server ok? What do you mean by experience? Do you mean examples of usage in different languages? Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like: void put(1:Bytes table, 2:TPut put) throws (1:IOError io) with: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp } struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values } This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
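To make the shape of the proposed structs concrete, here is a hedged Java mirror of `TColumn` and `TPut` in which the values map is keyed by (family, qualifier, timestamp). `TPutSketch` and `TColumnSketch` are hand-written illustrations, not generated Thrift code and not the classes in the attached patches. The point is that putting the timestamp in the key lets two versions of the same column coexist in one put, which the nested map-of-maps form cannot express as directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mirror of: struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
final class TPutSketch {
    final byte[] row;
    final Map<TColumnSketch, byte[]> values = new LinkedHashMap<>();

    TPutSketch(String row) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
    }

    TPutSketch add(String family, String qualifier, long timestamp, String value) {
        values.put(new TColumnSketch(family, qualifier, timestamp),
                   value.getBytes(StandardCharsets.UTF_8));
        return this;
    }

    public static void main(String[] args) {
        TPutSketch put = new TPutSketch("row1")
            .add("cf", "a", 1L, "x")
            .add("cf", "a", 2L, "y");   // same column, different timestamp: a distinct key
        System.out.println(put.values.size()); // 2
    }
}

// Hypothetical mirror of: struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
final class TColumnSketch {
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;

    TColumnSketch(String family, String qualifier, long timestamp) {
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    // Value-based equality so that a column can serve as a map key.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TColumnSketch)) return false;
        TColumnSketch c = (TColumnSketch) o;
        return timestamp == c.timestamp
            && Arrays.equals(family, c.family)
            && Arrays.equals(qualifier, c.qualifier);
    }

    @Override
    public int hashCode() {
        return 31 * (31 * Arrays.hashCode(family) + Arrays.hashCode(qualifier))
            + Long.hashCode(timestamp);
    }
}
```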
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122947#comment-13122947 ] Ted Yu commented on HBASE-1744: --- run the HBase Thrift2 server is fine. By experience, usage in different programming languages would be nice to share.
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122957#comment-13122957 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2431 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5524 Space between } and catch, please. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java https://reviews.apache.org/r/2251/#comment5525 Should we expose this constant as public ? How about naming this constant DONT_COMPARE_VERSION or NO_VERSION_COMPARISON ? - Ted
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122973#comment-13122973 ] Ted Yu commented on HBASE-4540: --- @Ramkrishna: ZKAssign.transitionNode() is already using -1 to indicate no version comparison. Your patch @ 07/Oct/11 14:27 should be good. Sorry for the confusion.
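The -1 convention mentioned for ZKAssign.transitionNode() is the same one ZooKeeper itself uses, where a version of -1 passed to `delete` matches any version of the node. A small sketch of a version-checked delete with one known sentinel follows. `VersionedStoreSketch` is a hypothetical in-memory stand-in and `ANY_VERSION` is only an illustrative name for the constant; neither comes from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: an in-memory stand-in for versioned znodes.
class VersionedStoreSketch {
    static final int ANY_VERSION = -1; // the single known sentinel: skip the version comparison

    private final Map<String, Integer> versions = new HashMap<>(); // path -> current version

    void create(String path) {
        versions.put(path, 0); // a newly created node starts at version 0
    }

    void setData(String path) {
        versions.computeIfPresent(path, (p, v) -> v + 1); // every update bumps the version
    }

    // Delete succeeds only if the node exists and the expected version matches,
    // unless the caller passed ANY_VERSION.
    boolean delete(String path, int expectedVersion) {
        Integer current = versions.get(path);
        if (current == null) return false;
        if (expectedVersion != ANY_VERSION && expectedVersion != current) return false;
        versions.remove(path);
        return true;
    }

    public static void main(String[] args) {
        VersionedStoreSketch store = new VersionedStoreSketch();
        store.create("/unassigned/R1");
        store.setData("/unassigned/R1");                                   // version is now 1
        System.out.println(store.delete("/unassigned/R1", 0));            // false: stale version
        System.out.println(store.delete("/unassigned/R1", ANY_VERSION));  // true: no comparison
    }
}
```

Because real versions start at 0 and only grow, any negative value is free to act as a sentinel, which is why agreeing on a single one matters.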
[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers
[ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122974#comment-13122974 ] Jonathan Hsieh commented on HBASE-4335: --- @Lars some nits / suggestions on v3. TestEndToEndSplitTransaction needs a license header. Maybe more descriptive function names for phaseI, phaseII, phaseIII? Any reason for the (overly?) general Class... instead of just taking a single Class and checking for null when no exceptions are expected? Or maybe just make 'test' return boolean and assertTrue/assertFalse? Splits can create temporary holes in .META. that confuse clients and regionservers -- Key: HBASE-4335 URL: https://issues.apache.org/jira/browse/HBASE-4335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: Joe Pallas Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0 Attachments: 4335-v2.txt, 4335-v3.txt, 4335.txt When a SplitTransaction is performed, three updates are done to .META.: 1. The parent region is marked as splitting (and hence offline) 2. The first daughter region is added (same start key as parent) 3. The second daughter region is added (split key is start key) (later, the original parent region is deleted, but that's not important to this discussion) Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener threads. While the master is notified when a split is complete, the only visibility that clients have is whether the daughter regions have appeared in .META. If the second daughter is added to .META. first, then .META. will contain the (offline) parent region followed by the second daughter region. If the client looks up a key that is greater than (or equal to) the split, the client will find the second daughter region and use it. If the key is less than the split key, the client will find the parent region and see that it is offline, triggering a retry. If the first daughter is added to .META. 
before the second daughter, there is a window during which .META. has a hole: the first daughter effectively hides the parent region (same start key), but there is no entry for the second daughter. A region lookup will find the first daughter for all keys in the parent's range, but the first daughter does not include keys at or beyond the split key. See HBASE-4333 and HBASE-4334 for details on how this causes problems and suggestions for mitigating this in the client and regionserver.
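The two orderings described for HBASE-4335 can be reproduced with a sorted map standing in for .META. The sketch below is a simplification: it keeps one visible entry per start key, so adding the first daughter replaces the parent, which models the "hides the parent" effect. `MetaHoleSketch` and `Region` are hypothetical names, and the keys "a", "m" and "z" are made-up sample data.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy model of .META. lookups during a split.
class MetaHoleSketch {
    static final class Region {
        final String name, startKey, endKey;
        final boolean offline;

        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && row.compareTo(endKey) < 0;
        }
    }

    final TreeMap<String, Region> meta = new TreeMap<>(); // start key -> visible region

    void add(Region r) {
        meta.put(r.startKey, r);
    }

    // "Closest entry at or before the row", as a stand-in for the .META. lookup.
    Region lookup(String row) {
        Map.Entry<String, Region> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        Region parent = new Region("parent", "a", "z", true);      // marked offline by the split
        Region first = new Region("daughterA", "a", "m", false);
        Region second = new Region("daughterB", "m", "z", false);

        MetaHoleSketch firstDaughterFirst = new MetaHoleSketch();
        firstDaughterFirst.add(parent);
        firstDaughterFirst.add(first);
        Region wrong = firstDaughterFirst.lookup("q");
        System.out.println(wrong.name + " contains q: " + wrong.contains("q")); // daughterA contains q: false

        MetaHoleSketch secondDaughterFirst = new MetaHoleSketch();
        secondDaughterFirst.add(parent);
        secondDaughterFirst.add(second);
        System.out.println(secondDaughterFirst.lookup("q").name);    // daughterB
        System.out.println(secondDaughterFirst.lookup("c").offline); // true: client retries
    }
}
```

In the second ordering every lookup either lands on a region that really covers the key or on the offline parent, so the client never silently uses the wrong region.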
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122992#comment-13122992 ] ramkrishna.s.vasudevan commented on HBASE-4540: --- If any node exists, its version will start from 0. Thanks, Ted, for the confirmation. I will wait one day for further reviews and make changes accordingly; if there are none, I will go with the patch from 07/Oct/11 14:27. I will take care of the space between } and catch.
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122994#comment-13122994 ] Ted Yu commented on HBASE-4536: --- If the number of columns for the underlying family is not huge, the option of translating the family delete marker is attractive. Allow CF to retain deleted rows --- Key: HBASE-4536 URL: https://issues.apache.org/jira/browse/HBASE-4536 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0 Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error. I see two options here: 1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED. 2. Fold this into the parent change, i.e. keep minimum-number-of-versions versions even past the delete marker. #1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint) Comments? Any other options?
[jira] [Commented] (HBASE-4333) Client does not check for holes in .META.
[ https://issues.apache.org/jira/browse/HBASE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123032#comment-13123032 ] Joe Pallas commented on HBASE-4333: --- Are you satisfied that this is not an issue for scanners? If so, I'm okay with closing this. Client does not check for holes in .META. - Key: HBASE-4333 URL: https://issues.apache.org/jira/browse/HBASE-4333 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Joe Pallas If there is a temporary hole in .META., the client may get the wrong region from HConnection.locateRegion. HConnectionManager.HConnectionImplementation.locateRegionInMeta should check the end key of the region found with getClosestRowBefore, just as it checks the offline status, when it looks at the region info.
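The client-side check proposed in HBASE-4333 amounts to one extra comparison after the closest-row-before lookup. A hedged sketch follows. `LocateRegionSketch`, `RegionInfo` and `locate` are hypothetical names and do not reflect the real `locateRegionInMeta` signature. An empty result here stands for "retry the lookup", and an empty end key is assumed to mean the region runs to the end of the table.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative only: not the HConnectionManager code.
class LocateRegionSketch {
    static final class RegionInfo {
        final String name, endKey; // endKey "" means the region runs to the end of the table
        final boolean offline;

        RegionInfo(String name, String endKey, boolean offline) {
            this.name = name;
            this.endKey = endKey;
            this.offline = offline;
        }
    }

    // Returns the region serving `row`, or empty if the caller should retry.
    static Optional<RegionInfo> locate(TreeMap<String, RegionInfo> metaByStartKey, String row) {
        Map.Entry<String, RegionInfo> e = metaByStartKey.floorEntry(row);
        if (e == null) return Optional.empty();
        RegionInfo r = e.getValue();
        if (r.offline) return Optional.empty();   // existing check: region is offline
        boolean pastEnd = !r.endKey.isEmpty() && row.compareTo(r.endKey) >= 0;
        if (pastEnd) return Optional.empty();     // proposed check: row is beyond the end key
        return Optional.of(r);
    }

    public static void main(String[] args) {
        TreeMap<String, RegionInfo> meta = new TreeMap<>();
        meta.put("a", new RegionInfo("daughterA", "m", false)); // second daughter not yet in meta
        System.out.println(locate(meta, "c").isPresent()); // true
        System.out.println(locate(meta, "q").isPresent()); // false: hole detected, retry
    }
}
```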
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Sell updated HBASE-1744: Attachment: HBASE-1744.6.patch Added new patch, with a few more tests and an incomplete, trivial Java example.
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123059#comment-13123059 ] jirapos...@reviews.apache.org commented on HBASE-4377: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2126/ --- (Updated 2011-10-07 18:46:54.806909) Review request for hbase, Michael Stack and Andrew Purtell. Changes --- Updates with nits and separated tests into different classes so that we can rely on new jvms to avoid OO file handle errors intermittently encountered when shutting down and restarting mini clusters. Summary --- commit fbf82c17be6b3ecca5a981f5270cf93aac26e479 Author: Jonathan Hsieh j...@cloudera.com Date: Wed Sep 28 10:18:11 2011 -0700 HBASE-4377 [hbck] Offline rebuild .META. from fs data only This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet to be committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506. Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation. This addresses bug HBASE-4377. 
https://issues.apache.org/jira/browse/HBASE-4377 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32 src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448 src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION Diff: https://reviews.apache.org/r/2126/diff Testing --- An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster -- it was used to reconstruct a .META. from .regioninfo files laid out in hbase's directory structure. Note also that this is not an automatic fix -- whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations. The tests demonstrate current expected behavior -- rebuild meta if things line up, and fail without making modifications if holes or overlaps exist. Thanks, jmhsieh [hbck] Offline rebuild .META. from fs data only. 
Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow on work could given options to merge or select regions that overlap, or do online rebuilds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
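The "rebuild meta if things line up, fail if holes or overlaps exist" behaviour described in the Testing notes above can be illustrated with a small standalone check. This is only a sketch under stated assumptions, not the HBaseFsck code: `RegionChainCheck`, `Range` and `findProblems` are hypothetical names, keys are plain strings, and an empty string stands for the open start or end of the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RegionChainCheck {
    /** Hypothetical region key range [start, end); "" means start/end of table. */
    public record Range(String start, String end) {}

    /** Returns a description of every hole or overlap; an empty list means the chain lines up. */
    public static List<String> findProblems(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing(Range::start));

        List<String> problems = new ArrayList<>();
        String covered = "";   // key space [start of table, covered) is accounted for
        boolean open = false;  // true once a region reaching the end of the table was seen

        for (Range r : sorted) {
            if (open) {
                // Everything is already covered, so any further region overlaps.
                problems.add("overlap at " + r.start());
                continue;
            }
            int c = r.start().compareTo(covered);
            if (c > 0) {
                problems.add("hole [" + covered + ", " + r.start() + ")");
            } else if (c < 0) {
                problems.add("overlap at " + r.start());
            }
            if (r.end().isEmpty()) {
                open = true;
            } else if (r.end().compareTo(covered) > 0) {
                covered = r.end();
            }
        }
        if (!open) {
            problems.add("hole [" + covered + ", end of table)");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A clean chain, a chain with a hole, and a chain with an overlap.
        System.out.println(findProblems(List.of(new Range("", "m"), new Range("m", ""))));
        System.out.println(findProblems(List.of(new Range("", "g"), new Range("m", ""))));
        System.out.println(findProblems(List.of(new Range("", "m"), new Range("g", ""))));
    }
}
```

A caller following the policy in the e-mail would write the new .META. only when the returned list is empty, and otherwise print the list and leave everything untouched.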
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123058#comment-13123058 ] jirapos...@reviews.apache.org commented on HBASE-4377: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2126/ --- (Updated 2011-10-07 18:47:01.208741) Review request for hbase, Michael Stack and Andrew Purtell. Summary --- commit fbf82c17be6b3ecca5a981f5270cf93aac26e479 Author: Jonathan Hsieh j...@cloudera.com Date: Wed Sep 28 10:18:11 2011 -0700 HBASE-4377 [hbck] Offline rebuild .META. from fs data only This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet to be committed HBASE-4515 (either my verison or Gary's version), HBASE-4509, and HBASE-4506. Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation. This addresses bug HBASE-4377. https://issues.apache.org/jira/browse/HBASE-4377 Diffs - src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32 src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448 src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION Diff: https://reviews.apache.org/r/2126/diff Testing --- An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster 
was underprovisioned memory-wise, and on restart, the some regions would start splitting and then die due to oome's). This was not actually used on a live cluster -- it was used to reconstruct a .META. from .regioninfo's laid out in hbase's directory structure. Note also that this is not an automatic fix -- whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug hole, and to do any potential data loosing operations. The tests demonstrate current expected behavior -- rebuild meta if things line up, and fail without making modifications if holes or overlaps exist. Thanks, jmhsieh [hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow on work could given options to merge or select regions that overlap, or do online rebuilds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123080#comment-13123080 ] jirapos...@reviews.apache.org commented on HBASE-4377: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2287/ --- Review request for hbase and Ted Yu. Summary --- Backport to 0.90 commit 89862b73c6358e27220b87b0362599d86ab0fe4a Author: Jonathan Hsieh j...@cloudera.com Date: Wed Sep 28 10:18:11 2011 -0700 HBASE-4377 [hbck] Offline rebuild .META. from fs data only This addresses bug HBASE-4377. https://issues.apache.org/jira/browse/HBASE-4377 Diffs - src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3 src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6 src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720 src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION Diff: https://reviews.apache.org/r/2287/diff Testing --- Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1). This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied. I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions. Thanks, jmhsieh [hbck] Offline rebuild .META. 
from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow on work could given options to merge or select regions that overlap, or do online rebuilds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4554) Allow set/unset arbitrary table attributes from shell.
Allow set/unset arbitrary table attributes from shell. -- Key: HBASE-4554 URL: https://issues.apache.org/jira/browse/HBASE-4554 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Mingjie Lai Assignee: Mingjie Lai Fix For: 0.92.0

A table/region-level coprocessor (RegionObserver) can be configured by setting an HTD attribute whose key matches Coprocessor$*. The current shell command, alter, cannot set or unset an arbitrary table attribute. We need that in order to configure region-level coprocessors on a table.

Proposed new shell:
{code}
hbase shell> alter 't1', METHOD => 'table_att', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|'
hbase shell> describe 't1'
{NAME => 't1', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|', MAX_FILESIZE => '134217728', ...}
hbase shell> alter 't1', METHOD => 'table_att_unset', COPROCESSOR$1
hbase shell> describe 't1'
{NAME => 't1', MAX_FILESIZE => '134217728', ...}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers
[ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123123#comment-13123123 ] Ted Yu commented on HBASE-4335: --- How about calling the first phase createDaughtersPhase, second openDaughtersPhase and the third phase transitionZKNodePhase ? Splits can create temporary holes in .META. that confuse clients and regionservers -- Key: HBASE-4335 URL: https://issues.apache.org/jira/browse/HBASE-4335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: Joe Pallas Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0 Attachments: 4335-v2.txt, 4335-v3.txt, 4335.txt When a SplitTransaction is performed, three updates are done to .META.: 1. The parent region is marked as splitting (and hence offline) 2. The first daughter region is added (same start key as parent) 3. The second daughter region is added (split key is start key) (later, the original parent region is deleted, but that's not important to this discussion) Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener threads. While the master is notified when a split is complete, the only visibility that clients have is whether the daughter regions have appeared in .META. If the second daughter is added to .META. first, then .META. will contain the (offline) parent region followed by the second daughter region. If the client looks up a key that is greater than (or equal to) the split, the client will find the second daughter region and use it. If the key is less than the split key, the client will find the parent region and see that it is offline, triggering a retry. If the first daughter is added to .META. before the second daughter, there is a window during which .META. has a hole: the first daughter effectively hides the parent region (same start key), but there is no entry for the second daughter. 
A region lookup will find the first daughter for all keys in the parent's range, but the first daughter does not include keys at or beyond the split key. See HBASE-4333 and HBASE-4334 for details on how this causes problems and suggestions for mitigating this in the client and regionserver. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
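The ordering problem described above can be reproduced with a toy model. The sketch below is an illustration only and rests on several assumptions: .META. is modelled as a `TreeMap` keyed by region start key, the client lookup as `floorEntry`, and a daughter with the parent's start key simply replaces the parent row (in the real table the two rows have different names and the daughter merely sorts in front of the parent). `MetaHoleSketch`, `Region`, `locate` and `midSplit` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetaHoleSketch {
    /** Hypothetical stand-in for a .META. row: a region covering [startKey, endKey). */
    public record Region(String name, String startKey, String endKey, boolean offline) {
        public boolean contains(String key) {
            return key.compareTo(startKey) >= 0 && key.compareTo(endKey) < 0;
        }
    }

    /** Simplified client lookup: the row with the greatest start key <= key. */
    public static Region locate(TreeMap<String, Region> meta, String key) {
        Map.Entry<String, Region> e = meta.floorEntry(key);
        return e == null ? null : e.getValue();
    }

    /** Builds the intermediate state after the parent is offlined and exactly one daughter is added. */
    public static TreeMap<String, Region> midSplit(boolean firstDaughterFirst) {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("a", new Region("parent", "a", "z", true)); // step 1: parent marked offline
        if (firstDaughterFirst) {
            // Same start key as the parent, so in this model it hides the parent row.
            meta.put("a", new Region("daughterA", "a", "m", false));
        } else {
            meta.put("m", new Region("daughterB", "m", "z", false));
        }
        return meta;
    }

    public static void main(String[] args) {
        // First daughter first: the lookup for "q" lands on a region that does not hold "q".
        Region hole = locate(midSplit(true), "q");
        System.out.println(hole.name() + " contains q: " + hole.contains("q"));   // daughterA contains q: false

        // Second daughter first: "q" resolves correctly, "c" hits the offline parent and is retried.
        Region ok = locate(midSplit(false), "q");
        System.out.println(ok.name() + " contains q: " + ok.contains("q"));       // daughterB contains q: true
        Region retry = locate(midSplit(false), "c");
        System.out.println(retry.name() + " offline: " + retry.offline());        // parent offline: true
    }
}
```

In this model, adding the second daughter before the first never exposes a region that silently fails to cover the requested key, which matches the safe ordering the description identifies.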
[jira] [Updated] (HBASE-4551) Small fixes to compile against 0.23-SNAPSHOT
[ https://issues.apache.org/jira/browse/HBASE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4551: --- Attachment: hbase-4551.txt new revision also removes the places where we set the lease timeouts in the NN to non-standard values. Now that we use the recoverLease API instead of the appendFile API, we don't need to do this. I ran the modified tests and they still pass on 0.20. Small fixes to compile against 0.23-SNAPSHOT Key: HBASE-4551 URL: https://issues.apache.org/jira/browse/HBASE-4551 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4551.txt, hbase-4551.txt - fix pom.xml to properly pull the test artifacts - fix TestHLog to not use the private cluster.getNameNode() API -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123175#comment-13123175 ] Hudson commented on HBASE-4547: --- Integrated in HBase-0.92 #51 (See [https://builds.apache.org/job/HBase-0.92/51/]) HBASE-4547 TestAdmin failing in 0.92 because .tableinfo not found stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547-part2.txt, 4547.txt I've been running tests before commit and found the following happens with some regularity, sporadic of course, but they fail fairly frequently: {code} Failed tests: testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin) testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:2 but was:1 testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:2 but was:1 {code} Looking, it seems like we fail to find .tableinfo in the tests that modify table schema while table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file or we get EOFE reading it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123197#comment-13123197 ] jirapos...@reviews.apache.org commented on HBASE-4377: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2287/#review2440 --- src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5546 Minor suggestion: IOException may occur more than once. Would logging all such IOException's before bailing out make user experience better ? Basically we just need to track the last such IOException in a variable and bail out at line 283 if the variable isn't null. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5545 Naming rd as rootdir would make the code more readable. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5548 I think rebuildMeta() should check the return value from generatePuts(). Otherwise we would encounter NPE at line 405 below. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5549 Do you plan to add this logic in another JIRA ? src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5550 false should be returned if puts is null. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java https://reviews.apache.org/r/2287/#comment5552 I think LOG.info() should be used here. - Ted On 2011-10-07 19:04:44, jmhsieh wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2287/ bq. --- bq. bq. (Updated 2011-10-07 19:04:44) bq. bq. bq. Review request for hbase and Ted Yu. bq. bq. bq. Summary bq. --- bq. bq. Backport to 0.90 bq. bq. commit 89862b73c6358e27220b87b0362599d86ab0fe4a bq. Author: Jonathan Hsieh j...@cloudera.com bq. Date: Wed Sep 28 10:18:11 2011 -0700 bq. 
bq. HBASE-4377 [hbck] Offline rebuild .META. from fs data only bq. bq. bq. bq. This addresses bug HBASE-4377. bq. https://issues.apache.org/jira/browse/HBASE-4377 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3 bq.src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026 bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6 bq.src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720 bq.src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2287/diff bq. bq. bq. Testing bq. --- bq. bq. Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1). bq. bq. This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied. bq. bq. I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions. bq. bq. bq. Thanks, bq. bq. jmhsieh bq. bq. [hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch In a worst case situation, it may be helpful to have an offline .META. 
rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow on work could given options to merge or select regions that overlap, or do online rebuilds. -- This message is
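Ted's first review comment above (log every IOException, remember the last one, and bail out only after the loop) could look roughly like the following. This is a generic sketch, not the HBaseFsck code: `DeferredFailure`, `Loader` and `loadAll` are hypothetical names, and `System.err` stands in for the real logger.

```java
import java.io.IOException;
import java.util.List;

public class DeferredFailure {
    /** Hypothetical per-directory loader, e.g. something that reads one .regioninfo file. */
    public interface Loader {
        void load(String dir) throws IOException;
    }

    /**
     * Tries every directory, logs each failure, and rethrows the last one
     * only after all directories have been visited.
     */
    public static int loadAll(List<String> dirs, Loader loader) throws IOException {
        IOException last = null;
        int loaded = 0;
        for (String dir : dirs) {
            try {
                loader.load(dir);
                loaded++;
            } catch (IOException e) {
                System.err.println("Failed to load " + dir + ": " + e.getMessage());
                last = e; // remember it, but keep going so the user sees every failure
            }
        }
        if (last != null) {
            throw last; // bail out once, after everything has been reported
        }
        return loaded;
    }

    public static void main(String[] args) {
        try {
            loadAll(List.of("r1", "bad1", "r2", "bad2"), dir -> {
                if (dir.startsWith("bad")) {
                    throw new IOException("cannot read " + dir);
                }
            });
        } catch (IOException e) {
            System.out.println("Bailing out after reporting all failures: " + e.getMessage());
        }
    }
}
```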
[jira] [Commented] (HBASE-4554) Allow set/unset arbitrary table attributes from shell.
[ https://issues.apache.org/jira/browse/HBASE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123241#comment-13123241 ] Ted Yu commented on HBASE-4554: ---

This would be a useful feature. COPROCESSOR$1 needs to be defined as a constant if we follow the example in src/main/ruby/hbase/admin.rb:
{code}
if method == "table_att"
  htd.setMaxFileSize(JLong.valueOf(arg[MAX_FILESIZE])) if arg[MAX_FILESIZE]
{code}
I think the table_att method targets known table attributes. For HBASE-4554, we can introduce a new method, e.g. table_dyn_att, which accepts two parameters, KEY and VALUE:
{code}
hbase> alter 't1', {METHOD => 'table_dyn_att', KEY => 'COPROCESSOR$1', VALUE => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|' }
{code}

Allow set/unset arbitrary table attributes from shell. -- Key: HBASE-4554 URL: https://issues.apache.org/jira/browse/HBASE-4554 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Mingjie Lai Assignee: Mingjie Lai Fix For: 0.92.0

A table/region-level coprocessor (RegionObserver) can be configured by setting an HTD attribute whose key matches Coprocessor$*. The current shell command, alter, cannot set or unset an arbitrary table attribute. We need that in order to configure region-level coprocessors on a table.

Proposed new shell:
{code}
hbase shell> alter 't1', METHOD => 'table_att', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|'
hbase shell> describe 't1'
{NAME => 't1', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|', MAX_FILESIZE => '134217728', ...}
hbase shell> alter 't1', METHOD => 'table_att_unset', COPROCESSOR$1
hbase shell> describe 't1'
{NAME => 't1', MAX_FILESIZE => '134217728', ...}
{code}
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123254#comment-13123254 ] Ted Yu commented on HBASE-1744: ---

src/examples/thrift2/DemoClient.java needs license.

Thrift server to match the new java api. Key: HBASE-1744 URL: https://issues.apache.org/jira/browse/HBASE-1744 Project: HBase Issue Type: Improvement Components: thrift Reporter: Tim Sell Assignee: Tim Sell Priority: Critical Fix For: 0.94.0 Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc.. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4549) Add thrift API to read version and build date of HBase
[ https://issues.apache.org/jira/browse/HBASE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Song Liu updated HBASE-4549: Attachment: patch-hbase-4549.txt The patch passes the new test in TestThriftServer. Add thrift API to read version and build date of HBase --- Key: HBASE-4549 URL: https://issues.apache.org/jira/browse/HBASE-4549 Project: HBase Issue Type: Improvement Components: thrift Reporter: Song Liu Priority: Minor Attachments: patch-hbase-4549.txt Original Estimate: 2h Remaining Estimate: 2h Adding API to get the hbase server version and build date will be helpful for the client to communicate with different versions of the server accordingly. class VersionInfo can be reused to provide required information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4555) TestShell seems passed, but actually errors seen in test output file
TestShell seems passed, but actually errors seen in test output file Key: HBASE-4555 URL: https://issues.apache.org/jira/browse/HBASE-4555 Project: HBase Issue Type: Test Components: test Reporter: Mingjie Lai

When I was making test cases for 4554, I saw a weird issue that TestShell seems to pass, but actually I saw error messages in the output file.
{code}
--- T E S T S ---
Running org.apache.hadoop.hbase.client.TestShell
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.252 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
{code}
Error messages in org.apache.hadoop.hbase.client.TestShell-output.txt:
{code}
...
6) Error: test_alter_should_support_shortcut_DELETE_alter_specs(Hbase::AdminAlterTableTest):
ArgumentError: There should be at least one argument but the table name
  /home/mlai/git/hbase-private/src/test/ruby/../../main/ruby/hbase/admin.rb:307:in `alter'
  ./src/test/ruby/hbase/admin_test.rb:271:in `test_alter_should_support_shortcut_DELETE_alter_specs'
  org/jruby/RubyProc.java:268:in `call'
  org/jruby/RubyKernel.java:2038:in `send'
  org/jruby/RubyArray.java:1572:in `each'
  org/jruby/RubyArray.java:1572:in `each'
7) Error: test_split_should_work(Hbase::AdminMethodsTest):
ArgumentError: wrong number of arguments (1 for 2)
  ./src/test/ruby/hbase/admin_test.rb:99:in `test_split_should_work'
  org/jruby/RubyProc.java:268:in `call'
  org/jruby/RubyKernel.java:2038:in `send'
  org/jruby/RubyArray.java:1572:in `each'
  org/jruby/RubyArray.java:1572:in `each'
192 tests, 259 assertions, 1 failures, 6 errors
Done with tests! Shutting down the cluster...
2011-10-07 16:46:14,760 INFO [main] hbase.HBaseTestingUtility(551): Shutting down minicluster
2011-10-07 16:46:14,760 DEBUG [main] util.JVMClusterUtil(214): Shutting down HBase Cluster
{code}
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4554) Allow set/unset coprocessor table attributes from shell.
[ https://issues.apache.org/jira/browse/HBASE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingjie Lai updated HBASE-4554: --- Summary: Allow set/unset coprocessor table attributes from shell. (was: Allow set/unset arbitrary table attributes from shell.)

Rename the jira title from ``Allow set/unset arbitrary table attributes from shell.'' to ``Allow set/unset coprocessor table attributes from shell.''.

Allow set/unset coprocessor table attributes from shell. Key: HBASE-4554 URL: https://issues.apache.org/jira/browse/HBASE-4554 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Mingjie Lai Assignee: Mingjie Lai Fix For: 0.92.0

A table/region-level coprocessor (RegionObserver) can be configured by setting an HTD attribute whose key matches Coprocessor$*. The current shell command, alter, cannot set or unset an arbitrary table attribute. We need that in order to configure region-level coprocessors on a table.

Proposed new shell:
{code}
hbase shell> alter 't1', METHOD => 'table_att', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|'
hbase shell> describe 't1'
{NAME => 't1', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|', MAX_FILESIZE => '134217728', ...}
hbase shell> alter 't1', METHOD => 'table_att_unset', COPROCESSOR$1
hbase shell> describe 't1'
{NAME => 't1', MAX_FILESIZE => '134217728', ...}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4554) Allow set/unset coprocessor table attributes from shell.
[ https://issues.apache.org/jira/browse/HBASE-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123315#comment-13123315 ] Mingjie Lai commented on HBASE-4554:

@Ted. I changed the title so this jira only deals with cp-related htd attributes (no longer arbitrary ones). I still prefer using the existing table_att method, so that we can add or change multiple attributes at once.
{code}
alter 't1', METHOD => 'table_att', 'COPROCESSOR$1' => 'cp1', 'COPROCESSOR$2' => 'cp2'
{code}
The code change would be something like:
{code}
--- a/src/main/ruby/hbase/admin.rb
+++ b/src/main/ruby/hbase/admin.rb
@@ -359,6 +359,16 @@ module Hbase
         htd.setReadOnly(JBoolean.valueOf(arg[READONLY])) if arg[READONLY]
         htd.setMemStoreFlushSize(JLong.valueOf(arg[MEMSTORE_FLUSHSIZE])) if arg[MEMSTORE_FLUSHSIZE]
         htd.setDeferredLogFlush(JBoolean.valueOf(arg[DEFERRED_LOG_FLUSH])) if arg[DEFERRED_LOG_FLUSH]
+
+        # set a coprocessor attribute
+        if arg.kind_of?(Hash)
+          arg.each do |key, value|
+            k = String.new(key) # prepare to strip
+            k.strip!
+            htd.setValue(k, value) if (k =~ /coprocessor\$[0-9]*/i)
+          end
+        end
+
{code}

Allow set/unset coprocessor table attributes from shell. Key: HBASE-4554 URL: https://issues.apache.org/jira/browse/HBASE-4554 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Mingjie Lai Assignee: Mingjie Lai Fix For: 0.92.0

A table/region-level coprocessor (RegionObserver) can be configured by setting an HTD attribute whose key matches Coprocessor$*. The current shell command, alter, cannot set or unset an arbitrary table attribute. We need that in order to configure region-level coprocessors on a table.

Proposed new shell:
{code}
hbase shell> alter 't1', METHOD => 'table_att', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|'
hbase shell> describe 't1'
{NAME => 't1', COPROCESSOR$1 => 'hdfs://cp/foo.jar|org.apache.hadoop.hbase.sample|1|', MAX_FILESIZE => '134217728', ...}
hbase shell> alter 't1', METHOD => 'table_att_unset', COPROCESSOR$1
hbase shell> describe 't1'
{NAME => 't1', MAX_FILESIZE => '134217728', ...}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
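To see which keys the proposed filter in the admin.rb change above would accept, the same pattern can be exercised outside the shell. This is a sketch in Java rather than the JRuby code from the patch: `CoprocessorKeyFilter` and `selectCoprocessorAttrs` are hypothetical names, `Matcher.find()` is used to mirror Ruby's unanchored `=~`, and a plain map stands in for the alter arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class CoprocessorKeyFilter {
    // Same expression as /coprocessor\$[0-9]*/i in the proposed change.
    static final Pattern CP_KEY =
        Pattern.compile("coprocessor\\$[0-9]*", Pattern.CASE_INSENSITIVE);

    /** Keeps only the entries whose stripped key matches the coprocessor pattern. */
    public static Map<String, String> selectCoprocessorAttrs(Map<String, String> args) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : args.entrySet()) {
            String key = e.getKey().strip();     // counterpart of k.strip!
            if (CP_KEY.matcher(key).find()) {    // unanchored search, like Ruby's =~
                selected.put(key, e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> alterArgs = new LinkedHashMap<>();
        alterArgs.put("METHOD", "table_att");
        alterArgs.put(" COPROCESSOR$1 ", "cp1");
        alterArgs.put("coprocessor$2", "cp2");
        alterArgs.put("MAX_FILESIZE", "134217728");
        // Only the two coprocessor keys survive, with surrounding whitespace removed.
        System.out.println(selectCoprocessorAttrs(alterArgs).keySet());
    }
}
```

Because the expression is unanchored and `[0-9]*` also accepts zero digits, a key such as `MY_COPROCESSOR$` would pass as well; anchoring it (for example `^coprocessor\$[0-9]+$`) would be one way to tighten the check, if that turns out to matter.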
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123330#comment-13123330 ] jirapos...@reviews.apache.org commented on HBASE-4218: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2308/ --- Review request for hbase. Summary --- Delta encoding for key values. This addresses bug HBASE-4218. https://issues.apache.org/jira/browse/HBASE-4218 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BitsetKeyDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CompressionState.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CopyKeyDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncodedBlock.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderAlgorithms.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderToSmallBufferException.java PRE-CREATION 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DiffKeyDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/FastDiffDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/EmptyBlockDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockDeltaEncoder.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 1180113 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 1180113 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java 1180113
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacek Migdal updated HBASE-4218:
Affects Version/s: 0.94.0
Status: Patch Available (was: Open)

Delta Encoding of KeyValues (aka prefix compression)
Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Labels: compression

A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.

Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit.

It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields).

In order to implement it in HBase, two important changes in design will be needed:
- solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance
- extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal)

Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
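The prefix-compression idea in the HBASE-4218 description can be illustrated with a minimal sketch. This is not one of the encoder classes from the patch: `PrefixSketch`, `Entry`, `encode` and `decode` are hypothetical names, and the format (shared-prefix length plus remaining suffix) is only the simplest possible variant, without the timestamp diffs or bitfields the description mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative prefix encoder for sorted keys; not the encoders from the patch. */
final class PrefixSketch {
    /** One encoded key: number of leading bytes shared with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final byte[] suffix;
        Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to the key immediately before it. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by copying the shared prefix from the previous decoded key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    /** Small helper for building example keys. */
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Because neighbouring keys in a sorted block tend to share long prefixes, most entries store only a few suffix bytes. The stored prefix length is also what would let a comparator skip bytes already known to be equal, as the second design change in the description suggests.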
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Migdal updated HBASE-4218: Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Migdal updated HBASE-4218: Attachment: open-source.diff Delta encoding source code.
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123337#comment-13123337 ]
Jacek Migdal commented on HBASE-4218:

Performance results on production data.

Encoder                 Compression (MB/s)     Decompression (MB/s)
CopyKeyDeltaEncoder     1136.33 (+/- 60.91)    373.29 (+/- 281.22)
BitsetKeyDeltaEncoder    147.57 (+/- 0.58)     166.78 (+/- 54.81)
PrefixKeyDeltaEncoder    293.94 (+/- 1.97)     233.61 (+/- 91.97)
FastDiffDeltaEncoder     203.47 (+/- 0.37)     196.77 (+/- 43.22)
DiffKeyDeltaEncoder      187.74 (+/- 0.24)     163.13 (+/- 12.17)
LZO                      260.35 (+/- 0.76)     173.45 (+/- 76.13)

Encoder                 Saved bytes   Key compression ratio   All compression ratio   LZO compressed size   LZO compression ratio
CopyKeyDeltaEncoder              -4                 -0.00 %                 -0.00 %                152019                 85.79 %
BitsetKeyDeltaEncoder        747061                 75.46 %                 69.82 %                124438                 88.37 %
PrefixKeyDeltaEncoder        831602                 84.00 %                 77.72 %                117285                 89.04 %
FastDiffDeltaEncoder         935275                 94.47 %                 87.41 %                 94360                 91.18 %
DiffKeyDeltaEncoder          909175                 91.84 %                 84.97 %                 96597                 90.97 %

Total KV prefix length: 8
Total key length: 91
Total key redundancy: 781606
Total value length: 8

DeltaEncodingSeekPerformance (BlockDeltaEncoder, onDisk='NONE' in every run)

inCache      inMemory   Read speed (MB/s)   Seeks per second (#/s)
NONE         false      63.99               54901.21
BITSET       false      46.73               13570.50
PREFIX       false      55.88               20298.89
DIFF         false      54.39               15082.79
FAST_DIFF    false      54.12               15432.61
NONE         true       64.37               56779.82
BITSET       true       35.42               46170.87
PREFIX       true       43.54               60108.48
DIFF         true       40.62               48779.68
FAST_DIFF    true       40.76               57291.22

Delta Encoding of KeyValues (aka prefix compression)
Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Labels: compression Attachments: open-source.diff
[jira] [Commented] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers
[ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123342#comment-13123342 ]
Lars Hofhansl commented on HBASE-4335:

I like those names. Will do.
@Jon: Initially I expected multiple different exceptions to be thrown, hence the general class approach. You're right, here it does not make sense.

Splits can create temporary holes in .META. that confuse clients and regionservers
Key: HBASE-4335 URL: https://issues.apache.org/jira/browse/HBASE-4335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Reporter: Joe Pallas Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0 Attachments: 4335-v2.txt, 4335-v3.txt, 4335.txt

When a SplitTransaction is performed, three updates are done to .META.:
1. The parent region is marked as splitting (and hence offline)
2. The first daughter region is added (same start key as parent)
3. The second daughter region is added (split key is start key)
(later, the original parent region is deleted, but that's not important to this discussion)

Steps 2 and 3 are actually done concurrently by SplitTransaction.DaughterOpener threads. While the master is notified when a split is complete, the only visibility that clients have is whether the daughter regions have appeared in .META.

If the second daughter is added to .META. first, then .META. will contain the (offline) parent region followed by the second daughter region. If the client looks up a key that is greater than (or equal to) the split key, the client will find the second daughter region and use it. If the key is less than the split key, the client will find the parent region and see that it is offline, triggering a retry.

If the first daughter is added to .META. before the second daughter, there is a window during which .META. has a hole: the first daughter effectively hides the parent region (same start key), but there is no entry for the second daughter.
A region lookup will find the first daughter for all keys in the parent's range, but the first daughter does not include keys at or beyond the split key. See HBASE-4333 and HBASE-4334 for details on how this causes problems and suggestions for mitigating this in the client and regionserver. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
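The two orderings described in the HBASE-4335 report above can be reproduced with a toy model. This is only a sketch under stated assumptions: `.META.` is modelled as a `TreeMap` keyed by `startKey,regionId`, a lookup is a `floorEntry` call, and every name here (`MetaHoleSketch`, `Region`, `locate`, `put`, `metaWithOfflineParent`) is hypothetical rather than taken from HBase.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of .META. during a split; names and key layout are illustrative only. */
final class MetaHoleSketch {
    static final class Region {
        final String name;
        final String startKey; // inclusive
        final String endKey;   // exclusive
        final boolean offline;
        Region(String name, String startKey, String endKey, boolean offline) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offline = offline;
        }
        boolean contains(String row) {
            return startKey.compareTo(row) <= 0 && row.compareTo(endKey) < 0;
        }
    }

    /** Rows sort by start key, then region id, so a daughter sorts after its parent. */
    static void put(TreeMap<String, Region> meta, Region r, int regionId) {
        meta.put(r.startKey + "," + regionId, r);
    }

    /** Closest catalog row at or before the requested row key. */
    static Region locate(TreeMap<String, Region> meta, String row) {
        // "~" sorts after any digit, so this finds the newest region id for the start key.
        Map.Entry<String, Region> e = meta.floorEntry(row + ",~");
        return e == null ? null : e.getValue();
    }

    /** Catalog state after step 1: only the offline parent covering [a, z). */
    static TreeMap<String, Region> metaWithOfflineParent() {
        TreeMap<String, Region> meta = new TreeMap<>();
        put(meta, new Region("parent", "a", "z", true), 1);
        return meta;
    }
}
```

With the split key at `m`, adding only the second daughter makes a lookup for `q` find it and a lookup for `c` find the offline parent, which triggers a retry. Adding only the first daughter makes a lookup for `q` return the first daughter even though `contains("q")` is false for it, which is the hole.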
[jira] [Commented] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master
[ https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123354#comment-13123354 ] jirapos...@reviews.apache.org commented on HBASE-4070: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2029/#review2396 --- src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java https://reviews.apache.org/r/2029/#comment5499 Add comment like these three declarations are only used by testRegionServerCoprocessorsReported src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java https://reviews.apache.org/r/2029/#comment5500 Add comment this declaration is only used by testMasterCoprocessorsReported. - Eugene On 2011-10-05 21:45:30, Eugene Koontz wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2029/ bq. --- bq. bq. (Updated 2011-10-05 21:45:30) bq. bq. bq. Review request for hbase and Mingjie Lai. bq. bq. bq. Summary bq. --- bq. bq. Proposed fix for HBASE-4070. bq. bq. bq. This addresses bug HBASE-4070. bq. https://issues.apache.org/jira/browse/HBASE-4070 bq. bq. bq. Diffs bq. - bq. bq.src/main/jamon/org/apache/hbase/tmpl/master/MasterStatusTmpl.jamon abeb850 bq.src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon be6fceb bq.src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 01bc1dd bq.src/main/java/org/apache/hadoop/hbase/HServerLoad.java 0c680e4 bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 bq.src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java dbae4fd bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232 bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 3840279 bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java eda5a9b bq. bq. Diff: https://reviews.apache.org/r/2029/diff bq. bq. bq. Testing bq. --- bq. bq. 
Two new tests : testRegionServerCoprocessorReported() and testMasterServerCoprocessorsReported() included in a new source file src/test/java/o.a.h.h/coprocessor/TestCoprocessorReporting.java. bq. bq. bq. Thanks, bq. bq. Eugene bq. bq. [Coprocessors] Improve region server metrics to report loaded coprocessors to master Key: HBASE-4070 URL: https://issues.apache.org/jira/browse/HBASE-4070 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Mingjie Lai Assignee: Eugene Koontz Attachments: HBASE-4070.patch, HBASE-4070.patch, HBASE-4070.patch, master-web-ui.jpg, rs-status-web-ui.jpg HBASE-3512 is about listing loaded cp classes at shell. To make it more generic, we need a way to report this piece of information from region to master (or just at region server level). So later on, we can display the loaded class names at shell as well as web console. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4335) Splits can create temporary holes in .META. that confuse clients and regionservers
[ https://issues.apache.org/jira/browse/HBASE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4335: Attachment: 4335-v4.txt Renamed phaseX methods addressed nits...
[jira] [Commented] (HBASE-4333) Client does not check for holes in .META.
[ https://issues.apache.org/jira/browse/HBASE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123360#comment-13123360 ] Lars Hofhansl commented on HBASE-4333: -- The standard ClientScanner will do the right thing. I.e. scan until the end of the region and then use that last key to find the next region. The scan starting at the next region will then fail/retry. Client does not check for holes in .META. - Key: HBASE-4333 URL: https://issues.apache.org/jira/browse/HBASE-4333 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Joe Pallas If there is a temporary hole in .META., the client may get the wrong region from HConnection.locateRegion. HConnectionManager.HConnectionImplementation.locateRegionInMeta should check the end key of the region found with getClosestRowBefore, just as it checks the offline status, when it looks at the region info. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
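The end-key check that HBASE-4333 asks for could look roughly like the sketch below. It is not the code in `locateRegionInMeta`: `EndKeyCheckSketch`, `Outcome`, `compare` and `check` are hypothetical names, and the only HBase convention assumed is that an empty end key marks the last region of a table.

```java
/** Hypothetical client-side validation of a region found by a closest-row-before lookup. */
final class EndKeyCheckSketch {
    enum Outcome { USE_REGION, RETRY }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** Rejects the region if it is offline or if the row lies at or beyond its end key. */
    static Outcome check(byte[] row, byte[] regionEndKey, boolean regionOffline) {
        if (regionOffline) {
            return Outcome.RETRY;
        }
        boolean lastRegion = regionEndKey.length == 0; // empty end key: region runs to end of table
        if (!lastRegion && compare(row, regionEndKey) >= 0) {
            return Outcome.RETRY; // the row falls into a hole after this region
        }
        return Outcome.USE_REGION;
    }
}
```

Treating an out-of-range row the same way as an offline region means the client retries until the missing entry appears, instead of sending the request to a region that does not hold the key.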
[jira] [Reopened] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-4488: -- Reopening for the related change to Store.compactStore Store could miss rows during flush -- Key: HBASE-4488 URL: https://issues.apache.org/jira/browse/HBASE-4488 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4488.txt While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
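The loop problem named in HBASE-4488 can be shown with a stand-in scanner. The sketch assumes that `next` fills the list and returns `false` on the very call that delivers the final batch; `FlushLoopSketch`, `BatchScanner`, `buggyDrain` and `fixedDrain` are hypothetical names, not the code in Store.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrates why while(scanner.next(kvs)) can drop the last batch. */
final class FlushLoopSketch {
    /** Hypothetical stand-in for the store scanner. */
    static final class BatchScanner {
        private final Iterator<List<String>> batches;
        BatchScanner(List<List<String>> batches) {
            this.batches = batches.iterator();
        }
        /** Appends the next batch to out; returns true only if more batches remain afterwards. */
        boolean next(List<String> out) {
            if (batches.hasNext()) {
                out.addAll(batches.next());
            }
            return batches.hasNext();
        }
    }

    /** The pattern from the report: the body never runs for the call that returns false. */
    static List<String> buggyDrain(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            written.addAll(kvs);
            kvs.clear();
        }
        return written; // whatever the final call put into kvs is lost
    }

    /** One possible fix: process the batch first, then test the return value. */
    static List<String> fixedDrain(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            written.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return written;
    }
}
```

The do/while form is only one way to express the fix; the point is that the batch returned together with `false` still has to be written.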
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123366#comment-13123366 ] jirapos...@reviews.apache.org commented on HBASE-4218: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2308/#review2460 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java https://reviews.apache.org/r/2308/#comment5565 Should be 'bytes are required' http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java https://reviews.apache.org/r/2308/#comment5564 The value of i should be included in the exception. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java https://reviews.apache.org/r/2308/#comment5566 Can this logic be written without recursion ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java https://reviews.apache.org/r/2308/#comment5567 Should this exception be called DeltaEncoderBufferTooSmallException ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java https://reviews.apache.org/r/2308/#comment5568 Would arePartsEqual be a better name ? - Ted On 2011-10-08 00:51:01, Jacek Migdal wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2308/ bq. --- bq. bq. (Updated 2011-10-08 00:51:01) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. Delta encoding for key values. bq. bq. bq. This addresses bug HBASE-4218. bq. https://issues.apache.org/jira/browse/HBASE-4218 bq. bq. bq. Diffs bq. - bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1180113 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1180113 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 1180113 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BitsetKeyDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CompressionState.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CopyKeyDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncodedBlock.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderAlgorithms.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderToSmallBufferException.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DiffKeyDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/FastDiffDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java 1180113 bq. 
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 1180113 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockDeltaEncoder.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 1180113 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/EmptyBlockDeltaEncoder.java PRE-CREATION bq.
[jira] [Updated] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4488: - Attachment: 4488-add.txt Everybody OK with the addendum? Store could miss rows during flush -- Key: HBASE-4488 URL: https://issues.apache.org/jira/browse/HBASE-4488 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4488-add.txt, 4488.txt While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123368#comment-13123368 ]
Lars Hofhansl commented on HBASE-4536:

Yet another option is to take the smallest time of any of the store files and remove the family markers if they are older than that. The markers may survive two compactions in that case, but eventually they'll be removed.

In addition, to address Jon's need, I think we can add a raw flag to the Scan object. If true, the scan will retrieve all available rows including deleted rows and delete markers. With the rest of the changes from this patch, that would be really easy to do. (I assume I'd have to increment the SCAN_VERSION, correct?)

Allow CF to retain deleted rows
Key: HBASE-4536 URL: https://issues.apache.org/jira/browse/HBASE-4536 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0

Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error.

I see two options here:
1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
2. Fold this into the parent change. I.e. keep minimum-number-of-versions versions even past the delete marker.

#1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint).

Comments? Any other options?
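The difference between a normal scan and the proposed raw scan can be pictured with a small model. This is a sketch of the idea only, not the Scan API: `RawScanSketch`, `Cell` and `scan` are hypothetical names, and the model assumes a single row-level delete marker that masks every cell with a timestamp at or below its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one row's cells, showing what a raw scan would additionally return. */
final class RawScanSketch {
    static final class Cell {
        final long timestamp;
        final String value;          // null for a delete marker
        final boolean deleteMarker;
        Cell(long timestamp, String value, boolean deleteMarker) {
            this.timestamp = timestamp;
            this.value = value;
            this.deleteMarker = deleteMarker;
        }
    }

    /** raw=true returns every cell; raw=false hides markers and the cells they mask. */
    static List<Cell> scan(List<Cell> cells, boolean raw) {
        if (raw) {
            return new ArrayList<>(cells);
        }
        long newestDelete = Long.MIN_VALUE;
        for (Cell c : cells) {
            if (c.deleteMarker) {
                newestDelete = Math.max(newestDelete, c.timestamp);
            }
        }
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.deleteMarker && c.timestamp > newestDelete) {
                visible.add(c);
            }
        }
        return visible;
    }
}
```

A raw view like this is only useful if compactions keep the masked cells and markers around, which is what the retention options in the description are about.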
[jira] [Commented] (HBASE-4511) There is data loss when master failovers
[ https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123372#comment-13123372 ]

gaojinchao commented on HBASE-4511:

@RAM For this case, there are two questions to work through:
1. Why can't the region server exit?
2. If the master's verification of meta/root fails, does the master need to crash, or should it wait for ServerShutdownHandler?

There is data loss when master failovers

Key: HBASE-4511
URL: https://issues.apache.org/jira/browse/HBASE-4511
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0
Reporter: gaojinchao
Priority: Critical
Fix For: 0.92.0
Attachments: org.apache.hadoop.hbase.master.TestMasterFailover-output.rar

It goes like this:
The master crashed; at the same time the RS hosting meta is crashing, but the RS doesn't exit.
The master starts up again and finds all living RSs.
The master's verification of meta fails, because this RS is crashing.
The master reassigns meta, but it doesn't split the HLog, so some meta data is lost.

Here are the logs from a failing failover test case.

// The log says that we want to kill an RS
2011-09-28 19:54:45,694 INFO [Thread-988] regionserver.HRegionServer(1443): STOPPED: Killing for unit test
2011-09-28 19:54:45,694 INFO [Thread-988] master.TestMasterFailover(1007): RS 192.168.2.102,54385,1317264874629 killed

// The RS didn't crash.
2011-09-28 19:54:51,763 INFO [Master:0;192.168.2.102,54557,1317264885720] master.HMaster(458): Registering server found up in zk: 192.168.2.102,54385,1317264874629
2011-09-28 19:54:51,763 INFO [Master:0;192.168.2.102,54557,1317264885720] master.ServerManager(232): Registering server=192.168.2.102,54385,1317264874629
2011-09-28 19:54:51,770 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(491): master:54557-0x132b31adbb30005 Unable to get data of znode /hbase/unassigned/1028785192 because node does not exist (not an error)
2011-09-28 19:54:51,771 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...

// Meta verification failed and meta was reassigned, so all the regions in meta are lost.
2011-09-28 19:54:51,773 INFO [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
2011-09-28 19:54:51,773 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
2011-09-28 19:54:52,274 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...
2011-09-28 19:54:52,277 INFO [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
2011-09-28 19:54:52,277 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
2011-09-28 19:54:52,778 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKUtil(1003): master:54557-0x132b31adbb30005 Retrieved 33 byte(s) of data from znode /hbase/root-region-server and set watcher; 192.168.2.102,54383,131726487...
2011-09-28 19:54:52,782 INFO [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(476): Failed verification of .META.,,1 at address=192.168.2.102,54385,1317264874629; org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server 192.168.2.102,54385,1317264874629 not running, aborting
2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] catalog.CatalogTracker(316): new .META. server: 192.168.2.102,54385,1317264874629 isn't valid. Cached .META. server: null
2011-09-28 19:54:52,782 DEBUG [Master:0;192.168.2.102,54557,1317264885720] zookeeper.ZKAssign(264): master:54557-0x132b31adbb30005 Creating (or updating) unassigned node for 1028785192 with OFFLINE state
2011-09-28
[jira] [Commented] (HBASE-4511) There is data loss when master failovers
[ https://issues.apache.org/jira/browse/HBASE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123377#comment-13123377 ]

ramkrishna.s.vasudevan commented on HBASE-4511:

@Gao This problem occurred in a test case. Can we reproduce it in a real (non-test) scenario? It would be great if we could, so that we are clear about the actual problem.
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123378#comment-13123378 ]

Ted Yu commented on HBASE-4469:

+1 on patch. Nice job.

Avoid top row seek by looking up bloomfilter

Key: HBASE-4469
URL: https://issues.apache.org/jira/browse/HBASE-4469
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The problem is that when seeking to a row/column in the HFile, we go to the top of the row in order to check for a row delete marker (delete family). However, if the bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row is already added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
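The idea in the description can be illustrated with a small, self-contained sketch. It is not the patch attached to the issue and does not use HBase classes: the class DeleteFamilyBloomSketch, its two-hash bit set, and the convention that a delete-family marker is recorded under the row plus an empty qualifier are all assumptions made for illustration. The point it shows is that a Bloom filter never gives a false negative, so a "definitely absent" answer for the delete-family key means the seek to the top of the row can be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical toy model; names and key layout are assumptions, not HBase API.
public class DeleteFamilyBloomSketch {
    private static final int BITS = 1 << 16;
    private final BitSet bits = new BitSet(BITS);

    // FNV-1a style hash, varied by seed, mapped into the bit range.
    private static int hash(String key, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, BITS);
    }

    // Bloom key is row + separator + qualifier (assumed layout).
    private static String bloomKey(String row, String qualifier) {
        return row + '\u0000' + qualifier;
    }

    /** Called for every cell written to the file. */
    public void add(String row, String qualifier) {
        String key = bloomKey(row, qualifier);
        bits.set(hash(key, 17));
        bits.set(hash(key, 31));
    }

    /** False means "definitely not present"; true means "possibly present". */
    public boolean mightContain(String row, String qualifier) {
        String key = bloomKey(row, qualifier);
        return bits.get(hash(key, 17)) && bits.get(hash(key, 31));
    }

    /** Assumption: a delete-family marker is stored with an empty qualifier. */
    public boolean mustSeekToTopOfRow(String row) {
        return mightContain(row, "");
    }
}
```

A reader would call mustSeekToTopOfRow(row) before seeking: a false result lets it go straight to the requested column, while a true result (which may be a false positive) falls back to the existing behaviour of checking the top of the row.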
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123379#comment-13123379 ]

Jerry Chen commented on HBASE-4488:

I recall seeing some unit tests that are written with the same incorrect while loop pattern as well.

Store could miss rows during flush

Key: HBASE-4488
URL: https://issues.apache.org/jira/browse/HBASE-4488
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0
Attachments: 4488-add.txt, 4488.txt

While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: the while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123380#comment-13123380 ]

jirapos...@reviews.apache.org commented on HBASE-4540:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/

(Updated 2011-10-08 05:13:32.657832)

Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

Changes
-------
This updated patch is the same as the one uploaded at 07/Oct/11 14:27.
Reverted the change of passing -2 for not comparing the version, and addressed Ted's comment to add spaces.

Summary
-------
Fix for handling HBASE-4539 and HBASE-4540.
Ran all the test cases. Added one new test case to verify OpenedRegionHandler scenarios.
Also addresses Ted's comments.

This addresses bug HBASE-4540.
https://issues.apache.org/jira/browse/HBASE-4540

Diffs (updated)
-----
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179945
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179945
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179945
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179945
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION

Diff: https://reviews.apache.org/r/2251/diff

Testing
-------
Yes

Thanks,
ramkrishna

OpenedRegionHandler is not enforcing atomicity of the operation it is performing

Key: HBASE-4540
URL: https://issues.apache.org/jira/browse/HBASE-4540
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Attachments: HBASE-4540_1.patch

- OpenedRegionHandler has not yet deleted the znode of region R1, which was opened by RS1.
- RS1 goes down.
- ServerShutdownHandler assigns region R1 to RS2.
- The znode of R1 is moved to the OFFLINE state by the master, or to the OPENING state by RS2 if RS2 has started opening the region.
- Now the first OpenedRegionHandler tries to delete the znode, thinking it is in the OPENED state, but fails.
- Though it fails, it removes the node from RIT and adds RS1 as the owner of R1 in the master's memory.
- Now when RS2 completes opening the region, the master is not able to open the region because the region has already been deleted from RIT.
{code}
Master
======
2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647
2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9. state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847}
2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED
2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state

After the region is opened in RS2
=================================
2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late
2011-10-05 20:50:48,290 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states
2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9
2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123381#comment-13123381 ]

jirapos...@reviews.apache.org commented on HBASE-4540:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2461

Ship it!

- Ted

On 2011-10-08 05:13:32, ramkrishna vasudevan wrote:
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2251/
bq.
bq. (Updated 2011-10-08 05:13:32)
bq.
bq. Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.
bq. Summary
bq. -------
bq. Fix for handling HBASE-4539 and HBASE-4540.
bq. Ran all the testcases. Added one new testcase to verify OpenedRegionHandler scenarios.
bq. Also addresses Ted's comments.
bq.
bq. This addresses bug HBASE-4540.
bq. https://issues.apache.org/jira/browse/HBASE-4540
bq.
bq. Diffs
bq. -----
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179945
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179945
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179945
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179945
bq. http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/2251/diff
bq.
bq. Testing
bq. -------
bq. Yes
bq.
bq. Thanks,
bq. ramkrishna
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123383#comment-13123383 ]

Lars Hofhansl commented on HBASE-4488:

I'll open another jira and fix all those.
[jira] [Resolved] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-4488.

Resolution: Fixed
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123384#comment-13123384 ]

Lars Hofhansl commented on HBASE-4488:

Created HBASE-4556.
[jira] [Created] (HBASE-4556) Fix all incorrect uses of InternalScanner.next(...)
Fix all incorrect uses of InternalScanner.next(...)

Key: HBASE-4556
URL: https://issues.apache.org/jira/browse/HBASE-4556
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl

There are cases all over the code where InternalScanner.next(...) is not used correctly.
I see this a lot:
{code}
while(scanner.next(...)) {
}
{code}

The correct pattern is:
{code}
boolean more = false;
do {
  more = scanner.next(...);
} while (more);
{code}
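Why the first loop can lose data is easier to see in a small, self-contained sketch. It does not use HBase classes: ToyScanner is a hypothetical stand-in that models the contract as understood here, namely that next(...) fills the result list and returns true only if more rows exist after the one just returned. Under that assumption the final call returns false while still delivering the last row, and the while form never processes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScannerLoopSketch {
    /** Hypothetical stand-in for the scanner contract (not the HBase interface). */
    public interface ToyScanner {
        /** Appends the current row to results; returns true if MORE rows remain. */
        boolean next(List<String> results);
    }

    /** Builds a toy scanner that hands out the given rows one per call. */
    public static ToyScanner scannerOver(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return results -> {
            if (it.hasNext()) {
                results.add(it.next());
            }
            return it.hasNext();
        };
    }

    /** Incorrect pattern: the body is skipped for the call that returns false. */
    public static List<String> drainIncorrectly(ToyScanner scanner) {
        List<String> out = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            out.addAll(kvs);
            kvs.clear();
        }
        return out; // whatever the last call put into kvs is dropped
    }

    /** Correct pattern: process the results of every call, then test the flag. */
    public static List<String> drainCorrectly(ToyScanner scanner) {
        List<String> out = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(kvs);
            out.addAll(kvs);
            kvs.clear();
        } while (more);
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        System.out.println(drainIncorrectly(scannerOver(rows))); // [row1, row2]
        System.out.println(drainCorrectly(scannerOver(rows)));   // [row1, row2, row3]
    }
}
```

With three rows the while form returns only the first two, which matches the "might miss the last edits" symptom reported in HBASE-4488. The do/while form also behaves sensibly on an empty scanner, since a call that adds nothing and returns false simply ends the loop.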