[jira] [Created] (HBASE-4798) Sleeps and synchronisation improvements for tests
Sleeps and synchronisation improvements for tests - Key: HBASE-4798 URL: https://issues.apache.org/jira/browse/HBASE-4798 Project: HBase Issue Type: Improvement Components: master, regionserver, test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor
Multiple small changes:
@committers: Removing some sleeps exposed a bug in JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. You may want to review this.
JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never met (test on !c !!c). Added a new synchronization point.
AssignmentManager#waitForAssignment: add a timeout on the wait, so the thread does not get stuck if the notification is received before the wait.
HMaster#loop: use a notification instead of a 1s sleep.
HRegionServer#waitForServerOnline: new method, used by JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification.
HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s sleep.
HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0.5s.
ZooKeeperNodeTracker#start: replace a recursive call with a loop.
ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so the thread does not get stuck if the notification is received before the wait.
HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s.
TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; together with the change in HBaseTestingUtility, the test is 60s faster.
TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead of 1s.
TestRestartCluster#testClusterRestart: send all the table creations together, then check them; this should be faster.
TestHLog: shut down the whole cluster instead of DFS only (more standard).
JVMClusterUtil#startup: lower the sleep from 1s to 0.1s.
HConnectionManager#close: the ZooKeeper name in the debug message logged after the connection is closed was always null, because it was set to null during the delete.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
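The two "add a timeout on the wait" items describe a common guard against lost notifications. Below is a minimal sketch of that pattern in Java, assuming the condition is a simple boolean flag; the class `OnlineFlag` and its methods are hypothetical and are not the code from the HBASE-4798 patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not taken from the HBASE-4798 patch.
class OnlineFlag {
  private boolean online = false;

  // Called by the thread that becomes ready; wakes up every waiter.
  synchronized void setOnline() {
    online = true;
    notifyAll();
  }

  // Waits until online or until timeoutMs elapses; returns the final state.
  synchronized boolean waitForOnline(long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    // The flag is re-checked under the lock before every wait, so a notifyAll()
    // that happened before this call is not lost: the loop simply never waits.
    while (!online) {
      long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
      if (remainingMs <= 0) {
        return false; // timed out
      }
      try {
        // Bounded wait: if the condition could ever change without a matching
        // notifyAll(), it is still re-checked at least every 100 ms.
        wait(Math.min(remainingMs, 100));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt status for the caller
        return online;
      }
    }
    return true;
  }
}
```

The loop-plus-bounded-wait shape also covers spurious wakeups, which `Object.wait` is allowed to produce.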
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Description: updated; both the new text and the previous text ("was:") are identical to the description quoted above.
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Attachment: 4798_trunk_all.v2.patch Let's see how it behaves on the integration servers.
[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4798: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management
[ https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151107#comment-13151107 ] nkeywal commented on HBASE-4763: Note that I found a new bug in surefire and fixed it: SUREFIRE-791 (JUnit47 provider reports incorrect elapsed time on test failure). The fix should be on their trunk soon, so it makes sense to take it, as they also fixed a regression in 2.10, SUREFIRE-785 (Lots of newlines being strewn about in test output). Not mandatory, but nicer.
Integrate surefire and junit for category management Key: HBASE-4763 URL: https://issues.apache.org/jira/browse/HBASE-4763 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: surefire_hbase.v2.patch
As of today, Surefire integrates categories on the trunk of version 2.11: http://jira.codehaus.org/browse/SUREFIRE-329 . It may require private patches as well. It may impact JUnit: https://github.com/KentBeck/junit/issues/359 This jira is about this integration. We will need a repo for this. For the naming of the versions to be created, I don't know if there is a convention. If not, I would propose: 2.10-patched-HBASE. Obviously, it's important to get our changes integrated into the main release: we're not forking surefire or JUnit!
[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1315#comment-1315 ] ramkrishna.s.vasudevan commented on HBASE-4796: --- Running test cases... will submit the patch once done.
Race between SplitRegionHandlers for the same region kills the master - Key: HBASE-4796 URL: https://issues.apache.org/jira/browse/HBASE-4796 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.94.0
I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than one of them tries to delete the znode at the same time:
{quote}
2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT report); parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry
2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1)
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
    at org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{quote}
Stack and I concluded that we just need to handle that exception, because handleSplitReport is an in-memory thing.
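The solution described above (handle the exception instead of aborting) amounts to making the znode delete tolerant of a concurrent delete. A minimal sketch follows; it uses a hypothetical in-memory stand-in for the unassigned znodes rather than the real ZooKeeper client, where the corresponding error is `KeeperException.NoNodeException` as seen in the stack trace. `UnassignedNodes` and `SplitHandlerSketch` are illustrative names only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for /hbase/unassigned; not the ZooKeeper client API.
class UnassignedNodes {
  // Mirrors the NoNode error that ZooKeeper reports for a missing znode.
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  void create(String region, String state) {
    nodes.put(region, state);
  }

  boolean exists(String region) {
    return nodes.containsKey(region);
  }

  // remove() is atomic, so exactly one of several concurrent callers succeeds.
  void delete(String region) throws NoNodeException {
    if (nodes.remove(region) == null) {
      throw new NoNodeException("/hbase/unassigned/" + region);
    }
  }
}

// Hypothetical sketch; not SplitRegionHandler code.
class SplitHandlerSketch {
  // Returns true if this handler deleted the node, false if another handler
  // already had; in both cases the caller carries on instead of aborting.
  static boolean deleteSplitNode(UnassignedNodes zk, String region) {
    try {
      zk.delete(region);
      return true;
    } catch (UnassignedNodes.NoNodeException e) {
      // A concurrent SplitRegionHandler won the race: nothing left to delete.
      return false;
    }
  }
}
```

With two handlers for the same region, one call returns true and the other false, and neither path treats the outcome as fatal.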
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151126#comment-13151126 ] ramkrishna.s.vasudevan commented on HBASE-4729: --- I think this patch will not solve the problem. It will only prevent the master from aborting, but the alter table will not be fully completed. There is one more thing I found in the same place, in assignmentManager.unassign():
{code}
synchronized (regionsInTransition) {
  state = regionsInTransition.get(encodedName);
{code}
If we find that a state already exists, we just return without unassigning. When this races with the splitting of a region, it also prevents the alter table from completing. The same will happen if we only handle the NodeAlreadyExistsException. I would like your suggestion: can we invoke the split manually?
Race between online altering and splitting kills the master --- Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.94.0 Attachments: 4729.txt
I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (I haven't restarted the master yet). What killed the master:
{quote}
2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
    at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{quote}
A znode was created because the region server was splitting the region 4 seconds before:
{quote}
2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
...
2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101
{quote}
Now that the master is dead, the region server is spewing those last two lines like mad.
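The master dies because NodeExists is treated as fatal when creating the CLOSING node. One way to picture the alternative discussed on this issue (inspect the state of the existing node before deciding what to do) is the sketch below. The class, the enum, and the in-memory map are hypothetical; the RS_ZK_REGION_SPLITTING / RS_ZK_REGION_SPLIT strings come from the logs above. As ramkrishna points out, merely skipping the region does not complete the alter, so a caller receiving REGION_SPLITTING would still have to retry or re-handle the region once the split is processed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; not AssignmentManager or ZKAssign code.
class UnassignSketch {
  enum Outcome { CLOSING_CREATED, REGION_SPLITTING, OTHER_TRANSITION }

  // Encoded region name -> state of its unassigned node (in-memory stand-in for ZK).
  private final ConcurrentMap<String, String> unassigned = new ConcurrentHashMap<>();

  void setState(String region, String state) {
    unassigned.put(region, state);
  }

  // Atomically creates the CLOSING node only if no node exists; otherwise
  // reports what is already there instead of throwing.
  Outcome tryCreateClosing(String region) {
    String existing = unassigned.putIfAbsent(region, "CLOSING");
    if (existing == null) {
      return Outcome.CLOSING_CREATED;
    }
    // Matches both RS_ZK_REGION_SPLITTING and RS_ZK_REGION_SPLIT.
    if (existing.startsWith("RS_ZK_REGION_SPLIT")) {
      return Outcome.REGION_SPLITTING; // caller should retry after the split is processed
    }
    return Outcome.OTHER_TRANSITION;
  }
}
```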
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151131#comment-13151131 ] Hadoop QA commented on HBASE-4798: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12503854/4798_trunk_all.v2.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 21 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -163 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/263//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/263//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/263//console
This message is automatically generated.
[jira] [Commented] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves
[ https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151143#comment-13151143 ] gaojinchao commented on HBASE-4654: --- Do we need to throw an exception in the addPeer API?
[replication] Add a check to make sure we don't replicate to ourselves -- Key: HBASE-4654 URL: https://issues.apache.org/jira/browse/HBASE-4654 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.90.5 Attachments: 4654-trunk.txt
It's currently possible to add a peer for replication and point it to the local cluster, which I believe could very well happen for those like us that use only one ZK ensemble per DC, so that only the root znode changes when you want to set up replication intra-DC. I don't think comparing just the cluster ID would be enough, because you would normally use a different one for another cluster and nothing will block you from pointing elsewhere. Comparing the ZK ensemble address doesn't work either when you have multiple DNS entries that point at the same place. I think this could be resolved by looking up the master address in the relevant znode, as it should be exactly the same when you have the same cluster.
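The proposal above (compare master addresses rather than cluster IDs or ZK ensemble strings), combined with gaojinchao's question, suggests an addPeer that rejects the local cluster with an exception. A minimal sketch: the class, its constructor argument, and the assumption that the caller has already read the peer's master address from its znode are all hypothetical, not the HBase replication API. The sample addresses in the test are made up in the host,port,startcode form seen in the logs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not the HBase replication API.
class ReplicationPeersSketch {
  private final String localMasterAddress;
  private final Map<String, String> peers = new HashMap<>();

  ReplicationPeersSketch(String localMasterAddress) {
    this.localMasterAddress = localMasterAddress;
  }

  // peerMasterAddress is assumed to have been read from the peer's master znode.
  void addPeer(String peerId, String peerMasterAddress) {
    if (Objects.equals(localMasterAddress, peerMasterAddress)) {
      throw new IllegalArgumentException(
          "Peer " + peerId + " points at the local cluster (master " + peerMasterAddress + ")");
    }
    peers.put(peerId, peerMasterAddress);
  }

  boolean hasPeer(String peerId) {
    return peers.containsKey(peerId);
  }
}
```

Throwing before the peer is registered keeps the peer list unchanged when the check fails.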
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151146#comment-13151146 ] nkeywal commented on HBASE-4798: @committers: for catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation, the test is:
{noformat}
ServerName nonsense = new ServerName("example.org", 1234, System.currentTimeMillis());
RootLocationEditor.setRootLocation(zookeeper, nonsense);
// Bring back up the hbase cluster. See if it can deal with nonsense root
// location.
UTIL.startMiniHBaseCluster(1, 1);
UTIL.shutdownMiniCluster();
{noformat}
It appears that when the root location is invalid, the master cannot be initialized. The test now fails with an exception because I modified JVMClusterUtil#startup to make sure that the cluster is initialized when we leave the method (with a 30s timeout, which is what expires here). I prefer this, because it's a well-known state (instead of sometimes initialized and sometimes not, depending on the sleeps that took place). It means that I would have to modify this test case to make it accept the exception. Are you OK with that? I'm also looking into the other errors.
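The behaviour described for JVMClusterUtil#startup (return only once the cluster is initialized, or fail after 30s) and the proposed test change (accept the exception) can be sketched as follows. The helper names, the use of `RuntimeException`, and the polling approach are assumptions for illustration; the actual patch may wait on a notification instead of polling.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not the JVMClusterUtil code from the patch.
class StartupSketch {
  // Polls isInitialized every pollMs until it returns true; throws if timeoutMs elapses first.
  static void waitUntilInitialized(BooleanSupplier isInitialized, long timeoutMs, long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!isInitialized.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new RuntimeException("Master not initialized after " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted while waiting for the master", e);
      }
    }
  }

  // Test-side pattern: with a nonsense root location the master never initializes,
  // so the test treats the timeout exception as the expected outcome.
  static boolean startupFailsAsExpected(BooleanSupplier isInitialized, long timeoutMs) {
    try {
      waitUntilInitialized(isInitialized, timeoutMs, 100);
      return false;
    } catch (RuntimeException expected) {
      return true;
    }
  }
}
```

In the scenario above, the test would call something like `startupFailsAsExpected(masterInitialized, 30000)`, where the 100 ms poll matches the 0.1s sleep from the change list and the 30000 ms timeout matches the 30s mentioned in the comment.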
[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4739: -- Attachment: HBASE-4739_trial.patch Trial version; not tested yet and needs improvement.
Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday. When the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted, it saw the znode and started printing this:
{quote}
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
{quote}
It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151150#comment-13151150 ]

Ted Yu commented on HBASE-4729:

If a region is in transition, it may be splitting. Why do we need to split manually? I think we should add a check on the state of the ZK node and decide whether to continue or not.

Race between online altering and splitting kills the master
Key: HBASE-4729
URL: https://issues.apache.org/jira/browse/HBASE-4729
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.94.0
Attachments: 4729.txt

I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (I haven't restarted the master yet). What killed the master:
{quote}
2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
    at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
    at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{quote}
A znode had been created because the region server was splitting the region 4 seconds earlier:
{quote}
2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
...
2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101
{quote}
Now that the master is dead, the region server is spewing those last two lines like mad.
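Ted's suggestion is to check the state of the existing znode instead of treating NodeExistsException as fatal. It can be sketched as follows. This is a hypothetical model, not the ZooKeeper or AssignmentManager API. A ConcurrentMap stands in for /hbase/unassigned, and putIfAbsent plays the role of an exclusive create. The enum values and the UnassignSketch class are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model: a map stands in for the /hbase/unassigned znodes.
// This is NOT the ZooKeeper or AssignmentManager API.
enum RegionState { SPLITTING, SPLIT, CLOSING }

enum UnassignOutcome { CLOSING_CREATED, SKIPPED_SPLIT_IN_PROGRESS, SKIPPED_OTHER_TRANSITION }

class UnassignSketch {
  final ConcurrentMap<String, RegionState> unassignedNodes = new ConcurrentHashMap<>();

  /**
   * Tries to create a CLOSING node. Instead of treating an existing node as
   * fatal, it inspects the existing state and decides whether to continue.
   */
  UnassignOutcome unassign(String encodedRegionName) {
    RegionState existing = unassignedNodes.putIfAbsent(encodedRegionName, RegionState.CLOSING);
    if (existing == null) {
      return UnassignOutcome.CLOSING_CREATED; // no node existed; proceed with the close
    }
    switch (existing) {
      case SPLITTING:
      case SPLIT:
        // The region server owns this transition; let the split finish.
        return UnassignOutcome.SKIPPED_SPLIT_IN_PROGRESS;
      default:
        // Some other transition is already in progress; do not abort the master.
        return UnassignOutcome.SKIPPED_OTHER_TRANSITION;
    }
  }
}
```

With real ZooKeeper the state would presumably have to be read back after the failed create, so a concurrent change between the two calls would still need handling. The decision logic could look similar.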
[jira] [Created] (HBASE-4799) Catalog Janitor logic bug causes region leackage
Catalog Janitor logic bug causes region leackage
Key: HBASE-4799
URL: https://issues.apache.org/jira/browse/HBASE-4799
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Priority: Critical

When a region split takes a significant amount of time, CatalogJanitor can clean up one of the SPLIT records but leave the other in META. When the other split finishes, the janitor cleans up the remaining SPLIT record, but the parent region is never removed from the FS and META is not cleared. The race condition is as follows:
1. A region split starts.
2. One of the daughter regions, A, is done (it has no reference store files left), but the other, B, is not.
3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees that SPLITB still has references and does nothing else.
4. Region B completes its split.
5. The janitor wakes up and removes SPLITB, but sees that there is no record for A and again does nothing.
Result: the parent region hangs around forever.
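The five-step race above can be modelled with a small simulation. This is a hypothetical sketch. It is not CatalogJanitor code, and not necessarily what the attached patches do.

- buggyPass clears each daughter's SPLIT column independently and deletes the parent only if both daughters were seen reference-free in the same pass. This reproduces the leak.
- fixedPass is one possible correction: it clears both columns and the parent together, once neither daughter has references.

```java
// Hypothetical model of a parent region's META row (not CatalogJanitor code).
// splitAColumn/splitBColumn model the SPLITA/SPLITB entries; a daughter
// "has references" while it still holds reference files into the parent.
class ParentRow {
  boolean splitAColumn = true;
  boolean splitBColumn = true;
  boolean parentDeleted = false;
}

class JanitorSketch {
  /** Flow as described in the report: each daughter column is cleared on its own. */
  static void buggyPass(ParentRow row, boolean aHasRefs, boolean bHasRefs) {
    if (row.parentDeleted) {
      return;
    }
    boolean aDone = row.splitAColumn && !aHasRefs;
    boolean bDone = row.splitBColumn && !bHasRefs;
    if (aDone) {
      row.splitAColumn = false;
    }
    if (bDone) {
      row.splitBColumn = false;
    }
    // The parent is removed only if both daughters were cleared in this same pass.
    if (aDone && bDone) {
      row.parentDeleted = true;
    }
  }

  /** One possible fix: nothing is removed until neither daughter has references. */
  static void fixedPass(ParentRow row, boolean aHasRefs, boolean bHasRefs) {
    if (row.parentDeleted) {
      return;
    }
    if (!aHasRefs && !bHasRefs) {
      row.splitAColumn = false;
      row.splitBColumn = false;
      row.parentDeleted = true;
    }
  }
}
```

Call buggyPass first with A free and B referenced, then with both free. Both columns end up cleared but the parent is never deleted, which matches step 5 of the report.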
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151197#comment-13151197 ]

Hadoop QA commented on HBASE-4739:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503873/HBASE-4739_trial.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -163 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/264//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/264//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/264//console

This message is automatically generated.

Master dying while going to close a region can leave it in transition forever
Key: HBASE-4739
URL: https://issues.apache.org/jira/browse/HBASE-4739
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
Fix For: 0.92.0, 0.94.0, 0.90.5
Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151235#comment-13151235 ]

ramkrishna.s.vasudevan commented on HBASE-4729:

@Ted: What I meant is that the alter table does not happen even in this scenario, where the region is in RIT in the SPLITTING state. Correct me if I am wrong, Ted.

Race between online altering and splitting kills the master
Key: HBASE-4729
URL: https://issues.apache.org/jira/browse/HBASE-4729
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.94.0
Attachments: 4729.txt
[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests
[ https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151239#comment-13151239 ]

nkeywal commented on HBASE-4798:

@committers: coprocessor.TestRegionServerCoprocessorExceptionWithAbort is a little bit strange. It fails sometimes, but not always. When it succeeds, there are 3 retries, then it stops trying and the test is over. On failure, it retries up to 10 times before being interrupted in the middle of a try by the test timeout (because the waiting time between retries is greater than the timeout). Question: are the retries expected? If you increase the timeout enough, it's the 'put' that is marked as failed, but the region does not abort in the failure scenario.

Sleeps and synchronisation improvements for tests
Key: HBASE-4798
URL: https://issues.apache.org/jira/browse/HBASE-4798
Project: HBase
Issue Type: Improvement
Components: master, regionserver, test
Affects Versions: 0.94.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 4798_trunk_all.v2.patch
[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151243#comment-13151243 ]

Ted Yu commented on HBASE-4729:

Wouldn't the daughter regions pick up the new table definition from .tableinfo, which is stored on HDFS? Please take a look at TestInstantSchemaChange#testConcurrentInstantSchemaChangeAndSplit in https://reviews.apache.org/r/1786/diff/#index_header to see whether that test covers our scenario.
@J-D: Can you provide more details for your comment @ 16/Nov/11 00:30? Maybe you can address Ramkrishna's question.

Race between online altering and splitting kills the master
Key: HBASE-4729
URL: https://issues.apache.org/jira/browse/HBASE-4729
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.94.0
Attachments: 4729.txt
[jira] [Assigned] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4799:
Assignee: Max Lapan

Catalog Janitor logic bug causes region leackage
Key: HBASE-4799
URL: https://issues.apache.org/jira/browse/HBASE-4799
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151248#comment-13151248 ]

Ted Yu commented on HBASE-4799:

Max: Please use the following command to generate the patch so that Hadoop QA can run tests on it:
{code}
git format-patch --no-prefix HEAD^..HEAD
{code}
Please attach the temporary fix patch first and the formal patch second, since Hadoop QA picks up the latest patch. After that, please click 'Submit Patch' to run the tests on Jenkins.
Thanks for the patch. If you experienced this problem in your cluster, please tell us whether the fix worked.

Catalog Janitor logic bug causes region leackage
Key: HBASE-4799
URL: https://issues.apache.org/jira/browse/HBASE-4799
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch
[jira] [Updated] (HBASE-4768) Per-(table, columnFamily) metrics with configurable table name inclusion
[ https://issues.apache.org/jira/browse/HBASE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4768:
Resolution: Fixed
Status: Resolved (was: Patch Available)

Test fixed by HBASE-4795.

Per-(table, columnFamily) metrics with configurable table name inclusion
Key: HBASE-4768
URL: https://issues.apache.org/jira/browse/HBASE-4768
Project: HBase
Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Fix For: 0.94.0
Attachments: 4768.addendum, 4768.addendum2, D363.1.patch, D363.2.patch, D363.3.patch, D363.4.patch, D363.5.patch

As we kept adding more granular block read and block cache usage statistics, we ran into a combinatorial explosion of cases to monitor. This was especially true when we wanted both per-table/column family/block type statistics and aggregate statistics on various subsets of these dimensions. Here, we unclutter the HFile readers, LruBlockCache, StoreFile, etc. by creating a centralized class that knows how to update all kinds of per-table/CF/block type counters.

Table name and column family configuration have been pushed to a base class, SchemaConfigured. This is convenient because many of the existing classes that have these properties (HFile readers/writers, HFile blocks, etc.) did not have a base class.

Whether to collect per-(table, columnFamily) metrics or per-columnFamily metrics only can be configured with the hbase.metrics.showTableName configuration key. We don't expect this configuration to change at runtime, so we cache the setting statically and log a warning when an attempt is made to flip it once it is already set. This way we don't have to pass the configuration to many more places, e.g. everywhere an HFile reader is instantiated.

Thanks to Liyin for his initial version of the per-table metrics patch and a lot of valuable feedback.
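The description says that the hbase.metrics.showTableName setting is cached statically, and that an attempt to flip it once set only logs a warning. Below is a minimal sketch of that behaviour. It assumes a hypothetical ShowTableNameSetting class and uses java.util.logging for the warning. The actual patch may be structured quite differently.

```java
import java.util.logging.Logger;

// Hypothetical holder for the hbase.metrics.showTableName setting; not the
// class from the patch. The first configured value is cached statically, and
// later attempts to flip it are logged and ignored.
final class ShowTableNameSetting {
  private static final Logger LOG = Logger.getLogger(ShowTableNameSetting.class.getName());
  private static Boolean cached;   // null until configured for the first time
  private static int ignoredFlips; // number of rejected changes, for illustration

  private ShowTableNameSetting() {}

  /** Caches the first value seen and returns the effective (cached) value. */
  static synchronized boolean configure(boolean requested) {
    if (cached == null) {
      cached = requested;
    } else if (cached != requested) {
      ignoredFlips++;
      LOG.warning("hbase.metrics.showTableName is already set to " + cached
          + "; ignoring attempt to change it to " + requested);
    }
    return cached;
  }

  static synchronized int ignoredFlips() {
    return ignoredFlips;
  }
}
```

Caching the value in a static means every reader or cache sees the same setting without a Configuration being passed in. The cost is that the setting cannot change for the lifetime of the JVM.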
[jira] [Updated] (HBASE-3811) Allow adding attributes to Scan
[ https://issues.apache.org/jira/browse/HBASE-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-3811:
Fix Version/s: 0.92.0

Allow adding attributes to Scan
Key: HBASE-3811
URL: https://issues.apache.org/jira/browse/HBASE-3811
Project: HBase
Issue Type: Improvement
Components: client
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
Fix For: 0.92.0
Attachments: HBASE-3811.patch, HBASE-3811.patch, HBASE-3811.patch, HBASE-3811.patch

There's sometimes a need to add a custom attribute to a Scan object so that it can be accessed on the server side. An example of a case where it is needed is discussed here: http://search-hadoop.com/m/v3Jtb2GkiO. There might be other cases where it is useful, mostly around logging or gathering stats on the server side.

An alternative to allowing arbitrary custom attributes on a scan could be adding a fixed field, such as a type, to the class.
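The sketch below illustrates what custom attributes on a scan could look like. AttributedScan is a hypothetical class. It is not the HBase Scan API, and not necessarily what the attached patch adds. Values are stored as byte[] on the assumption that they would be serialized together with the scan and read back on the server side, for example for logging or gathering stats.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a scan object carrying free-form attributes;
// not the HBase Scan class.
class AttributedScan {
  private final Map<String, byte[]> attributes = new HashMap<>();

  /** Sets an attribute; passing a null value removes it. The value is copied. */
  void setAttribute(String name, byte[] value) {
    if (value == null) {
      attributes.remove(name);
    } else {
      attributes.put(name, value.clone());
    }
  }

  /** Returns a copy of the attribute value, or null if it is not set. */
  byte[] getAttribute(String name) {
    byte[] value = attributes.get(name);
    return value == null ? null : value.clone();
  }

  /** Read-only view of all attributes, e.g. for serializing them with the scan. */
  Map<String, byte[]> getAttributesMap() {
    return Collections.unmodifiableMap(attributes);
  }
}
```

A free-form map covers the logging and stats use cases without changing the class each time. The fixed "type" field mentioned as an alternative would be simpler to serialize but less flexible.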
[jira] [Updated] (HBASE-4793) HBase shell still using deprecated methods removed in HBASE-4436
[ https://issues.apache.org/jira/browse/HBASE-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4793:
Priority: Critical (was: Blocker)

Lowering priority before we find the next broken shell command.

HBase shell still using deprecated methods removed in HBASE-4436
Key: HBASE-4793
URL: https://issues.apache.org/jira/browse/HBASE-4793
Project: HBase
Issue Type: Bug
Components: shell
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Priority: Critical
Fix For: 0.92.0
Attachments: 4793.txt

The patch applied in HBASE-4622 (a subtask of HBASE-4436) to remove deprecated methods seems to have missed some uses of those methods in the HBase shell. At least src/main/ruby/hbase/admin.rb is still using some of the removed methods, which breaks some shell commands:
{noformat}
hbase(main):007:0> alter 'privatetable', { NAME => 'f1', VERSIONS => 2}

ERROR: wrong number of arguments (3 for 2)

Backtrace: /usr/lib/hbase/bin/../bin/../lib/ruby/hbase/admin.rb:344:in `alter'
           org/jruby/RubyArray.java:1572:in `each'
           /usr/lib/hbase/bin/../bin/../lib/ruby/hbase/admin.rb:317:in `alter'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands/alter.rb:79:in `command'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:68:in `format_simple_command'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands/alter.rb:78:in `command'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:31:in `command_safe'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:74:in `translate_hbase_exceptions'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:31:in `command_safe'
           /usr/lib/hbase/bin/../bin/../lib/ruby/shell.rb:110:in `command'
           (eval):2:in `alter'
{noformat}
This trace translates to the line:
{code}
@admin.modifyColumn(table_name, column_name, descriptor)
{code}
which calls one of the removed methods.
[jira] [Updated] (HBASE-3465) Hbase should use a HADOOP_HOME environment variable if available.
[ https://issues.apache.org/jira/browse/HBASE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-3465:
Fix Version/s: 0.92.0

Hbase should use a HADOOP_HOME environment variable if available.
Key: HBASE-3465
URL: https://issues.apache.org/jira/browse/HBASE-3465
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Ted Dunning
Assignee: Alejandro Abdelnur
Fix For: 0.92.0
Attachments: a1-HBASE-3465.patch

I have been burned a few times lately while developing code by having to make sure that the hadoop jar in hbase/lib is exactly correct. In my own deployment, there are actually 3 jars and a native library to keep in sync that hbase shouldn't have to know about explicitly. A similar problem arises when using stock hbase with CDH3, because the security patches change the wire protocol.

All of these problems could be avoided by not assuming that the hadoop library is in the local directory. Moreover, I think it might be possible to assemble the distribution so that the compile-time hadoop dependency is in a cognate directory to lib and is referenced using a default value for HADOOP_HOME.

Does anybody have any violent antipathies to such a change?
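The proposed precedence is: use HADOOP_HOME if it is set, otherwise fall back to a Hadoop copy shipped alongside lib. It can be sketched as below. This is only an illustration in Java, using a hypothetical HadoopHomeResolver helper. In practice this kind of classpath lookup would more likely be done in HBase's launch scripts. The fallback location is a parameter because the report does not name that directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper illustrating the proposal: prefer HADOOP_HOME from the
// environment, otherwise fall back to a default bundled with the distribution.
final class HadoopHomeResolver {
  private HadoopHomeResolver() {}

  /**
   * @param env            environment variables, e.g. System.getenv()
   * @param bundledDefault fallback location (hypothetical; not named in the report)
   */
  static Path resolve(Map<String, String> env, Path bundledDefault) {
    String fromEnv = env.get("HADOOP_HOME");
    if (fromEnv != null && !fromEnv.trim().isEmpty()) {
      return Paths.get(fromEnv.trim()); // an explicitly configured Hadoop wins
    }
    return bundledDefault; // otherwise use the copy shipped with HBase
  }
}
```

With this rule, a deployment that sets HADOOP_HOME never depends on the jar in hbase/lib matching its cluster. Stock installs still work out of the box through the default.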
[jira] [Updated] (HBASE-4065) TableOutputFormat ignores failure to create table instance
[ https://issues.apache.org/jira/browse/HBASE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-4065:
Fix Version/s: 0.92.0

TableOutputFormat ignores failure to create table instance
Key: HBASE-4065
URL: https://issues.apache.org/jira/browse/HBASE-4065
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Brock Noland
Fix For: 0.92.0, 0.94.0
Attachments: HBASE-4065.1.patch, HBASE-4065.2.patch

If TableOutputFormat in the new API fails to create a table, it simply logs this at ERROR level and then continues on its way. The first write() to the table then throws an NPE because the table hasn't been set. Instead, it should probably rethrow the exception as a RuntimeException in setConf. Alternatively, it could do what the old-API TOF does and not create the HTable instance until getRecordWriter, where it can throw an IOE.
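The two remedies suggested in the description can be sketched with hypothetical stand-in types. TableFactory, FailFastOutputFormat and DeferredOpenOutputFormat are illustrative names, not Hadoop or HBase classes.

- Remedy 1: rethrow the failure from setConf as a RuntimeException.
- Remedy 2: postpone opening the table until getRecordWriter, which is allowed to throw IOException.

```java
import java.io.IOException;

// Hypothetical stand-ins; these are not Hadoop's OutputFormat or HBase's HTable.
interface TableFactory {
  Object openTable(String name) throws IOException;
}

/** Remedy 1: fail fast in setConf instead of logging and carrying on with a null table. */
class FailFastOutputFormat {
  private Object table;

  void setConf(TableFactory factory, String tableName) {
    try {
      table = factory.openTable(tableName);
    } catch (IOException e) {
      // Surfacing the error here avoids a NullPointerException on the first write().
      throw new RuntimeException("Could not open table " + tableName, e);
    }
  }

  Object getTable() {
    return table;
  }
}

/** Remedy 2: only remember the settings in setConf and open the table in getRecordWriter. */
class DeferredOpenOutputFormat {
  private TableFactory factory;
  private String tableName;

  void setConf(TableFactory factory, String tableName) {
    this.factory = factory;
    this.tableName = tableName;
  }

  /** Opens the table lazily; a failure propagates to the caller as an IOException. */
  Object getRecordWriter() throws IOException {
    return factory.openTable(tableName); // a real writer would wrap the opened table
  }
}
```

Either way, the job fails with the original cause attached instead of an NPE on the first write().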
[jira] [Updated] (HBASE-4225) NoSuchColumnFamilyException in multi doesn't say which family is bad
[ https://issues.apache.org/jira/browse/HBASE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-4225:
Fix Version/s: 0.92.0

NoSuchColumnFamilyException in multi doesn't say which family is bad
Key: HBASE-4225
URL: https://issues.apache.org/jira/browse/HBASE-4225
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Critical
Fix For: 0.92.0, 0.90.5
Attachments: 4225.trunk, HBASE-4225_0.90.patch, HBASE-4225_0.90_1.patch, HBASE-4225_0.90_2.patch, HBASE-4225_0.90_3.patch

It's kind of a dumb one. In HRegion.doMiniBatchPut we do:
{code}
LOG.warn("No such column family in batch put", nscf);
batchOp.retCodes[lastIndexExclusive] = OperationStatusCode.BAD_FAMILY;
{code}
So we lose the family here; all we know is that there's a bad one. This is what's in HRS.multi:
{code}
} else if (code == OperationStatusCode.BAD_FAMILY) {
  result = new NoSuchColumnFamilyException();
{code}
We can't just throw the exception like that. We need to say which family is bad, even if that requires testing all the passed MultiActions.
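One way to name the bad family, as requested, is to scan the batch for families the table does not define and include them in the exception message. This is a minimal sketch with hypothetical types. BadFamilyException and FamilyCheck stand in for HBase's classes, and families are plain strings rather than byte[].

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical exception; a stand-in for HBase's NoSuchColumnFamilyException.
class BadFamilyException extends Exception {
  BadFamilyException(String message) {
    super(message);
  }
}

final class FamilyCheck {
  private FamilyCheck() {}

  /**
   * Returns the distinct families used in the batch that the table does not
   * define, in the order they are first seen. Each inner list holds the
   * families of one put.
   */
  static List<String> unknownFamilies(Set<String> tableFamilies, List<List<String>> batch) {
    Set<String> bad = new LinkedHashSet<>();
    for (List<String> putFamilies : batch) {
      for (String family : putFamilies) {
        if (!tableFamilies.contains(family)) {
          bad.add(family);
        }
      }
    }
    return new ArrayList<>(bad);
  }

  /** Builds an exception whose message names the offending families. */
  static BadFamilyException toException(Set<String> tableFamilies, List<List<String>> batch) {
    List<String> bad = unknownFamilies(tableFamilies, batch);
    return new BadFamilyException("No such column family in batch put: " + String.join(", ", bad));
  }
}
```

Scanning the whole batch costs one pass over the families, which matches the report's remark that naming the family may require testing all the passed MultiActions.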
[jira] [Updated] (HBASE-4253) Intermittent test failure because of missing config parameter in new HTable(tablename)
[ https://issues.apache.org/jira/browse/HBASE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4253: -- Fix Version/s: 0.92.0

Intermittent test failure because of missing config parameter in new HTable(tablename) -- Key: HBASE-4253 URL: https://issues.apache.org/jira/browse/HBASE-4253 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4253.patch, HBASE_4253_0.90.patch

As described in HBASE-4138, this issue was raised to fix a random test case failure. Consider the log of failed build #2132 for the test case TestScannerTimeOut:

2011-08-23 04:30:11,195 INFO [main] zookeeper.MiniZooKeeperCluster(141): Failed binding ZK Server to client port: 21818
2011-08-23 04:30:11,226 INFO [main] zookeeper.MiniZooKeeperCluster(164): Started MiniZK Cluster and connect 1 ZK server on client port: 21819

By default we try to bind to 21818. That port was not bindable (maybe it was busy), so we used 21819. After starting the mini ZK cluster, we store this port in the config object with this.conf.set("hbase.zookeeper.property.clientPort", Integer.toString(clientPort)); so the RS and the Master use 21819 as their ZooKeeper client port.

Once the test class starts running, no test case needs a new client connection until TestScannerTimeout#test3686a. As part of test3686a we create new HTable(tableName), which calls
{code}
this(HBaseConfiguration.create(), tableName);
{code}
This creates a new configuration object, so the ZooKeeper client port is taken to be 21818. Some previous ZK cluster was probably still running on 21818 because it had not been shut down properly. The test case was able to connect to it, but since it was a different cluster, it could not find the /hbase node. Hence the failure. The two other failing test cases, in TestHTablePool, have a similar problem.
The failure in build #2119 is exactly the same. There should be a mechanism for the test to tell the client code which ZK it should connect to. Another interesting thing: all the other test cases use new HTable(conf, tablename). Only these 3 test cases use new HTable(tablename). Hence the problem.
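The failure mode above can be modelled in a few lines of self-contained Java, with a plain Map standing in for Hadoop's Configuration. ZkPortPropagation and its methods are hypothetical, and "fall back to the next port" is a simplification of what MiniZooKeeperCluster does. The mini ZK cluster records the port it actually bound in the shared config, so a client built from that config sees 21819. A client that builds fresh defaults, as new HTable(tablename) does via HBaseConfiguration.create(), falls back to 21818.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the HBASE-4253 failure; a Map stands in for Hadoop's Configuration.
class ZkPortPropagation {
  static final String CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort";

  /** Stands in for building a configuration from defaults only. */
  static Map<String, String> createDefaultConf() {
    Map<String, String> conf = new HashMap<>();
    conf.put(CLIENT_PORT_KEY, "21818"); // the port the tests in the report try first
    return conf;
  }

  /** Stands in for the mini ZK cluster: moves to the next port if the first is busy,
   *  then records the port it really bound in the shared test configuration. */
  static int startMiniZk(Map<String, String> conf, boolean defaultPortBusy) {
    int port = Integer.parseInt(conf.get(CLIENT_PORT_KEY));
    if (defaultPortBusy) {
      port++;
    }
    conf.put(CLIENT_PORT_KEY, Integer.toString(port));
    return port;
  }

  /** A client that reuses the test's configuration, like new HTable(conf, tableName). */
  static String portSeenWithSharedConf(Map<String, String> testConf) {
    return testConf.get(CLIENT_PORT_KEY);
  }

  /** A client that builds a fresh configuration, like new HTable(tableName). */
  static String portSeenWithFreshConf() {
    return createDefaultConf().get(CLIENT_PORT_KEY);
  }
}
```

Passing the test's configuration through to the client makes the port it connects to match the port the mini cluster actually bound. That is why switching the 3 test cases to new HTable(conf, tablename) addresses the failure.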
[jira] [Updated] (HBASE-4297) TableMapReduceUtil overwrites user supplied options
[ https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4297: -- Fix Version/s: 0.90.5

TableMapReduceUtil overwrites user supplied options --- Key: HBASE-4297 URL: https://issues.apache.org/jira/browse/HBASE-4297 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Jan Lukavsky Fix For: 0.90.5 Attachments: HBASE-4297.patch

Job configuration is overwritten by hbase-default and hbase-site in TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior in the following code:
{noformat}
Configuration conf = HBaseConfiguration.create();
// change keyvalue size
conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
Job job = new Job(conf, ...);
TableMapReduceUtil.initTableMapperJob(...);
// the job doesn't have the option changed, uses it from hbase-site or hbase-default
job.submit();
{noformat}
In this case it could be worked around by moving the set() after initTableMapperJob(). That is impossible, though, when the user wants to change an option using GenericOptionsParser and -D, which makes this cool feature useless. In the 0.20.x era this code behaved as expected. The solution should be to stop overwriting the options and only read them from hbase-default and hbase-site when they are missing.
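Here is a minimal sketch of the merge behaviour the report asks for. Plain Maps stand in for Hadoop's Configuration, MergeHBaseDefaults is a hypothetical name, and 10485760 below is only a sample site value. The overwrite method mirrors the reported behaviour. The fillMissing method only adds keys the user has not already set, so a value passed with -D or set before initTableMapperJob() survives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; plain maps stand in for Hadoop's Configuration.
class MergeHBaseDefaults {
  /** Copies every key from hbase-default/hbase-site over the job's settings (the reported behaviour). */
  static Map<String, String> overwrite(Map<String, String> jobConf, Map<String, String> hbaseResources) {
    Map<String, String> merged = new HashMap<>(jobConf);
    merged.putAll(hbaseResources);
    return merged;
  }

  /** Only fills in keys the user has not set (the behaviour the report asks for). */
  static Map<String, String> fillMissing(Map<String, String> jobConf, Map<String, String> hbaseResources) {
    Map<String, String> merged = new HashMap<>(jobConf);
    hbaseResources.forEach(merged::putIfAbsent); // existing user values win
    return merged;
  }
}
```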
[jira] [Commented] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves
[ https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151323#comment-13151323 ] Lars Hofhansl commented on HBASE-4654: -- I was thinking about that, and it would certainly be more user friendly. It means, though, that we would already have to get the peerClusterId at that time. That in turn means we would have to connect to the peer cluster's ZK right away. On the other hand, I don't expect this to be a common scenario; it's just a safeguard against user error.

[replication] Add a check to make sure we don't replicate to ourselves -- Key: HBASE-4654 URL: https://issues.apache.org/jira/browse/HBASE-4654 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.90.5 Attachments: 4654-trunk.txt

It's currently possible to add a peer for replication and point it at the local cluster. I believe this could very well happen for those who, like us, use only one ZK ensemble per DC, so that only the root znode changes when you set up intra-DC replication. I don't think comparing just the cluster ID would be enough: you would normally use a different one for another cluster, and nothing will block you from pointing elsewhere. Comparing the ZK ensemble address doesn't work either when multiple DNS entries point at the same place. I think this could be resolved by looking up the master address in the relevant znode, since it should be exactly the same when both point at the same cluster.
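This is a rough sketch of the master-address check proposed in the description, written as a self-contained Java model. It assumes the peer's master address has already been read from the peer's master znode, which, as Lars notes, means connecting to the peer's ZK when the peer is added. SelfReplicationCheck, ClusterView, and the host names and ports are illustrative. None of them is HBase's replication API.

```java
// Hypothetical model of the HBASE-4654 check; not HBase's replication API.
class SelfReplicationCheck {
  /** What a client can observe about a cluster: how it was reached and which master it found. */
  static final class ClusterView {
    final String zkQuorum;      // quorum string used to reach the cluster; DNS aliases may differ
    final String znodeParent;   // root znode, e.g. "/hbase"
    final String masterAddress; // address read from the cluster's master znode

    ClusterView(String zkQuorum, String znodeParent, String masterAddress) {
      this.zkQuorum = zkQuorum;
      this.znodeParent = znodeParent;
      this.masterAddress = masterAddress;
    }
  }

  /** Same master means same cluster, whichever quorum string or alias was used to reach it. */
  static boolean isSameCluster(ClusterView local, ClusterView peer) {
    return local.masterAddress.equals(peer.masterAddress);
  }

  /** Rejects a replication peer that resolves to the local cluster. */
  static void checkPeer(ClusterView local, ClusterView peer) {
    if (isSameCluster(local, peer)) {
      throw new IllegalArgumentException("Peer " + peer.zkQuorum + peer.znodeParent
          + " is the local cluster; refusing to replicate to ourselves");
    }
  }
}
```

Two cases show the design choice. A DNS alias of the local ensemble with the same root znode is rejected. The same ensemble with a different root znode (a second cluster in the DC) is accepted.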
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: (was: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch)

Catalog Janitor logic bug causes region leackage Key: HBASE-4799 URL: https://issues.apache.org/jira/browse/HBASE-4799 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Priority: Critical Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch

When a region split takes a significant amount of time, CatalogJanitor can clean up one of the SPLIT records but leave the other in META. When the other daughter finishes, the janitor cleans up the remaining SPLIT record, but the parent region is never removed from the FS and its META entry is never cleared. The race condition is as follows:
1. A region split starts.
2. One of the daughter regions, A, has finished splitting (it has no reference storefiles), but the other, B, has not.
3. The janitor runs. In checkDaughter it removes SPLITA from META, but it sees that SPLITB still has references and does nothing more.
4. Region B completes its split.
5. The janitor wakes up and removes SPLITB, but it sees that there is no record for A and again does nothing.
Result: the parent region hangs around forever.
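The race can be reproduced in a small self-contained Java model in which the parent's META row is reduced to its splitA/splitB columns. The buggyPass method follows steps 1 to 5 above. The fixedPass method is one possible fix, offered as an assumption and not as the attached 0001 patch: it leaves both columns alone until neither daughter holds references, then clears them and deletes the parent in the same pass. JanitorRaceModel and its members are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the HBASE-4799 race; not CatalogJanitor's real code.
class JanitorRaceModel {
  final Map<String, String> parentSplitColumns = new HashMap<>(); // META column -> daughter
  final Map<String, Boolean> daughterHasReferences = new HashMap<>();
  boolean parentDeleted = false;

  JanitorRaceModel() {
    parentSplitColumns.put("splitA", "A");
    parentSplitColumns.put("splitB", "B");
    daughterHasReferences.put("A", true);
    daughterHasReferences.put("B", true);
  }

  /** A daughter no longer holds reference storefiles into the parent. */
  void finishDaughter(String daughter) {
    daughterHasReferences.put(daughter, false);
  }

  /** True if the column is still present and its daughter has no references left. */
  private boolean clean(String column) {
    String daughter = parentSplitColumns.get(column);
    return daughter != null && !daughterHasReferences.get(daughter);
  }

  /** Pass as described in the report: clean columns are removed one at a time, and the
   *  parent is deleted only if both columns were present and clean in the same pass. */
  void buggyPass() {
    boolean bothClean = parentSplitColumns.size() == 2 && clean("splitA") && clean("splitB");
    if (clean("splitA")) parentSplitColumns.remove("splitA");
    if (clean("splitB")) parentSplitColumns.remove("splitB");
    if (bothClean) parentDeleted = true;
  }

  /** One possible fix: change nothing until both daughters are clean, then clear both
   *  columns and delete the parent together. */
  void fixedPass() {
    if (parentDeleted) return;
    if (clean("splitA") && clean("splitB")) {
      parentSplitColumns.clear();
      parentDeleted = true;
    }
  }
}
```

Run the two passes with A finishing before the first pass and B before the second. The buggy model ends with no split columns while the parent is still present, which is the leak, since nothing points the janitor back at the parent. The fixed model deletes the parent on the second pass.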
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: (was: 0002-Temporary-fix-to-remove-leaked-regions.patch)
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch 0002-Temporary-fix-to-remove-leaked-regions.patch
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: (was: 0002-Temporary-fix-to-remove-leaked-regions.patch)
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: (was: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch)
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: 0002-Temporary-fix-to-remove-leaked-regions.patch
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Attachment: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch
[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool
[ https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151346#comment-13151346 ] stack commented on HBASE-4611: -- Thank you Marek. Let me try again later today... Add support for Phabricator/Differential as an alternative code review tool --- Key: HBASE-4611 URL: https://issues.apache.org/jira/browse/HBASE-4611 Project: HBase Issue Type: Task Reporter: Jonathan Gray Assignee: Nicolas Spiegelberg Fix For: 0.92.0, 0.94.0 Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, D207.1.patch, D21.1.patch, D21.1.patch, HBASE-4611.D423.1.patch, HBASE-4611.D423.2.patch From http://phabricator.org/ : Phabricator is a open source collection of web applications which make it easier to write, review, and share source code. It is currently available as an early release. Phabricator was developed at Facebook. It's open source so pretty much anyone could host an instance of this software. To begin with, there will be a public-facing instance located at http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL http://osuosl.org). We will use this JIRA to deal with adding (and ensuring) Apache-friendly support that will allow us to do code reviews with Phabricator for HBase.
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Lapan updated HBASE-4799: - Status: Patch Available (was: Open) Resolves Janitor race conditions.
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151350#comment-13151350 ] Jean-Daniel Cryans commented on HBASE-4799: --- At first glance, this sounds a whole lot like HBASE-4238.
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151362#comment-13151362 ] Max Lapan commented on HBASE-4799: -- This bug differs from HBASE-4238. 4238 is about regions splitting too fast, whereas 4799 is about regions splitting too slowly.
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151363#comment-13151363 ] Ted Yu commented on HBASE-4799: --- @Max: HadoopQA picks the latest patch. It looks like it is running through the test suite: https://builds.apache.org/job/PreCommit-HBASE-Build/265/console Your fix is marked against 0.90.4, and HBASE-4238 was integrated into 0.90.5 (it is absent from http://archive.cloudera.com/cdh/3/hbase-0.90.4+49.1.CHANGES.txt), so I wonder if you could try Stack's fix out.
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151366#comment-13151366 ] Ted Yu commented on HBASE-4799: --- @Max: I didn't see your comment @ 16/Nov/11 17:37 when I typed the response above.
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151370#comment-13151370 ] Max Lapan commented on HBASE-4799: -- I tried it, and it didn't help. 4238 is a different issue, which shows up when a region is split twice in a row.
[jira] [Created] (HBASE-4800) Result.compareResults is incorrect
Result.compareResults is incorrect -- Key: HBASE-4800 URL: https://issues.apache.org/jira/browse/HBASE-4800 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.92.0, 0.94.0 Reporter: Lars Hofhansl

A coworker of mine (James Taylor) found a bug in Result.compareResults(...). This condition:
{code}
if (!ourKVs[i].equals(replicatedKVs[i]) &&
    !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
  throw new Exception("This result was different: "
{code}
should be
{code}
if (!ourKVs[i].equals(replicatedKVs[i]) ||
    !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
  throw new Exception("This result was different: "
{code}
I just checked, and this is wrong in all branches.
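This self-contained Java sketch shows why && misses differences. Cell is a hypothetical stand-in for KeyValue. As an assumption of the sketch, its key comparison ignores the value, which is why the value has to be compared separately. With &&, a pair whose keys match but whose values differ is treated as equal. With || it is caught. CompareResultsSketch and its methods are illustrative only.

```java
import java.util.Arrays;

// Hypothetical model of the HBASE-4800 condition; Cell is not HBase's KeyValue.
class CompareResultsSketch {
  static final class Cell {
    final String key;
    final byte[] value;

    Cell(String key, byte[] value) {
      this.key = key;
      this.value = value;
    }

    /** Compares keys only, so the value must be checked separately. */
    boolean keyEquals(Cell other) {
      return key.equals(other.key);
    }
  }

  /** The reported condition: a mismatch is flagged only when key AND value both differ. */
  static boolean differsBuggy(Cell ours, Cell replicated) {
    return !ours.keyEquals(replicated) && !Arrays.equals(ours.value, replicated.value);
  }

  /** The corrected condition: a mismatch in either the key OR the value is a difference. */
  static boolean differsFixed(Cell ours, Cell replicated) {
    return !ours.keyEquals(replicated) || !Arrays.equals(ours.value, replicated.value);
  }

  /** Mirrors the shape of a compareResults loop: throws on the first pair that differs. */
  static void compare(Cell[] ours, Cell[] replicated) throws Exception {
    if (ours.length != replicated.length) {
      throw new Exception("Results differ in size");
    }
    for (int i = 0; i < ours.length; i++) {
      if (differsFixed(ours[i], replicated[i])) {
        throw new Exception("This result was different at index " + i);
      }
    }
  }
}
```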
[jira] [Updated] (HBASE-4800) Result.compareResults is incorrect
[ https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4800: - Attachment: 4800.txt Simple patch with test
[jira] [Updated] (HBASE-4800) Result.compareResults is incorrect
[ https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4800:
---------------------------------
Fix Version/s: 0.90.5, 0.94.0, 0.92.0
Assignee: Lars Hofhansl
[jira] [Commented] (HBASE-4800) Result.compareResults is incorrect
[ https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151402#comment-13151402 ]

stack commented on HBASE-4800:
-----------------------------
+1
[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151411#comment-13151411 ]

stack commented on HBASE-4797:
-----------------------------
Thinking some more on this, we don't need to rename recovered.edits files. The files are named for the first sequenceid in the file, so we could just do a file listing and sort the result. Then we'd have the range of sequenceids per file, and we could just pass on (skip) files whose edits all have sequenceids smaller than the region's current seqid.

[availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has
------------------------------------------------------------------------------------------

Key: HBASE-4797
URL: https://issues.apache.org/jira/browse/HBASE-4797
Project: HBase
Issue Type: Bug
Components: performance
Reporter: stack
Labels: noob

Testing 0.92, I crashed all servers out. Another bug makes it so WALs are not getting cleaned, so I had 7000 regions to replay. The distributed split code did a nice job and the cluster came back, but what is interesting is that some hot regions ended up having loads of recovered.edits files -- tens if not hundreds -- to replay against the region (can we bulk load recovered.edits instead of replaying them?). Each recovered.edits file is taking about a second to process (though there are only about 30 odd edits per file, it seems). The region is unavailable during this time.
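A minimal Java sketch of the listing-and-sorting idea from the comment above. It rests on two assumptions: each recovered.edits file name is just the decimal sequence id of its first edit, and sequence ids only grow from one file to the next, so every edit in a file is older than the first edit of the following file. RecoveredEditsFilter and filesToReplay are hypothetical names, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RecoveredEditsFilter {
    // Returns, oldest first, the files that may still hold edits newer than
    // regionSeqId. Assumption: a file's name is the decimal sequence id of
    // its first edit, and ids only grow from one file to the next.
    static List<String> filesToReplay(List<String> fileNames, long regionSeqId) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong((String name) -> Long.parseLong(name)));

        List<String> toReplay = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            boolean isNewest = i == sorted.size() - 1;
            if (!isNewest) {
                // Every edit in this file is older than the next file's first id.
                long lastSeqIdUpperBound = Long.parseLong(sorted.get(i + 1)) - 1;
                if (lastSeqIdUpperBound <= regionSeqId) {
                    continue; // the region already has all of these edits
                }
            }
            // The newest file has no known upper bound, so it is always replayed.
            toReplay.add(sorted.get(i));
        }
        return toReplay;
    }
}
```

With files for sequence ids 50, 100 and 200 and a region already at seqid 120, the first file is skipped (its edits end at 99 or earlier) and the other two are still replayed. The newest file always has to be opened, because nothing in the listing bounds its last sequence id.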
[jira] [Updated] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has
[ https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4797:
-------------------------
Priority: Critical (was: Major)
Tags: noob
Labels: noob (was: )
[jira] [Resolved] (HBASE-4800) Result.compareResults is incorrect
[ https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-4800.
----------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90, 0.92, and trunk.
[jira] [Updated] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4796:
--------------------------
Attachment: 4796.txt

Patch from Ramkrishna.

Race between SplitRegionHandlers for the same region kills the master
---------------------------------------------------------------------

Key: HBASE-4796
URL: https://issues.apache.org/jira/browse/HBASE-4796
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.94.0
Attachments: 4796.txt

I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than one of them tries to delete the znode at the same time:
{quote}
2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1
2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node
2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT report); parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry
2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1)
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
    at org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{quote}
Stack and I came up with the solution: we just need to handle that exception, because handleSplitReport is an in-memory thing.
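A minimal Java sketch of the proposed fix. The handler that loses the race treats a missing znode as "already handled" instead of aborting the master. FakeZk and MissingNodeException are hypothetical in-memory stand-ins for ZooKeeper and KeeperException.NoNodeException. SplitHandlerSketch.process only mirrors the delete step of SplitRegionHandler; it is not its real code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for KeeperException.NoNodeException.
final class MissingNodeException extends Exception {
    MissingNodeException(String path) {
        super("NoNode for " + path);
    }
}

// Hypothetical in-memory stand-in for the /hbase/unassigned znodes.
final class FakeZk {
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    void create(String path) {
        nodes.add(path);
    }

    // Set.remove is atomic here, so exactly one concurrent caller succeeds.
    void delete(String path) throws MissingNodeException {
        if (!nodes.remove(path)) {
            throw new MissingNodeException(path);
        }
    }
}

final class SplitHandlerSketch {
    // Returns true if this handler deleted the node, false if another
    // handler for the same region had already done so.
    static boolean process(FakeZk zk, String path) {
        try {
            zk.delete(path);
            return true;
        } catch (MissingNodeException e) {
            // Lost the race: treat the split as already handled instead of
            // escalating to a fatal error.
            return false;
        }
    }
}
```

Running process twice on the same path returns true and then false, and neither call throws. That is the behavior the comment asks for, in place of the FATAL shown in the log.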
[jira] [Updated] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4796:
--------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4213:
--------------------------
Fix Version/s: 0.94.0 (was: 0.92.0)

Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
------------------------------------------------------------------------------------------

Key: HBASE-4213
URL: https://issues.apache.org/jira/browse/HBASE-4213
Project: HBase
Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
Fix For: 0.94.0
Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch

This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730

Support instant schema updates such as Modify Table, Add Column, Modify Column operations:
1. Without enabling/disabling the table.
2. Without bulk unassign/assign of regions.
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151433#comment-13151433 ]

jirapos...@reviews.apache.org commented on HBASE-4213:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1786/
-----------------------------------------------------------

(Updated 2011-11-16 19:12:03.308667)

Review request for Todd Lipcon, Andrew Purtell and Subbu Iyer.

Changes
-------
Patch addresses Lars' comments

Summary
-------
bq. From Subbu: here is the latest patch that supports alter_instant, an instant schema change command that supports (Add, Modify, Delete column and Modify table) actions through ZK.
1. This pattern capitalizes on the fact that HRIs are now in HDFS and need not be sent over again from Master to the RS cloud on every schema change event.
2. Offers real time instant schema change, as we bypass the explicit bulk reassign (unassign + assign) of regions from master to RS.
3. Offers fault tolerant schema change support, as schema changes now go through ZK.
A secondary master taking over a failed schema change will be addressed through a separate JIRA.

Diffs (updated)
-----
/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 1202381
/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java 1202523
/src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1202381
/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java 1202381
/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1202381
/src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1202381
/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java PRE-CREATION
/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java PRE-CREATION
/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1202381
/src/main/resources/hbase-default.xml 1202381
/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeFailover.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1202381
/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 1202381

Diff: https://reviews.apache.org/r/1786/diff

Testing
-------
Unit tests pass.

Thanks,
Ted
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151434#comment-13151434 ]

Hadoop QA commented on HBASE-4799:
---------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503909/0001-Fix-of-Regions-Leaks-problem-in-janitor.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
-1 javadoc. The javadoc tool appears to have generated -163 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/265//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/265//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/265//console

This message is automatically generated.

Catalog Janitor logic bug causes region leackage
------------------------------------------------

Key: HBASE-4799
URL: https://issues.apache.org/jira/browse/HBASE-4799
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch

When a region split takes a significant amount of time, CatalogJanitor can clean up one of the SPLIT records but leave the other in META. When the other daughter finishes, the janitor cleans up the remaining SPLIT record, but the parent region is never removed from the FS and its META entry is never cleared. The race condition is as follows:
1. A region split starts.
2. One of the daughter regions (A) has finished splitting, i.e. it has no reference storefiles, but the other (B) has not.
3. The janitor starts and, in the checkDaughter routine, removes SPLITA from META, but sees that SPLITB still has references and does nothing else.
4. Region B completes its split.
5. The janitor wakes up and removes SPLITB, but sees that there is no record for A and again does nothing.
Result: the parent region hangs around forever.
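A simplified Java model of the five steps above, using hypothetical names (ParentRow, JanitorSketch). It is not HBase's CatalogJanitor code and not the attached patch. buggyPass drops each daughter's SPLIT column as soon as that daughter has no references, so the parent is cleaned up only if both columns happen to go in the same pass. fixedPass shows one possible way to avoid the leak, assumed here purely for illustration: leave both columns in place until neither daughter has references, then remove the parent in a single step.

```java
// Hypothetical model of a split parent's META row.
final class ParentRow {
    boolean splitAPresent = true; // SPLITA column still in META
    boolean splitBPresent = true; // SPLITB column still in META
    boolean parentDeleted = false;
}

final class JanitorSketch {
    // Mirrors the report: a daughter's column is dropped as soon as that
    // daughter has no references, and the parent is removed only when both
    // columns are dropped in the same pass.
    static void buggyPass(ParentRow row, boolean aHasRefs, boolean bHasRefs) {
        boolean removedA = false;
        boolean removedB = false;
        if (row.splitAPresent && !aHasRefs) {
            row.splitAPresent = false;
            removedA = true;
        }
        if (row.splitBPresent && !bHasRefs) {
            row.splitBPresent = false;
            removedB = true;
        }
        if (removedA && removedB) {
            row.parentDeleted = true;
        }
    }

    // One way to avoid the leak (an assumption, not the attached patch):
    // touch nothing until neither daughter has references.
    static void fixedPass(ParentRow row, boolean aHasRefs, boolean bHasRefs) {
        if (row.parentDeleted || aHasRefs || bHasRefs) {
            return;
        }
        row.splitAPresent = false;
        row.splitBPresent = false;
        row.parentDeleted = true;
    }
}
```

Replaying steps 3 and 5 (a pass with only B holding references, then a pass with neither) leaves the buggy row with both columns gone but parentDeleted still false, which is the leak. The fixed row ends up deleted.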
[jira] [Commented] (HBASE-4761) Add Developer Debug Options to HBase Config
[ https://issues.apache.org/jira/browse/HBASE-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151440#comment-13151440 ]

Kannan Muthukkaruppan commented on HBASE-4761:
---------------------------------------------
Going to sit down with Nicolas tomorrow and take care of this!

Add Developer Debug Options to HBase Config
-------------------------------------------

Key: HBASE-4761
URL: https://issues.apache.org/jira/browse/HBASE-4761
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
Attachments: HBASE-4761.patch

Add in optional HBase configuration options that core developers will commonly use:
- an option to enable JDWP debugging
- an option to use a separate logfile for GC information

(Part of the effort to move 89-fb features over to trunk)
[jira] [Updated] (HBASE-4199) blockCache summary - backend
[ https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-4199:
----------------------------------
Fix Version/s: 0.92.0

blockCache summary - backend
----------------------------

Key: HBASE-4199
URL: https://issues.apache.org/jira/browse/HBASE-4199
Project: HBase
Issue Type: Sub-task
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
Fix For: 0.92.0
Attachments: 4199.v5, java_HBASE_4199.patch, java_HBASE_4199_v2.patch, java_HBASE_4199_v3.patch, java_HBASE_4199_v4.patch

This is the backend work for the blockCache summary: the change to the BlockCache interface, summarization in LruBlockCache, BlockCacheSummaryEntry, and the additions to HRegionInterface and HRegionServer.
This will NOT include any of the web UI or anything else like that. That is for another sub-task.
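A minimal Java sketch of the kind of per-table, per-column-family summarization described above. CachedBlockInfo, SummaryEntry and BlockCacheSummarizer are hypothetical; SummaryEntry only loosely mirrors the BlockCacheSummaryEntry named in the issue. How LruBlockCache would work out the table and family of each cached block is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical view of one cached block.
final class CachedBlockInfo {
    final String table;
    final String family;
    final long heapSize;

    CachedBlockInfo(String table, String family, long heapSize) {
        this.table = table;
        this.family = family;
        this.heapSize = heapSize;
    }
}

// Hypothetical summary row: block count and total heap size per table/family.
final class SummaryEntry {
    final String table;
    final String family;
    int blocks;
    long heapSize;

    SummaryEntry(String table, String family) {
        this.table = table;
        this.family = family;
    }

    @Override
    public String toString() {
        return table + "/" + family + ": " + blocks + " blocks, " + heapSize + " bytes";
    }
}

final class BlockCacheSummarizer {
    // Groups blocks by "table/family"; the TreeMap returns entries sorted by
    // that combined key.
    static List<SummaryEntry> summarize(Iterable<CachedBlockInfo> cached) {
        Map<String, SummaryEntry> byTableAndFamily = new TreeMap<>();
        for (CachedBlockInfo b : cached) {
            SummaryEntry e = byTableAndFamily.computeIfAbsent(
                b.table + "/" + b.family, k -> new SummaryEntry(b.table, b.family));
            e.blocks++;
            e.heapSize += b.heapSize;
        }
        return new ArrayList<>(byTableAndFamily.values());
    }
}
```

The design choice here is a single pass over the cached blocks with a map keyed by table and family, so building the summary is linear in the number of cached blocks.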
[jira] [Commented] (HBASE-4628) Enhance Table Create Presplit Functionality within the HBase Shell
[ https://issues.apache.org/jira/browse/HBASE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151448#comment-13151448 ]

Phabricator commented on HBASE-4628:
-----------------------------------
Kannan has accepted the revision "HBASE-4628 [jira] Enhance Table Create Presplit Functionality within the HBase Shell".

Looks great! Sweet feature. Nicolas will walk me through the commit flow tomorrow, so it should get committed tomorrow.

REVISION DETAIL
https://reviews.facebook.net/D417

Enhance Table Create Presplit Functionality within the HBase Shell
------------------------------------------------------------------

Key: HBASE-4628
URL: https://issues.apache.org/jira/browse/HBASE-4628
Project: HBase
Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Attachments: HBASE-4628.D411.1.patch, HBASE-4628.D417.1.patch, HBASE-4628.D417.2.patch, HBASE-4628.D429.1.patch

Currently, we allow the user to presplit in the HBase shell by explicitly listing the start key of every region shard they want. Instead, we should provide the RegionSplitter functionality of choosing a split algorithm, followed by the number of splits that they want.
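A minimal Java sketch of what "choose a split algorithm plus a number of splits" can mean in practice. It divides an 8-character hex row-key space evenly into N regions and emits the N - 1 split keys. The idea is similar in spirit to RegionSplitter's HexStringSplit, but UniformHexSplit is a hypothetical name, and the exact boundaries HBase computes may differ.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical uniform splitter over 8-character hex row keys
// ("00000000" .. "ffffffff").
final class UniformHexSplit {
    // Returns the numRegions - 1 start keys that divide the key space evenly.
    static List<String> splitPoints(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger keySpace = BigInteger.ONE.shiftLeft(32); // 2^32 possible keys
        BigInteger n = BigInteger.valueOf(numRegions);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary = floor(i * 2^32 / numRegions), as zero-padded hex
            BigInteger point = keySpace.multiply(BigInteger.valueOf(i)).divide(n);
            points.add(String.format("%08x", point));
        }
        return points;
    }
}
```

For example, splitPoints(4) yields 40000000, 80000000 and c0000000, which would then be handed to table creation as the split keys. The user picks an algorithm and a count instead of typing every start key.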
[jira] [Assigned] (HBASE-4668) List HDFS enhancements to speed up backups for HBase
[ https://issues.apache.org/jira/browse/HBASE-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan reassigned HBASE-4668:
------------------------------------------
Assignee: Pritam Damania

List HDFS enhancements to speed up backups for HBase
----------------------------------------------------

Key: HBASE-4668
URL: https://issues.apache.org/jira/browse/HBASE-4668
Project: HBase
Issue Type: Sub-task
Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Pritam Damania

There are a host of improvements that help:
- HDFS fast copy
- Various enhancements to fast copy to speed things up
- File-level hard links, which create ext3 hard links instead of copying blocks, thereby saving a lot of IOPS

We need to list the HDFS JIRAs and have patches on them.
[jira] [Commented] (HBASE-4668) List HDFS enhancements to speed up backups for HBase
[ https://issues.apache.org/jira/browse/HBASE-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151449#comment-13151449 ]

Karthik Ranganathan commented on HBASE-4668:
-------------------------------------------
@Andrew - totally, will let Pritam comment on that :)
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151452#comment-13151452 ]

Doug Meil commented on HBASE-4655:
---------------------------------
I'll gladly port this to the book, and I'd like to add this in here...

http://hbase.apache.org/book.html#ops.backup

... with the existing backup info.

Document architecture of backups
--------------------------------

Key: HBASE-4655
URL: https://issues.apache.org/jira/browse/HBASE-4655
Project: HBase
Issue Type: Sub-task
Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
Attachments: HBase Backups Architecture.docx

Basic idea behind the backup architecture for HBase
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151455#comment-13151455 ] Ted Yu commented on HBASE-4739: --- Trial patch makes sense. For unassign, the following change to javadoc is inaccurate: {code} * Updates the RegionState and creates a zk node. {code} We still send close RPC, right? For sendRegionClose(): {code} * sends the CLOSE RPC to a region server. * @param region server to be unassigned {code} Should read: region server which receives CLOSE RPC. For createNodePendingClose(), please replace CLOSING with PEND_CLOSE (not PEND_CLOSING). Master dying while going to close a region can leave it in transition forever - Key: HBASE-4739 URL: https://issues.apache.org/jira/browse/HBASE-4739 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when the master died, it had just created the RIT znode for a region but hadn't told the RS to close it yet. When the master restarted it saw the znode and started printing this: {quote} 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing {quote} It's never going to happen, and it's blocking balancing. I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151457#comment-13151457 ] Karthik Ranganathan commented on HBASE-4655: Sounds great, Doug! Maybe we make a new section, keep adding stuff in, and deprecate the old stuff? Or whatever works... Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase
[jira] [Updated] (HBASE-4280) [replication] ReplicationSink can deadlock itself via handlers
[ https://issues.apache.org/jira/browse/HBASE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4280: -- Fix Version/s: 0.94.0 0.92.0 [replication] ReplicationSink can deadlock itself via handlers -- Key: HBASE-4280 URL: https://issues.apache.org/jira/browse/HBASE-4280 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: HBASE-4280-0.90.patch I've experienced this problem a few times: ReplicationSink calls are received through the normal handlers and can potentially call back into the same server, which in certain situations can fill up all the handlers. For example, 10 handlers that are all serving replication calls are all trying to talk to the local server at the same time. HRS.replicateLogEntries should have @QosPriority(priority=HIGH_QOS) to use the other set of handlers.
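A minimal sketch of annotation-based handler routing. It uses a self-contained, hypothetical stand-in for HBase's QosPriority annotation and HIGH_QOS constant (the local definitions, the value 100 and the pool names are illustrative, not the HBase classes): the RPC layer reads the annotation on the target method and queues annotated calls on the separate high-priority pool, so replication calls cannot take every normal handler.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Local stand-in for HBase's QosPriority annotation (hypothetical copy). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface QosPriority {
    int priority() default 0;
}

class QosRouting {
    static final int NORMAL_QOS = 0;
    static final int HIGH_QOS = 100; // illustrative value

    /** Hypothetical region server RPC endpoints. */
    static class RegionServerRpcs {
        public void put(String row) {
            // normal client write
        }

        // Replication calls are served by the separate high-priority pool, so a
        // burst of them cannot occupy every normal handler.
        @QosPriority(priority = HIGH_QOS)
        public void replicateLogEntries(String entries) {
            // apply replicated edits
        }
    }

    /** Returns the priority declared on a one-String-argument method, or NORMAL_QOS. */
    static int priorityOf(Class<?> service, String methodName) throws NoSuchMethodException {
        Method m = service.getMethod(methodName, String.class);
        QosPriority qos = m.getAnnotation(QosPriority.class);
        return qos == null ? NORMAL_QOS : qos.priority();
    }

    /** Chooses the handler pool a call is queued on. */
    static String poolFor(Class<?> service, String methodName) throws NoSuchMethodException {
        return priorityOf(service, methodName) > NORMAL_QOS ? "priority" : "normal";
    }
}
```

For example, `QosRouting.poolFor(QosRouting.RegionServerRpcs.class, "replicateLogEntries")` returns `"priority"`, while `"put"` stays on the normal pool.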
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: 4213-0.92.v4 Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool
[ https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151485#comment-13151485 ] Nicolas Spiegelberg commented on HBASE-4611: @stack: note that the certificate installation is not a long process. It just has you visit a webpage while logged into FB and give a displayed ID. Although anonymous access for contributors should be fine, I'd recommend secure access for all committers since they'll be approving diffs for inclusion. Add support for Phabricator/Differential as an alternative code review tool --- Key: HBASE-4611 URL: https://issues.apache.org/jira/browse/HBASE-4611 Project: HBase Issue Type: Task Reporter: Jonathan Gray Assignee: Nicolas Spiegelberg Fix For: 0.92.0, 0.94.0 Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, D207.1.patch, D21.1.patch, D21.1.patch, HBASE-4611.D423.1.patch, HBASE-4611.D423.2.patch From http://phabricator.org/ : Phabricator is an open source collection of web applications which make it easier to write, review, and share source code. It is currently available as an early release. Phabricator was developed at Facebook. It's open source, so pretty much anyone could host an instance of this software. To begin with, there will be a public-facing instance located at http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL http://osuosl.org). We will use this JIRA to deal with adding (and ensuring) Apache-friendly support that will allow us to do code reviews with Phabricator for HBase.
[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151489#comment-13151489 ] Hadoop QA commented on HBASE-4796: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12503920/4796.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -163 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/266//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/266//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/266//console This message is automatically generated. 
Race between SplitRegionHandlers for the same region kills the master - Key: HBASE-4796 URL: https://issues.apache.org/jira/browse/HBASE-4796 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.94.0 Attachments: 4796.txt I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than 1 are trying to delete the znode at the same time: {quote} 2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
report); parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839. 2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1) org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884) at
[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151492#comment-13151492 ] Ted Yu commented on HBASE-4796: --- +1 on patch. Race between SplitRegionHandlers for the same region kills the master - Key: HBASE-4796 URL: https://issues.apache.org/jira/browse/HBASE-4796 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.94.0 Attachments: 4796.txt I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than 1 are trying to delete the znode at the same time: {quote} 2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b 
Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT report); parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839. 2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1) org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884) at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506) at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453) at org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} Stack and I came up with the solution: we just need to handle that exception, because handleSplitReport is an in-memory thing.
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151508#comment-13151508 ] Todd Lipcon commented on HBASE-4655: Two quick notes from looking over the doc: - The names are a little confusing to me: the in-cluster backup is actually two clusters, right? I'd call your RBU an in-cluster backup, I'd call your CBU an in-datacenter backup, and I'd call your DBU a cross-datacenter backup, DR backup, or BCP backup. - For RBU, maybe we can get atomicity in a simpler manner by having the region server initiate the copy of hfiles? It can hold the lock to block flushes while the copies happen (they're hard-link copies, right?) Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase
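Todd's second suggestion can be pictured as a single lock shared by flushes and backups: while a backup holds it, no flush can add a new store file, so the set of files being hard-linked is consistent. This is a hypothetical sketch (class and method names are illustrative, and the hard-link step is modelled as copying file names into a list), not proposed HBase code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class RegionBackupSketch {
    // Shared by flush() and backup(): a flush cannot add a file mid-backup.
    private final ReentrantLock flushLock = new ReentrantLock();
    private final List<String> storeFiles = new ArrayList<>();
    private int flushCount = 0;

    /** Simulates a memstore flush that writes one new store file. */
    void flush() {
        flushLock.lock();
        try {
            flushCount++;
            storeFiles.add("hfile-" + flushCount);
        } finally {
            flushLock.unlock();
        }
    }

    /** Captures the current store files while flushes are blocked. */
    List<String> backup() {
        flushLock.lock();
        try {
            // A real implementation would hard-link each file into the backup
            // directory here; links are cheap, so the lock is held only briefly.
            return new ArrayList<>(storeFiles);
        } finally {
            flushLock.unlock();
        }
    }
}
```

A flush that arrives during `backup()` simply waits on `flushLock` and lands in the next backup.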
[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151517#comment-13151517 ] stack commented on HBASE-4796: -- +1 on patch (this is what J-D and I discussed yesterday evening). One change I'd make, though, is to remove that 'return' buried down inside the catch and do an if/else abort instead; I can imagine someone reading this code and just not seeing the 'return'. Thanks Ram. Race between SplitRegionHandlers for the same region kills the master - Key: HBASE-4796 URL: https://issues.apache.org/jira/browse/HBASE-4796 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: ramkrishna.s.vasudevan Fix For: 0.92.0, 0.94.0 Attachments: 4796.txt I just saw that multiple SplitRegionHandlers can be created for the same region because of the RS tickling, but it becomes deadly when more than 1 are trying to delete the znode at the same time: {quote} 2011-11-16 02:25:28,778 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,780 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, region=f80b6a904048a99ce88d61420b8906d1 2011-11-16 02:25:28,796 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,804 DEBUG org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT event for f80b6a904048a99ce88d61420b8906d1; deleting node 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:62003-0x132f043bbde094b Deleting existing unassigned node for f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x132f043bbde094b Successfully deleted unassigned node for region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT 2011-11-16 02:25:28,821 INFO org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT report); parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. daughter a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839. 2011-11-16 02:25:28,829 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this is not a retry 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error deleting SPLIT node in ZK for transition ZK node (f80b6a904048a99ce88d61420b8906d1) org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884) at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506) at org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453) at org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} Stack and I came up with the solution: we just need to handle that exception, because handleSplitReport is an in-memory thing.
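A minimal sketch of the fix discussed above, including Stack's request for an explicit if/else abort rather than a 'return' buried in the catch block. ZooKeeper and the master are replaced by hypothetical in-memory stand-ins (FakeZk, ZkException, NoNodeException and an aborted flag) so the example is self-contained; the real handler deals with ZooKeeper's KeeperException and, as the FATAL log line shows, aborts the master on failure.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SplitHandlerSketch {

    /** Stand-in for ZooKeeper's KeeperException. */
    static class ZkException extends Exception {
        ZkException(String msg) { super(msg); }
    }

    /** Stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends ZkException {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    /** Minimal in-memory znode store; delete fails if the node is already gone. */
    static class FakeZk {
        final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void delete(String path) throws ZkException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    private final FakeZk zk;
    private volatile boolean aborted = false;

    SplitHandlerSketch(FakeZk zk) {
        this.zk = zk;
    }

    boolean isAborted() {
        return aborted;
    }

    /** Handles a SPLIT event: deletes the unassigned znode, tolerating a concurrent delete. */
    void process(String encodedRegionName) {
        String path = "/hbase/unassigned/" + encodedRegionName;
        try {
            zk.delete(path);
        } catch (ZkException e) {
            // Explicit if/else instead of a 'return' buried in the catch block.
            if (e instanceof NoNodeException) {
                // Another handler for the same region already deleted the node;
                // the split report itself is in-memory state, so nothing is lost.
                System.out.println("znode " + path + " already deleted");
            } else {
                aborted = true; // stands in for aborting the master
            }
        }
    }
}
```

Two handlers processing the same region now both complete: the first deletes the node and the second logs that it was already gone.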
[jira] [Created] (HBASE-4801) alter_status shell prints sensible message at completion
alter_status shell prints sensible message at completion Key: HBASE-4801 URL: https://issues.apache.org/jira/browse/HBASE-4801 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial The alter_status command used to print 0/0 once an alter operation had completed and its progress was no longer available. Now it instead indicates that all regions were updated.
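A sketch of the reporting change, written in Java for consistency with the other examples even though the alter_status command itself lives in the JRuby shell. It assumes, as the description implies, that a total of 0 means progress is no longer available; the class name, method name and exact message wording are hypothetical:

```java
class AlterStatusSketch {

    /**
     * @param pending regions still waiting for the new schema
     * @param total   regions the alter applies to; 0 once progress is no longer tracked
     */
    static String describe(int pending, int total) {
        if (total == 0) {
            // The old behaviour printed "0/0" in this case.
            return "All regions updated.";
        }
        return (total - pending) + "/" + total + " regions updated.";
    }
}
```

For example, `describe(3, 10)` yields `"7/10 regions updated."`, and `describe(0, 0)` yields `"All regions updated."`.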
[jira] [Commented] (HBASE-4800) Result.compareResults is incorrect
[ https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151521#comment-13151521 ] Hudson commented on HBASE-4800: --- Integrated in HBase-TRUNK #2448 (See [https://builds.apache.org/job/HBase-TRUNK/2448/]) HBASE-4800 Result.compareResults is incorrect (James Taylor and Lars H) larsh : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestResult.java Result.compareResults is incorrect -- Key: HBASE-4800 URL: https://issues.apache.org/jira/browse/HBASE-4800 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0, 0.94.0, 0.90.5 Attachments: 4800.txt A coworker of mine (James Taylor) found a bug in Result.compareResults(...). This condition: {code} if (!ourKVs[i].equals(replicatedKVs[i]) && !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) { throw new Exception("This result was different: " {code} should be {code} if (!ourKVs[i].equals(replicatedKVs[i]) || !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) { throw new Exception("This result was different: " {code} Just checked: this is wrong in all branches.
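The bug is a De Morgan slip: two cells are equal only if both key and value match, so they differ if the key differs OR the value differs. With `&&`, a cell with the same key but a changed value passes silently. A minimal sketch, assuming a hypothetical Cell type whose key comparison ignores the value (as the separate value check in the original code implies); the size check is part of this sketch, not a quote of Result.compareResults:

```java
import java.util.Arrays;

class CompareResultsSketch {

    /** Hypothetical stand-in for KeyValue: a key plus a value. */
    static final class Cell {
        final String key;
        final byte[] value;

        Cell(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        /** Compares keys only, ignoring the value. */
        boolean keyEquals(Cell other) {
            return key.equals(other.key);
        }
    }

    /** Buggy variant: only flags a cell when BOTH key and value differ. */
    static boolean differsBuggy(Cell a, Cell b) {
        return !a.keyEquals(b) && !Arrays.equals(a.value, b.value);
    }

    /** Fixed variant: flags a cell when EITHER key or value differs. */
    static boolean differsFixed(Cell a, Cell b) {
        return !a.keyEquals(b) || !Arrays.equals(a.value, b.value);
    }

    /** Throws if the two results differ in size or in any cell. */
    static void compareResults(Cell[] ours, Cell[] replicated) throws Exception {
        if (ours.length != replicated.length) {
            throw new Exception("Results have different numbers of cells");
        }
        for (int i = 0; i < ours.length; i++) {
            if (differsFixed(ours[i], replicated[i])) {
                throw new Exception("This result was different at cell " + i);
            }
        }
    }
}
```

With the same key and values `{1}` versus `{2}`, `differsBuggy` returns false while `differsFixed` returns true.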
[jira] [Commented] (HBASE-4801) alter_status shell prints sensible message at completion
[ https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151523#comment-13151523 ] Nicolas Spiegelberg commented on HBASE-4801: Part of 89-fb to 92 port. See r1182035 alter_status shell prints sensible message at completion Key: HBASE-4801 URL: https://issues.apache.org/jira/browse/HBASE-4801 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial The alter_status command used to print 0/0 once an alter operation had completed and its progress was no longer available. Now it instead indicates that all regions were updated.
[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion
[ https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4801: --- Attachment: HBASE-4801.patch alter_status shell prints sensible message at completion Key: HBASE-4801 URL: https://issues.apache.org/jira/browse/HBASE-4801 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Attachments: HBASE-4801.patch The alter_status command used to print 0/0 once an alter operation had completed and its progress was no longer available. Now it instead indicates that all regions were updated.
[jira] [Created] (HBASE-4802) Disable show table metrics in bulk loader
Disable show table metrics in bulk loader - Key: HBASE-4802 URL: https://issues.apache.org/jira/browse/HBASE-4802 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 During bulk load, the Configuration object may be set to null. This caused an NPE in per-CF metrics because it consults the Configuration to determine whether to show the Table name. Need to add a simple change to allow the conf to be null and not specify the table name in that instance.
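A minimal sketch of the null-safe check described above. A Map stands in for Hadoop's Configuration, and the configuration key, its default value and the metric-name format are hypothetical; only the shape of the fix (treat a null conf as "do not show the table name") comes from the issue:

```java
import java.util.Map;

class PerCfMetricsSketch {
    // Hypothetical key name; the real setting may be named differently.
    static final String SHOW_TABLE_NAME_KEY = "example.metrics.showTableName";

    /** False when no configuration is available (e.g. during bulk load), so no NPE. */
    static boolean useTableName(Map<String, String> conf) {
        if (conf == null) {
            return false;
        }
        return Boolean.parseBoolean(conf.getOrDefault(SHOW_TABLE_NAME_KEY, "true"));
    }

    /** Builds a per-column-family metric prefix; the format is illustrative. */
    static String metricPrefix(Map<String, String> conf, String table, String family) {
        return useTableName(conf)
                ? "tbl." + table + ".cf." + family + "."
                : "cf." + family + ".";
    }
}
```

With a null conf the prefix for family `f1` is `"cf.f1."`, so the table name is simply omitted instead of triggering a NullPointerException.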
[jira] [Commented] (HBASE-4802) Disable show table metrics in bulk loader
[ https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151530#comment-13151530 ] Nicolas Spiegelberg commented on HBASE-4802: Part of 89-fb to 92 port. See r1182037 Disable show table metrics in bulk loader - Key: HBASE-4802 URL: https://issues.apache.org/jira/browse/HBASE-4802 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4802.patch During bulk load, the Configuration object may be set to null. This caused an NPE in per-CF metrics because it consults the Configuration to determine whether to show the Table name. Need to add a simple change to allow the conf to be null and not specify the table name in that instance.
[jira] [Updated] (HBASE-4802) Disable show table metrics in bulk loader
[ https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4802: --- Attachment: HBASE-4802.patch Disable show table metrics in bulk loader - Key: HBASE-4802 URL: https://issues.apache.org/jira/browse/HBASE-4802 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4802.patch During bulk load, the Configuration object may be set to null. This caused an NPE in per-CF metrics because it consults the Configuration to determine whether to show the Table name. Need to add a simple change to allow the conf to be null and not specify the table name in that instance.
[jira] [Updated] (HBASE-4802) Disable show table metrics in bulk loader
[ https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4802: --- Assignee: Liyin Tang (was: Nicolas Spiegelberg) Status: Patch Available (was: Open) Disable show table metrics in bulk loader - Key: HBASE-4802 URL: https://issues.apache.org/jira/browse/HBASE-4802 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Liyin Tang Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4802.patch During bulk load, the Configuration object may be set to null. This caused an NPE in per-CF metrics because it consults the Configuration to determine whether to show the Table name. Need to add a simple change to allow the conf to be null and not specify the table name in that instance.
[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion
[ https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4801: --- Status: Patch Available (was: Open) patch created by one of our interns: Charles Gist. alter_status shell prints sensible message at completion Key: HBASE-4801 URL: https://issues.apache.org/jira/browse/HBASE-4801 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Attachments: HBASE-4801.patch The alter_status command used to print 0/0 once an alter operation had completed and its progress was no longer available. Now it instead indicates that all regions were updated.
[jira] [Issue Comment Edited] (HBASE-4801) alter_status shell prints sensible message at completion
[ https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151531#comment-13151531 ] Nicolas Spiegelberg edited comment on HBASE-4801 at 11/16/11 9:29 PM: -- patch created by one of our interns: Christopher Gist. was (Author: nspiegelberg): patch created by one of our interns: Charles Gist.
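The behavior change can be sketched as follows. It assumes the status arrives as a pair (regions still to update, total regions) that reads 0/0 once the alter has completed and progress is no longer tracked. The real change lives in the Ruby shell; this Java class and its method are hypothetical.

```java
/**
 * Hypothetical Java rendering of the alter_status message logic; the real
 * change is in the Ruby shell, and these names are illustrative only.
 */
class AlterStatusMessage {
    /**
     * @param yetToUpdate regions still waiting for the new schema
     * @param total       regions being altered; 0 once progress is no longer tracked
     */
    static String format(int yetToUpdate, int total) {
        if (total == 0) {
            // Progress info is gone after completion: report success instead of "0/0".
            return "All regions updated.";
        }
        return (total - yetToUpdate) + "/" + total + " regions updated.";
    }
}
```

For example, `format(3, 10)` yields `7/10 regions updated.`, while `format(0, 0)` yields `All regions updated.` rather than the old, confusing `0/0`.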
[jira] [Commented] (HBASE-4543) major_compact '.META.' has no effect
[ https://issues.apache.org/jira/browse/HBASE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151538#comment-13151538 ] Nicolas Spiegelberg commented on HBASE-4543: Does this occur in trunk? Can we get a patch for this? major_compact '.META.' has no effect Key: HBASE-4543 URL: https://issues.apache.org/jira/browse/HBASE-4543 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100621, 0.89.20100924 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Attachments: 0001-fix-the-issue-of-.META.-getting-ignored.patch major_compact '.META.' has no effect, although major_compact 'any_other_table' works fine from the shell. This issue seems to affect only 0.89; apache-trunk seems to handle this case properly. The issue is that getTableRegions() in HMaster.java only works if the given tableName is a normal table. The methodology (using a MetaScanner to look through the .META. table for the tableName) does not work if the tableName is .META. itself. The fix modifies getTableRegions() to check whether the tableName is .META. and, if so, handle it accordingly.
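The special case described above can be modelled in a few lines. This is a hypothetical sketch, not the attached patch: a `Map` stands in for the rows of `.META.` (user table to region names), a `List` stands in for the `-ROOT-` entries that locate `.META.`'s own regions, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the fix: the Map/List stand in for .META. rows
 * and for the -ROOT- entries that locate .META.'s own regions.
 */
class TableRegionLookup {
    static final String META_TABLE_NAME = ".META.";

    private final Map<String, List<String>> metaRows; // user table -> region names (.META. contents)
    private final List<String> metaRegions;           // regions of .META. itself (as listed in -ROOT-)

    TableRegionLookup(Map<String, List<String>> metaRows, List<String> metaRegions) {
        this.metaRows = metaRows;
        this.metaRegions = metaRegions;
    }

    List<String> getTableRegions(String tableName) {
        if (META_TABLE_NAME.equals(tableName)) {
            // .META. holds no rows describing itself, so scanning it for ".META."
            // comes back empty; answer from the -ROOT- view instead.
            return new ArrayList<>(metaRegions);
        }
        return new ArrayList<>(metaRows.getOrDefault(tableName, Collections.<String>emptyList()));
    }
}
```

Without the first branch, a request for `.META.` falls through to the generic lookup and returns an empty list, which matches the reported symptom: the compaction request silently targets no regions.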
[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151545#comment-13151545 ] stack commented on HBASE-4799: -- Thanks for digging in here Max. This comment is now wrong? {code} + // Remove daughters from the parent IFF the daughter region exists in FS. + // If there is no daughter region in the filesystem, must be because of + // a failed split. The ServerShutdownHandler will do the fixup. Don't + // do any deletes in here that could intefere with ServerShutdownHandler + // fixup {code} hasNoReferences will return if no daughter dir or if no references (so if no daughter dir we'll delete parent). Otherwise, I'm fine w/ tying together the removals of splitA and splitB... all in the one go; it dumbs down the number of possible states, which is usually a good thing. One thing though: rather than removeDaughterFromParent, shouldn't we clear both splitA and splitB in the one go, since it's the same row? (We could have a strange case where splitA was removed but then we crash before splitB was removed.) Don't we need a removeDaughter*s*FromParent, i.e. plural? Good stuff Max. Catalog Janitor logic bug causes region leackage Key: HBASE-4799 URL: https://issues.apache.org/jira/browse/HBASE-4799 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Priority: Critical Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch When a region split takes a significant amount of time, CatalogJanitor can clean up one of the SPLIT records but leave the other in META. When the other split finishes, the janitor cleans up the remaining SPLIT record, but the parent region is never removed from the FS and META is never cleared. The race condition is as follows: 1. A region split starts. 2. One of the daughter regions (A) finishes splitting, i.e. it has no reference storefiles, but the other (B) does not. 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees that SPLITB still has references and does nothing. 4. Region B completes its split. 5. The janitor wakes up and removes SPLITB, but sees that there is no record for A and again does nothing. Result: the parent region hangs around forever.
[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage
[ https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4799: - Fix Version/s: 0.90.5 0.92.0 Catalog Janitor logic bug causes region leackage Key: HBASE-4799 URL: https://issues.apache.org/jira/browse/HBASE-4799 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Max Lapan Assignee: Max Lapan Priority: Critical Fix For: 0.92.0, 0.90.5 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 0002-Temporary-fix-to-remove-leaked-regions.patch When a region split takes a significant amount of time, CatalogJanitor can clean up one of the SPLIT records but leave the other in META. When the other split finishes, the janitor cleans up the remaining SPLIT record, but the parent region is never removed from the FS and META is never cleared. The race condition is as follows: 1. A region split starts. 2. One of the daughter regions (A) finishes splitting, i.e. it has no reference storefiles, but the other (B) does not. 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees that SPLITB still has references and does nothing. 4. Region B completes its split. 5. The janitor wakes up and removes SPLITB, but sees that there is no record for A and again does nothing. Result: the parent region hangs around forever.
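stack's suggestion (never clear SPLITA or SPLITB on its own; check both daughters, then clean up the parent row in one go) can be modelled as below. This is a hypothetical sketch, not CatalogJanitor code: a nested `Map` stands in for `.META.` rows, a `Predicate` stands in for "this daughter still has reference files", and the column names are illustrative. The real cleanup also deletes the parent's directory in the FS, which is omitted here.

```java
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical model: a Map stands in for .META. (parent row -> columns)
 * and the Predicate stands in for "daughter still has reference files".
 */
class CatalogJanitorSketch {
    static final String SPLITA = "info:splitA";
    static final String SPLITB = "info:splitB";

    /** Returns true if the parent was cleaned up on this pass. */
    static boolean cleanParent(Map<String, Map<String, String>> meta, String parent,
                               Predicate<String> hasReferences) {
        Map<String, String> row = meta.get(parent);
        if (row == null) {
            return false; // nothing to do
        }
        String a = row.get(SPLITA);
        String b = row.get(SPLITB);
        boolean aClean = a == null || !hasReferences.test(a);
        boolean bClean = b == null || !hasReferences.test(b);
        if (!aClean || !bClean) {
            // Leave BOTH split columns in place so the next pass sees the full picture.
            return false;
        }
        // One row-level delete: there is no intermediate state where only
        // one of the two split columns has been removed.
        meta.remove(parent);
        return true;
    }
}
```

Replaying the race from the description: on the first pass A is clean but B still has references, so nothing is touched; once B's references are gone, the next pass removes the parent. The state "SPLITA gone, SPLITB present" that strands the parent can no longer arise.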
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151554#comment-13151554 ] Hadoop QA commented on HBASE-4213: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12503931/4213-0.92.v4 against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 58 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/267//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/267//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/267//console This message is automatically generated. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. 
Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151564#comment-13151564 ] Karthik Ranganathan commented on HBASE-4655: For #1, totally :) Internally, we use the term cluster to denote a section of the data center (as opposed to the HBase cluster); a data center is composed of a number of clusters, hence the name. in-DC and cross-DC work. For #2, this makes the running cluster stall and not take updates for the duration of the copy. It is a fast copy with hard links underneath, but nothing in the current design would stop it from being used against a remote cluster or a DFS version without hard links. Also, if for some reason the hard link fails, it does a deep copy, so the stalls could be longer. Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase
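The copy step Karthik describes (hard link first, deep copy if the link fails) can be sketched with `java.nio.file` on a local filesystem. This is only an analogue under stated assumptions: the backup design targets HDFS, whose hard-link support is not shown here, and `LinkOrCopy` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical local-filesystem analogue of "hard link, else deep copy". */
class LinkOrCopy {
    /**
     * @return true if a hard link was created, false if a full copy was made
     */
    static boolean linkOrCopy(Path source, Path target) throws IOException {
        try {
            Files.createLink(target, source); // cheap: no file data is copied
            return true;
        } catch (IOException | UnsupportedOperationException e) {
            // Link unsupported or failed: copy the bytes, which takes longer
            // and therefore lengthens the stall described above.
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

The return value makes the slow path observable, which is the case that produces the longer stalls mentioned in the comment.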
[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management
[ https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151566#comment-13151566 ] nkeywal commented on HBASE-4763: Surefire revision for SUREFIRE-791 SUREFIRE-785: r1202059 Integrate surefire and junit for category management Key: HBASE-4763 URL: https://issues.apache.org/jira/browse/HBASE-4763 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: surefire_hbase.v2.patch As of today, Surefire integrates categories on the trunk of the 2.11 version: http://jira.codehaus.org/browse/SUREFIRE-329 . It may require private patches as well. It may impact JUnit: https://github.com/KentBeck/junit/issues/359 This jira is about this integration. We will need a repo for this. For the naming of the versions to be created, I don't know if there is a convention. If not, I would propose: 2.10-patched-HBASE Obviously, it's important to get our changes integrated in the main release: we're not forking surefire junit!
[jira] [Created] (HBASE-4803) Split log worker should terminate properly when waiting for znode
Split log worker should terminate properly when waiting for znode - Key: HBASE-4803 URL: https://issues.apache.org/jira/browse/HBASE-4803 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Prakash Khemani Priority: Minor Fix For: 0.94.0 This is an attempt to fix the fact that SplitLogWorker threads were not being terminated properly in some multi-master unit tests.
[jira] [Commented] (HBASE-4803) Split log worker should terminate properly when waiting for znode
[ https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151569#comment-13151569 ] Nicolas Spiegelberg commented on HBASE-4803: Part of 89-fb to 92 port. See r1188420
[jira] [Updated] (HBASE-4803) Split log worker should terminate properly when waiting for znode
[ https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4803: --- Status: Patch Available (was: Open) Split log worker should terminate properly when waiting for znode - Key: HBASE-4803 URL: https://issues.apache.org/jira/browse/HBASE-4803 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Prakash Khemani Priority: Minor Fix For: 0.94.0 Attachments: HBASE-4803.patch This is an attempt to fix the fact that SplitLogWorker threads were not being terminated properly in some multi-master unit tests.
[jira] [Updated] (HBASE-4803) Split log worker should terminate properly when waiting for znode
[ https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg updated HBASE-4803: --- Attachment: HBASE-4803.patch
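The general pattern behind this fix (a worker blocked waiting for a znode must also notice a stop request) can be sketched with plain Java monitors. This is a hypothetical illustration, not the HBASE-4803 patch or SplitLogWorker's actual code: the class, its methods and the 100 ms wait are all invented, and no ZooKeeper API is involved.

```java
/**
 * Hypothetical sketch (not the HBASE-4803 patch): a worker that waits for a
 * znode-like signal but also wakes up and exits promptly when stopped.
 */
class ZnodeWaitingWorker implements Runnable {
    private final Object lock = new Object();
    private boolean znodeAvailable = false; // guarded by lock
    private volatile boolean stopped = false;
    private volatile boolean exited = false;

    /** Called by a (simulated) watcher when the znode appears. */
    void znodeCreated() {
        synchronized (lock) {
            znodeAvailable = true;
            lock.notifyAll();
        }
    }

    /** Requests termination and wakes the waiting thread. */
    void stop() {
        stopped = true;
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    boolean hasExited() {
        return exited;
    }

    @Override
    public void run() {
        synchronized (lock) {
            // Re-check the stop flag on every wake-up; the timed wait is a
            // safety net in case a notification is missed.
            while (!znodeAvailable && !stopped) {
                try {
                    lock.wait(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Real task processing would start here if the worker was not stopped.
        exited = true;
    }
}
```

The key design choice is that the wait loop's condition includes the stop flag, so a worker that never sees its znode still terminates instead of leaking a thread into the next unit test.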
[jira] [Commented] (HBASE-4543) major_compact '.META.' has no effect
[ https://issues.apache.org/jira/browse/HBASE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151574#comment-13151574 ] stack commented on HBASE-4543: -- It's working in trunk. I saw these messages when I tried it just now: {code} 2011-11-16 22:20:53,676 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Small Compaction requested: regionName=.META.,,1.1028785192, storeName=info, fileCount=1, fileSize=582.8k (582.8k), priority=1, time=9000612904273038; Because: User-triggered major compaction; compaction_queue=(0:0), split_queue=0 {code} (Notice the 'User-triggered...')
[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management
[ https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151577#comment-13151577 ] stack commented on HBASE-4763: -- You want to give us plugins to host N? Seems like Gary is volunteering hosting .
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Attachment: 4213-trunk.txt The latest patch adds a test category. Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK. Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.94.0 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 4213-101211-Support_instant_schema_changes_through_ZK.patch, 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 4213-V10-Support_instant_schema_changes_through_ZK.patch, 4213-V5-Support_instant_schema_changes_through_ZK.patch, 4213-V7-Support_instant_schema_changes_through_ZK.patch, 4213-V8-Support_instant_schema_changes_through_ZK.patch, 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk.txt, 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, HBASE-4213_Instant_schema_change_-Version_2_.patch, HBASE_Instant_schema_change-version_3_.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4213: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151581#comment-13151581 ] Ted Yu commented on HBASE-4213: --- TestDistributedLogSplitting#testWorkerAbort passed on my MacBook.
[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151583#comment-13151583 ] Ted Yu commented on HBASE-4213: --- HBASE-4370 has mostly been covered by the latest patch.
[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management
[ https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151586#comment-13151586 ] nkeywal commented on HBASE-4763: Yes, I am working with him; he's building surefire junit today.
[jira] [Commented] (HBASE-4655) Document architecture of backups
[ https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151589#comment-13151589 ] stack commented on HBASE-4655: -- Echo Todd's #1 remarks. For '...incremental backups at the Stage 1 (RBU) level', won't the time between steps b and d be 'large'? During the copy, the list of files could change on you; i.e. when you go to copy a file, it may have been removed because it had been compacted. What do you do in this case? (Your list may not include the compacted file.) For a. ('The backups rely on the clocks across the various region-servers for determining the point in time to which the edits are re-played'), say a server is lagging the others by a good bit? When replaying the edits, would you replay edits from when this lagging server said the backup began? How will you know which hlogs to replay? Will you open each one and look at the first and last edits in the file? Or should we write out metadata files for hlogs? Or is relying on the HDFS modtime enough? Looks great K. Document architecture of backups Key: HBASE-4655 URL: https://issues.apache.org/jira/browse/HBASE-4655 Project: HBase Issue Type: Sub-task Components: documentation, regionserver Reporter: Karthik Ranganathan Assignee: Karthik Ranganathan Attachments: HBase Backups Architecture.docx Basic idea behind the backup architecture for HBase
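The hlog-replay question can be made concrete under one of the options raised: assume each hlog has a metadata record holding the timestamps of its first and last edits. Replay then picks every hlog whose edit range overlaps the backup window, with the lower bound pulled back by an assumed maximum clock skew so that edits from a lagging region server (whose timestamps read earlier than true time) are not missed. HLogMeta, select and maxSkewMs are hypothetical names for this sketch, not HBase APIs, and it only widens the lower bound.

```java
import java.util.ArrayList;
import java.util.List;

public class HLogReplaySelection {
    // Hypothetical per-hlog metadata: timestamps of the first and last edits in the file.
    public static final class HLogMeta {
        final String path;
        final long firstEditTs;
        final long lastEditTs;

        public HLogMeta(String path, long firstEditTs, long lastEditTs) {
            this.path = path;
            this.firstEditTs = firstEditTs;
            this.lastEditTs = lastEditTs;
        }
    }

    // Selects hlogs whose [firstEditTs, lastEditTs] overlaps [backupStart - maxSkewMs, restorePoint].
    // A lagging server stamps edits made after backupStart with earlier times, so the
    // lower bound is widened by the assumed worst-case skew to keep those edits.
    public static List<String> select(List<HLogMeta> logs, long backupStart,
                                      long restorePoint, long maxSkewMs) {
        long from = backupStart - maxSkewMs;
        List<String> out = new ArrayList<>();
        for (HLogMeta m : logs) {
            boolean overlaps = m.lastEditTs >= from && m.firstEditTs <= restorePoint;
            if (overlaps) {
                out.add(m.path);
            }
        }
        return out;
    }
}
```

With an hlog spanning timestamps 0 to 80, a backup start of 100 and a skew allowance of 30, that hlog is included; with no skew allowance it is skipped, which is exactly the case the lagging-server question is about.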
[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion
[ https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4801: - Resolution: Fixed Fix Version/s: 0.92.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to branch and trunk. Thanks Nicolas and Christopher. alter_status shell prints sensible message at completion Key: HBASE-4801 URL: https://issues.apache.org/jira/browse/HBASE-4801 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Trivial Fix For: 0.92.0 Attachments: HBASE-4801.patch The alter_status command used to print 0/0 once an alter operation had completed and its progress was no longer available. Now it instead indicates that all regions were updated.
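The behaviour change boils down to special-casing a total of zero. A minimal sketch in Java, assuming the progress arrives as two counts (regions still pending and total regions); the method name and the exact message strings are illustrative, not the wording of HBASE-4801.patch.

```java
public class AlterStatusMessage {
    // Hypothetical helper: 'pending' = regions not yet updated, 'total' = regions in the alter.
    public static String format(int pending, int total) {
        if (total == 0) {
            // Progress is no longer tracked once the alter has completed,
            // so report completion instead of the uninformative "0/0".
            return "All regions updated.";
        }
        return (total - pending) + "/" + total + " regions updated.";
    }

    public static void main(String[] args) {
        System.out.println(format(3, 10));
        System.out.println(format(0, 0));
    }
}
```

Here format(3, 10) yields "7/10 regions updated." and format(0, 0) yields "All regions updated.".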
[jira] [Commented] (HBASE-4802) Disable show table metrics in bulk loader
[ https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151592#comment-13151592 ] stack commented on HBASE-4802: -- Is this right, Nicolas? You add a null check but at the same time remove one. Disable show table metrics in bulk loader - Key: HBASE-4802 URL: https://issues.apache.org/jira/browse/HBASE-4802 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Liyin Tang Priority: Trivial Fix For: 0.94.0 Attachments: HBASE-4802.patch During bulk load, the Configuration object may be null. This caused an NPE in per-CF metrics because they consult the Configuration to determine whether to show the table name. A simple change is needed to allow the conf to be null and to omit the table name in that case.
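The fix described is a null guard in front of the configuration lookup. A minimal sketch, assuming a Map<String, String> as a stand-in for Hadoop's Configuration so it compiles on its own; the key name, its default of true, and the "tbl."/"cf." prefix format are hypothetical choices for illustration, not HBase's actual metric names.

```java
import java.util.Map;

public class PerCfMetricsNaming {
    // Hypothetical key name, used only for this sketch.
    static final String SHOW_TABLE_NAME_KEY = "example.metrics.showTableName";

    // A null conf (as during bulk load) means "don't show the table name" rather than an NPE.
    public static boolean showTableName(Map<String, String> conf) {
        if (conf == null) {
            return false;
        }
        // Assumed default: show the table name when the key is absent.
        return Boolean.parseBoolean(conf.getOrDefault(SHOW_TABLE_NAME_KEY, "true"));
    }

    // Builds a per-column-family metric prefix, including the table only when enabled.
    public static String metricPrefix(Map<String, String> conf, String table, String family) {
        return showTableName(conf)
            ? "tbl." + table + ".cf." + family + "."
            : "cf." + family + ".";
    }
}
```

The null check lives in one place, so callers such as a bulk loader can pass a null conf without each one guarding separately.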