[jira] [Created] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Created) (JIRA)
Sleeps and synchronisation improvements for tests
-

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor


Multiple small changes:

@committers: Removing some sleeps exposed a bug in 
JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. 
You may want to review this.

JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never 
met (test on !c && !!c). Added a new synchronization point.
AssignmentManager#waitForAssignment: add a timeout on the wait, so it does not get stuck if 
the notification arrives before the wait starts.
HMaster#loop: use a notification instead of a 1s sleep
HRegionServer#waitForServerOnline: new method used by 
JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification
HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s 
sleep
HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0.5s
ZooKeeperNodeTracker#start: replace a recursive call with a loop
ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so it does not get 
stuck if the notification arrives before the wait starts.
HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; with 
the change in HBaseTestingUtility we are 60s faster
TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead of 
1s
TestRestartCluster#testClusterRestart: send all the table creation requests together, 
then check the creations; this should be faster
TestHLog: shut down the whole cluster instead of DFS only (more standard)
JVMClusterUtil#startup: lower the sleep from 1s to 0.1s
HConnectionManager#close: the ZooKeeper name in the debug message from 
HConnectionManager after connection close was always null because it was set to 
null in the delete.
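The waitForAssignment and blockUntilAvailable items describe the classic lost-notification problem: if notifyAll() fires before the waiting thread reaches wait(), an untimed wait can block forever. Below is a minimal sketch of the usual remedy, assuming a hypothetical `AvailabilityTracker` class (this is not the HBase code). It re-checks a guarded flag in a loop and waits with a bounded timeout.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical tracker (not the HBase class): one thread marks a resource
 * available, other threads block until it is available or a deadline passes.
 */
final class AvailabilityTracker {
  private final Object lock = new Object();
  private boolean available = false; // guarded by lock

  /** Producer side, e.g. called when the watched node appears. */
  void markAvailable() {
    synchronized (lock) {
      available = true;
      lock.notifyAll(); // wake current waiters; the flag covers late arrivals
    }
  }

  /**
   * Blocks until available or until timeoutMs elapses.
   * @return true if the resource became available in time
   */
  boolean blockUntilAvailable(long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    synchronized (lock) {
      // Checking the flag before waiting means a notification sent earlier
      // is not lost; the loop also absorbs spurious wakeups.
      while (!available) {
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMs <= 0) {
          return false;
        }
        lock.wait(remainingMs); // bounded wait: never stuck forever
      }
      return true;
    }
  }
}
```

The flag is what fixes the race. The timeout is a safety net so that a caller waiting on a condition that never becomes true fails in bounded time instead of hanging the test.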


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Description: 
Multiple small changes:

@committers: Removing some sleeps exposed a bug in 
JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchronization point. 
You may want to review this.

JVMClusterUtil#HMaster#waitForServerOnline: removed; the condition was never 
met (test on !c && !!c). Added a new synchronization point.

AssignmentManager#waitForAssignment: add a timeout on the wait, so it does not get stuck if 
the notification arrives before the wait starts.

HMaster#loop: use a notification instead of a 1s sleep

HRegionServer#waitForServerOnline: new method used by 
JVMClusterUtil#waitForServerOnline() to replace a 1s sleep with a notification

HRegionServer#getMaster(): 1s sleeps replaced by one 0.1s sleep and one 0.2s 
sleep

HRegionServer#stop: use a notification on the sleeper to shorten shutdown by 0.5s

ZooKeeperNodeTracker#start: replace a recursive call with a loop

ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait, so it does not get 
stuck if the notification arrives before the wait starts.

HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s

TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s; with 
the change in HBaseTestingUtility we are 60s faster

TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0.2s instead of 
1s

TestRestartCluster#testClusterRestart: send all the table creation requests together, 
then check the creations; this should be faster

TestHLog: shut down the whole cluster instead of DFS only (more standard)

JVMClusterUtil#startup: lower the sleep from 1s to 0.1s

HConnectionManager#close: the ZooKeeper name in the debug message from 
HConnectionManager after connection close was always null because it was set to 
null in the delete.
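The HRegionServer#stop item replaces waiting out a fixed sleep with a notification on the sleeper. Below is a minimal sketch of such an interruptible sleeper, assuming a hypothetical `InterruptibleSleeper` class (not HBase's own sleeper, whose API is not shown here). A periodic loop sleeps between iterations, and stop() wakes it at once instead of letting the current period run out.

```java
/**
 * Hypothetical sleeper: sleep() waits up to periodMs, but returns as soon
 * as stop() is called, so shutdown does not wait out the remaining period.
 */
final class InterruptibleSleeper {
  private final Object lock = new Object();
  private final long periodMs;
  private boolean stopped = false; // guarded by lock

  InterruptibleSleeper(long periodMs) {
    this.periodMs = periodMs;
  }

  /** Sleeps up to periodMs; returns early once stop() has been called. */
  void sleep() throws InterruptedException {
    long deadline = System.currentTimeMillis() + periodMs;
    synchronized (lock) {
      while (!stopped) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return; // the full period elapsed
        }
        lock.wait(remaining); // woken early by stop(), or spuriously
      }
    }
  }

  /** Marks the sleeper stopped and wakes any thread currently sleeping. */
  void stop() {
    synchronized (lock) {
      stopped = true;
      lock.notifyAll();
    }
  }

  boolean isStopped() {
    synchronized (lock) {
      return stopped;
    }
  }
}
```

A typical caller would loop as `while (!sleeper.isStopped()) { doWork(); sleeper.sleep(); }` (here `doWork` is a placeholder), so a stop request takes effect right away rather than after up to one full period.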



[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Attachment: 4798_trunk_all.v2.patch

Let's see how it behaves on the integration servers.





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management

2011-11-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151107#comment-13151107
 ] 

nkeywal commented on HBASE-4763:


Note that I found a new bug in surefire, and fixed it: SUREFIRE-791 
(JUnit47 provider reports incorrect elapsed time on test failure). It should be 
on their trunk soon, so it makes sense to take it, as they also fixed a 
regression in 2.10, SUREFIRE-785 (Lots of newlines being strewn about in 
test output). Not mandatory, but nicer.

 Integrate surefire and junit for category management
 

 Key: HBASE-4763
 URL: https://issues.apache.org/jira/browse/HBASE-4763
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: surefire_hbase.v2.patch


 As of today, Surefire integrates categories on the trunk of the 2.11 version: 
 http://jira.codehaus.org/browse/SUREFIRE-329. It may require private 
 patches as well.
 It may impact JUnit: https://github.com/KentBeck/junit/issues/359
 This jira is about this integration. We will need a repo for this.
 For the naming of the versions to be created, I don't know if there is a 
 convention. If not, I would propose: 2.10-patched-HBASE

 Obviously, it's important to get our changes integrated in the main release: 
 we're not forking surefire & junit!





[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1315#comment-1315
 ] 

ramkrishna.s.vasudevan commented on HBASE-4796:
---

Running the test cases; will submit the patch once done.

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0


 I just saw that multiple SplitRegionHandlers can be created for the same 
 region because of the RS tickling, but it becomes deadly when more than one is 
 trying to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
   at 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Stack and I concluded that we just need to handle that exception, 
 because handleSplitReport is an in-memory operation.
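The proposed fix is to tolerate the "already deleted" case when a second SplitRegionHandler for the same region finds the znode gone. Below is a minimal sketch of that idea with in-memory stand-ins, not the ZooKeeper or HBase API. The hypothetical `UnassignedNodes` class (backed by a ConcurrentHashMap) plays the role of /hbase/unassigned. The hypothetical `NoNodeException` stands in for ZooKeeper's `KeeperException.NoNodeException` seen in the stack trace.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
class NoNodeException extends Exception {
  NoNodeException(String path) {
    super("NoNode for " + path);
  }
}

/** Hypothetical in-memory stand-in for the /hbase/unassigned znodes. */
final class UnassignedNodes {
  private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  void create(String region, String state) {
    nodes.put(region, state);
  }

  /** Strict delete, like a ZooKeeper delete: fails if the node is already gone. */
  void delete(String region) throws NoNodeException {
    if (nodes.remove(region) == null) { // remove() is atomic: only one caller wins
      throw new NoNodeException("/hbase/unassigned/" + region);
    }
  }
}

final class SplitHandlerSketch {
  /**
   * Deletes the SPLIT node, treating "already deleted" as success because a
   * concurrent handler for the same region got there first.
   * @return true if this call deleted the node, false if it was already gone
   */
  static boolean handleSplit(UnassignedNodes zk, String region) {
    try {
      zk.delete(region);
      return true;
    } catch (NoNodeException e) {
      // Benign race between duplicate handlers: log and continue rather
      // than aborting the master.
      return false;
    }
  }
}
```

This is safe only because, as noted above, the rest of the split handling is in-memory bookkeeping. The losing handler has nothing left to undo.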





[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151126#comment-13151126
 ] 

ramkrishna.s.vasudevan commented on HBASE-4729:
---

I think this patch will not solve the problem. It will only prevent the master 
from aborting; the alter table will still not be fully completed.

There is one more thing here which I found in the same place,
in AssignmentManager.unassign():
{code}
synchronized (regionsInTransition) {
  state = regionsInTransition.get(encodedName);
  // ...
}
{code}
If we find an existing state, we just return without unassigning.
So when this races with a region split, it also prevents the alter table from 
being completed.
The same will be the case if we just try to handle the NodeAlreadyExistsException.
I would like to get your suggestion. Can we invoke the split manually?
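The unassign() excerpt in the comment above returns early when the region already has an entry in regionsInTransition. Below is a simplified sketch of that check-and-return pattern, assuming a hypothetical `RegionTransitions` class and illustrative state strings (this is not the actual AssignmentManager code). It shows why a reopen requested by an online alter is silently skipped when a split has already put the region in transition.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the regionsInTransition check in unassign(). */
final class RegionTransitions {
  // encoded region name -> transition state (illustrative strings only)
  private final Map<String, String> regionsInTransition = new HashMap<>();

  /** Records that another actor, e.g. a split, moved the region into transition. */
  void markInTransition(String encodedName, String state) {
    synchronized (regionsInTransition) {
      regionsInTransition.put(encodedName, state);
    }
  }

  /**
   * @return true if this call started the unassign, false if it returned
   *     early because the region was already in transition
   */
  boolean unassign(String encodedName) {
    synchronized (regionsInTransition) {
      String state = regionsInTransition.get(encodedName);
      if (state != null) {
        // Early return: a caller such as a bulk reopen for an alter is not
        // told that this region was skipped, so the alter never completes.
        return false;
      }
      regionsInTransition.put(encodedName, "PENDING_CLOSE");
    }
    // The real code would go on to create the CLOSING znode and send the close.
    return true;
  }
}
```

Returning a status, as this sketch does, is one way a caller could notice the skipped region and retry after the split finishes. Whether that fits the actual fix is the open question in the comment.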

 Race between online altering and splitting kills the master
 ---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729.txt


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 A znode was created because the region server was splitting the region 4 
 seconds before:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.





[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151131#comment-13151131
 ] 

Hadoop QA commented on HBASE-4798:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503854/4798_trunk_all.v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -163 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 51 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
  org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/263//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/263//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/263//console

This message is automatically generated.





[jira] [Commented] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves

2011-11-16 Thread gaojinchao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151143#comment-13151143
 ] 

gaojinchao commented on HBASE-4654:
---

Do we need to throw an exception in the addPeer API?

 [replication] Add a check to make sure we don't replicate to ourselves
 --

 Key: HBASE-4654
 URL: https://issues.apache.org/jira/browse/HBASE-4654
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.5

 Attachments: 4654-trunk.txt


 It's currently possible to add a peer for replication and point it to the 
 local cluster, which I believe could very well happen for those like us that 
 use only one ZK ensemble per DC so that only the root znode changes when you 
 want to set up replication intra-DC.
 I don't think comparing just the cluster ID would be enough because you would 
 normally use a different one for another cluster and nothing will block you 
 from pointing elsewhere.
 Comparing the ZK ensemble address doesn't work either when you have multiple 
 DNS entries that point at the same place.
 I think this could be resolved by looking up the master address in the 
 relevant znode as it should be exactly the same thing in the case where you 
 have the same cluster.
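The last paragraph suggests detecting self-replication by comparing the master address stored in the peer's znode with the local one. Below is a minimal sketch of that comparison. It assumes the address has already been read as a ServerName-style string `host,port,startcode` (the format of the server names in the logs above); how it is read from ZooKeeper is out of scope. `MasterAddress` and `ReplicationPeerCheck` are hypothetical names.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical parsed form of a "host,port,startcode" master address. */
final class MasterAddress {
  final String host;
  final int port;
  final long startCode;

  private MasterAddress(String host, int port, long startCode) {
    this.host = host;
    this.port = port;
    this.startCode = startCode;
  }

  static MasterAddress parse(String serverName) {
    String[] parts = serverName.trim().split(",");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected host,port,startcode but got: " + serverName);
    }
    // Host names are case-insensitive, so normalize before comparing.
    return new MasterAddress(parts[0].toLowerCase(Locale.ROOT),
        Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof MasterAddress)) return false;
    MasterAddress other = (MasterAddress) o;
    return port == other.port && startCode == other.startCode && host.equals(other.host);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port, startCode);
  }
}

final class ReplicationPeerCheck {
  /** True when the peer's master znode names the very same master as the local cluster. */
  static boolean isSelfPeer(String localMaster, String peerMaster) {
    return MasterAddress.parse(localMaster).equals(MasterAddress.parse(peerMaster));
  }
}
```

Unlike the ZK ensemble address, this string is written by the running master itself. DNS aliases used in the peer configuration therefore do not affect it, which is the property the description relies on.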





[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151146#comment-13151146
 ] 

nkeywal commented on HBASE-4798:


@committers: for 
catalog.TestCatalogTrackerOnCluster.testBadOriginalRootLocation, the test is:
{noformat}
ServerName nonsense =
  new ServerName("example.org", 1234, System.currentTimeMillis());
RootLocationEditor.setRootLocation(zookeeper, nonsense);
// Bring back up the hbase cluster.  See if it can deal with nonsense root
// location.
UTIL.startMiniHBaseCluster(1, 1);
UTIL.shutdownMiniCluster();
{noformat}

It appears that when the root location is invalid, the master cannot be 
initialized. The test now fails with an exception because I modified 
JVMClusterUtil#startup to make sure that the cluster is initialized when we 
leave the method (with a 30s timeout: that's what's happening here). I prefer 
this because it's a well-known state (instead of sometimes initialized, 
sometimes not, depending on the sleeps that took place). It means that I would 
have to modify this test case to make it accept the exception. Are you OK with that?

I'm also looking into the other errors.
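The change described above makes startup return only once the cluster is initialized, and fail after a 30s timeout otherwise. Below is a minimal sketch of that "wait until ready or fail" shape, assuming a hypothetical `StartupWait.waitUntil` helper with a BooleanSupplier standing in for "is the master initialized". The real patch may use notifications rather than polling. The 100 ms poll interval in the usage note simply echoes the 0.1s sleep mentioned in the description.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class StartupWait {
  /**
   * Polls the condition every pollMs until it holds or timeoutMs elapses.
   * @throws IllegalStateException on timeout, so callers see a well-defined
   *     failure instead of a cluster that is only sometimes initialized
   */
  static void waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      // Subtraction form is safe even if nanoTime() wraps around.
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("Condition not met after " + timeoutMs + " ms");
      }
      Thread.sleep(pollMs);
    }
  }
}
```

A call such as `StartupWait.waitUntil(() -> master.isInitialized(), 30_000, 100)` (where `master.isInitialized()` is a placeholder) either returns with the cluster ready or throws. This is why a test that deliberately starts a cluster with a bogus root location would then need to expect the exception.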


 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v2.patch


 Multiple small changes:
 @commiters: Removing some sleeps made visible a bug on 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c  !!c). Added a new synchronization point.
 AssignementManager#waitForAssignment: add a timeout on the wait = not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait = not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.
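The "timeout on the wait" changes to AssignmentManager#waitForAssignment and ZooKeeperNodeTracker#blockUntilAvailable guard against a lost notification: if notifyAll() fires before the waiter reaches wait(), an unbounded wait() can block forever. A minimal sketch of the general pattern, assuming a single boolean flag guarded by a monitor (`AvailabilityTracker` and its methods are hypothetical, not the HBase classes): the condition is re-checked under the lock before every wait, and each wait is bounded.

```java
/** Hypothetical sketch of a guarded wait with a bounded wait interval. */
final class AvailabilityTracker {
  private final Object lock = new Object();
  private boolean available; // guarded by lock

  /** Marks the resource available and wakes all waiters. */
  void setAvailable() {
    synchronized (lock) {
      available = true;
      lock.notifyAll();
    }
  }

  /** Blocks until available or until timeoutMillis elapses; returns true if available. */
  boolean blockUntilAvailable(long timeoutMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    synchronized (lock) {
      // Re-check before every wait: a notifyAll() sent before we got here is
      // not lost, because 'available' is already true.
      while (!available) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        // Bounded wait (at most 100 ms per round) instead of an unbounded wait().
        lock.wait(Math.min(remaining, 100));
      }
      return true;
    }
  }
}
```

Bounding each round also lets the caller notice a timeout instead of hanging a test until the test-level timeout kills it.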

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-16 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4739:
--

Attachment: HBASE-4739_trial.patch

Trial version: not tested yet and needs improvement.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 build refreshed yesterday: 
 when the master died, it had just created the RIT znode for a region but hadn't 
 told the RS to close it yet.
 When the master restarted, it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151150#comment-13151150
 ] 

Ted Yu commented on HBASE-4729:
---

If a region is in transition, it may be splitting. 
Why do we need to split manually?
I think we should add a check on the state of the ZK node and decide whether to 
continue or not. 
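The suggestion to inspect the existing znode before deciding whether to continue could look roughly like the following. This is a sketch under stated assumptions: a `ConcurrentHashMap` stands in for ZooKeeper, and `UnassignedNodes`, `State` and `tryCreateClosing` are hypothetical names, not HBase's ZKAssign API. In real code the equivalent branch would handle the NodeExistsException from the create instead of letting it abort the master.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the /hbase/unassigned znodes. */
final class UnassignedNodes {
  enum State { CLOSING, SPLITTING, SPLIT }

  private final Map<String, State> nodes = new ConcurrentHashMap<>();

  /** Models a region server creating its SPLITTING node. */
  void createSplitting(String encodedRegionName) {
    nodes.put(encodedRegionName, State.SPLITTING);
  }

  /**
   * Tries to create a CLOSING node for an unassign. An existing node is not
   * treated as fatal: its state decides whether the caller should continue.
   *
   * @return true if the CLOSING node was created and the unassign may proceed
   */
  boolean tryCreateClosing(String encodedRegionName) {
    // putIfAbsent is atomic, like a ZooKeeper create that fails with NodeExists.
    State existing = nodes.putIfAbsent(encodedRegionName, State.CLOSING);
    if (existing == null) {
      return true;
    }
    if (existing == State.SPLITTING || existing == State.SPLIT) {
      // The region is being split: skip it rather than kill the master.
      return false;
    }
    throw new IllegalStateException("Unexpected state " + existing + " for " + encodedRegionName);
  }
}
```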

 Race between online altering and splitting kills the master
 ---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729.txt


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
     at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
     at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
     at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
     at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
     at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 {quote}
 The znode already existed because the region server had started splitting the 
 region 4 seconds earlier:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.





[jira] [Created] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Created) (JIRA)
Catalog Janitor logic bug causes region leackage


 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Priority: Critical


When a region split takes a significant amount of time, CatalogJanitor can 
clean up one of the SPLIT records but leave the other in META. When the other 
daughter finishes, the janitor cleans up the remaining SPLIT record, but the 
parent region is never removed from the FS and its META entry is never cleared.

The race condition is as follows:
1. A region split starts.
2. One of the daughter regions (A) has finished splitting, i.e. it has no 
reference store files, but the other (B) has not.
3. The janitor runs and, in the checkDaughter routine, removes SPLITA from META, 
but sees that SPLITB still has references and does nothing.
4. Region B completes its split.
5. The janitor wakes up and removes SPLITB, but sees that there is no record for 
A and again does nothing.

Result: the parent region hangs around forever.
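The five steps above can be replayed with a tiny model. This is a hypothetical sketch: `SplitParent` and `JanitorSketch` are illustrative stand-ins for a META parent row and the janitor pass, not HBase classes, and `fixedPass` is only one possible fix, not necessarily what the attached patch does.

```java
/** Hypothetical model of a split parent row in META. */
final class SplitParent {
  boolean splitAColumn = true;   // SPLITA still present in META
  boolean splitBColumn = true;   // SPLITB still present in META
  boolean aHasReferences = true; // daughter A still holds reference files
  boolean bHasReferences = true; // daughter B still holds reference files
  boolean removed = false;       // parent deleted from META and the FS
}

final class JanitorSketch {
  /** Buggy pass modeled on the report: each daughter column is handled independently. */
  static void buggyPass(SplitParent p) {
    boolean aDone = p.splitAColumn && !p.aHasReferences;
    boolean bDone = p.splitBColumn && !p.bHasReferences;
    if (aDone) p.splitAColumn = false; // SPLITA removed on its own
    if (bDone) p.splitBColumn = false; // SPLITB removed on its own
    // The parent is only removed if both daughters are seen done in the same pass.
    if (aDone && bDone) p.removed = true;
  }

  /** One possible fix: leave META untouched until both daughters are reference-free. */
  static void fixedPass(SplitParent p) {
    if (!p.aHasReferences && !p.bHasReferences) {
      p.splitAColumn = false;
      p.splitBColumn = false;
      p.removed = true;
    }
  }

  /** Replays steps 2-5 from the report; returns whether the parent was removed. */
  static boolean replay(boolean useFix) {
    SplitParent p = new SplitParent();
    p.aHasReferences = false; // step 2: A done, B not
    run(p, useFix);           // step 3: janitor pass
    p.bHasReferences = false; // step 4: B completes
    run(p, useFix);           // step 5: janitor pass
    return p.removed;
  }

  private static void run(SplitParent p, boolean useFix) {
    if (useFix) fixedPass(p); else buggyPass(p);
  }

  public static void main(String[] args) {
    System.out.println("buggy janitor removes parent: " + replay(false)); // false
    System.out.println("fixed janitor removes parent: " + replay(true));  // true
  }
}
```

In the buggy replay both SPLIT columns end up deleted while `removed` stays false, which is exactly the leaked parent described above.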





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151197#comment-13151197
 ] 

Hadoop QA commented on HBASE-4739:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503873/HBASE-4739_trial.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -163 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 51 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/264//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/264//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/264//console

This message is automatically generated.





[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-16 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151235#comment-13151235
 ] 

ramkrishna.s.vasudevan commented on HBASE-4729:
---

@Ted
What I meant is that the alter table does not happen even in this scenario, where 
the region is in RIT in the SPLITTING state. Correct me if I am wrong, Ted.






[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151239#comment-13151239
 ] 

nkeywal commented on HBASE-4798:


@committers: coprocessor.TestRegionServerCoprocessorExceptionWithAbort is a 
little strange: it fails sometimes, but not always. When it succeeds, there are 
3 retries, then it stops trying and the test is over. When it fails, it retries 
up to 10 times before being interrupted in the middle of a try by the test 
timeout (because the waiting time between retries is greater than the timeout). 
Question: are the retries expected?
If you increase the timeout enough, it's the 'put' that is marked as failed, 
but the region does not abort in the failure scenario.
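Why 10 retries can overrun a test timeout that 3 retries fit into becomes clear once the sleeps are added up. The sketch below assumes a growing backoff table and a base pause; both `BACKOFF` and `PAUSE_MS` are illustrative values, not taken from this report or from the HBase client configuration, and `RetryBudget` is a hypothetical name.

```java
/** Hypothetical arithmetic: total sleep time across client retries. */
final class RetryBudget {
  // Illustrative backoff multipliers and base pause: assumptions, not HBase's values.
  static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};
  static final long PAUSE_MS = 1000;

  /** Sum of the sleeps before each of the first {@code retries} retries. */
  static long totalSleepMillis(int retries) {
    long total = 0;
    for (int i = 0; i < retries; i++) {
      int multiplier = BACKOFF[Math.min(i, BACKOFF.length - 1)];
      total += PAUSE_MS * multiplier;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println("3 retries:  " + totalSleepMillis(3) + " ms");  // 3000 ms
    System.out.println("10 retries: " + totalSleepMillis(10) + " ms"); // 71000 ms
  }
}
```

With these assumed numbers, 3 retries sleep 3s in total while 10 retries sleep 71s, so any test timeout shorter than about a minute would interrupt the longer run mid-retry.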






[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151243#comment-13151243
 ] 

Ted Yu commented on HBASE-4729:
---

Wouldn't the daughter regions pick up the new table definition from .tableinfo, 
which is stored on HDFS?

Please take a look at 
TestInstantSchemaChange#testConcurrentInstantSchemaChangeAndSplit in 
https://reviews.apache.org/r/1786/diff/#index_header to see if that test covers 
our scenario.

@J-D:
Can you provide more details on your comment from 16/Nov/11 00:30?
Maybe you can address Ramkrishna's question.





[jira] [Assigned] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4799:
-

Assignee: Max Lapan

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can 
 clean up one of the SPLIT records but leave the other in META. When the other 
 daughter finishes, the janitor cleans up the remaining SPLIT record, but the 
 parent region is never removed from the FS and its META entry is never cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One of the daughter regions (A) has finished splitting, i.e. it has no 
 reference store files, but the other (B) has not.
 3. The janitor runs and, in the checkDaughter routine, removes SPLITA from META, 
 but sees that SPLITB still has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record for 
 A and again does nothing.
 Result: the parent region hangs around forever.





[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151248#comment-13151248
 ] 

Ted Yu commented on HBASE-4799:
---

Max:
Please use the following command to generate the patch so that HadoopQA can run 
tests on it:
{code}
git format-patch --no-prefix HEAD^..HEAD
{code}
Please attach the temporary fix patch first and the formal patch second: HadoopQA 
picks up the latest patch.

After that, please click 'Submit Patch' to run tests on Jenkins.

Thanks for the patch.

If you experienced this problem in your cluster, please tell us whether the fix 
worked.





[jira] [Updated] (HBASE-4768) Per-(table, columnFamily) metrics with configurable table name inclusion

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4768:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Test fixed by HBASE-4795

 Per-(table, columnFamily) metrics with configurable table name inclusion
 

 Key: HBASE-4768
 URL: https://issues.apache.org/jira/browse/HBASE-4768
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.94.0

 Attachments: 4768.addendum, 4768.addendum2, D363.1.patch, 
 D363.2.patch, D363.3.patch, D363.4.patch, D363.5.patch


 As we kept adding more granular block read and block cache usage statistics, 
 a combinatorial explosion of various cases to monitor started to happen, 
 especially when we wanted both per-table/column family/block type statistics 
 and aggregate statistics on various subsets of these dimensions. Here, we 
 unclutter HFile readers, LruBlockCache, StoreFile, etc. by creating a 
 centralized class that knows how to update all kinds of per-table/CF/block 
 type counters. 
 Table name and column family configuration have been pushed to a base class, 
 SchemaConfigured. This is convenient, as many of the existing classes that have 
 these properties (HFile readers/writers, HFile blocks, etc.) did not have a 
 base class. Whether to collect per-(table, columnFamily) or per-columnFamily 
 only metrics can be configured with the hbase.metrics.showTableName 
 configuration key. We don't expect this configuration to change at runtime, 
 so we cache the setting statically and log a warning when an attempt is made 
 to flip it once already set. This way we don't have to pass configuration to 
 a lot more places, e.g. everywhere an HFile reader is instantiated.
 Thanks to Liyin for his initial version of per-table metrics patch and a lot 
 of valuable feedback.
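The "cache the setting statically and warn on a flip" approach described above can be sketched as follows. Only the configuration key hbase.metrics.showTableName comes from the description; `SchemaMetricsConfig`, its methods, the per-CF default when unconfigured, and the `tbl.`/`cf.` metric-name format are hypothetical.

```java
import java.util.logging.Logger;

/** Hypothetical sketch: a process-wide metrics setting cached statically. */
final class SchemaMetricsConfig {
  private static final Logger LOG = Logger.getLogger(SchemaMetricsConfig.class.getName());

  // null until first configured; then fixed for the life of the JVM.
  private static Boolean useTableName = null;

  private SchemaMetricsConfig() {}

  /**
   * Records the value of hbase.metrics.showTableName on the first call;
   * later calls with a different value are ignored with a warning.
   */
  static synchronized void configure(boolean showTableName) {
    if (useTableName == null) {
      useTableName = showTableName;
    } else if (useTableName != showTableName) {
      LOG.warning("Ignoring attempt to change hbase.metrics.showTableName from "
          + useTableName + " to " + showTableName);
    }
  }

  /** Builds a metric-name prefix; the "tbl."/"cf." format is made up for illustration. */
  static synchronized String prefix(String table, String columnFamily) {
    boolean withTable = useTableName != null && useTableName;
    return (withTable ? "tbl." + table + "." : "") + "cf." + columnFamily + ".";
  }
}
```

Because the value lives in a static field, code that builds metric names (for example wherever an HFile reader is created) can call `prefix` without receiving a Configuration object.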





[jira] [Updated] (HBASE-3811) Allow adding attributes to Scan

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3811:
--

Fix Version/s: 0.92.0

 Allow adding attributes to Scan
 ---

 Key: HBASE-3811
 URL: https://issues.apache.org/jira/browse/HBASE-3811
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3811.patch, HBASE-3811.patch, HBASE-3811.patch, 
 HBASE-3811.patch


 There's sometimes a need to add a custom attribute to a Scan object so that it 
 can be accessed on the server side.
 An example of a case where it is needed is discussed here: 
 http://search-hadoop.com/m/v3Jtb2GkiO. There might be other cases where it is 
 useful, mostly related to logging/gathering stats on the server side.
 An alternative to allowing arbitrary custom attributes on a Scan would be adding 
 a fixed field, such as type, to the class.
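What "custom attributes on a Scan" amounts to can be sketched as a named map of byte arrays carried with the request. `AttributedScan` is a hypothetical, self-contained stand-in, not HBase's Scan class; later HBase clients do expose a similar `setAttribute(String, byte[])` on Scan, but check the client version you target before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scan-like request carrying arbitrary named attributes. */
final class AttributedScan {
  private final Map<String, byte[]> attributes = new HashMap<>();

  /** Sets an attribute (copying the value); a null value removes it. */
  AttributedScan setAttribute(String name, byte[] value) {
    if (value == null) {
      attributes.remove(name);
    } else {
      attributes.put(name, value.clone());
    }
    return this;
  }

  /** Returns a copy of the attribute value, or null if absent. */
  byte[] getAttribute(String name) {
    byte[] v = attributes.get(name);
    return v == null ? null : v.clone();
  }

  /** Convenience: decodes an attribute as UTF-8, or returns null if absent. */
  String getStringAttribute(String name) {
    byte[] v = attributes.get(name);
    return v == null ? null : new String(v, StandardCharsets.UTF_8);
  }

  /** Read-only view, e.g. for a server-side hook that logs every attribute. */
  Map<String, byte[]> getAttributesMap() {
    return Collections.unmodifiableMap(attributes);
  }
}
```

Byte-array values keep the mechanism generic, which is the trade-off against a single fixed field such as type.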





[jira] [Updated] (HBASE-4793) HBase shell still using deprecated methods removed in HBASE-4436

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4793:
--

Priority: Critical  (was: Blocker)

Lowering priority before we find the next broken shell command.

 HBase shell still using deprecated methods removed in HBASE-4436
 

 Key: HBASE-4793
 URL: https://issues.apache.org/jira/browse/HBASE-4793
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4793.txt


 The patch applied in HBASE-4622 (subtask of HBASE-4436) to remove deprecated 
 methods seems to have missed some uses of those methods by the HBase shell.  
 At least src/main/ruby/hbase/admin.rb is still using some of the removed 
 methods, breaking some shell commands:
 {noformat}
 hbase(main):007:0> alter 'privatetable', { NAME => 'f1', VERSIONS => 2}
 ERROR: wrong number of arguments (3 for 2)
 Backtrace: /usr/lib/hbase/bin/../bin/../lib/ruby/hbase/admin.rb:344:in `alter'
            org/jruby/RubyArray.java:1572:in `each'
            /usr/lib/hbase/bin/../bin/../lib/ruby/hbase/admin.rb:317:in `alter'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands/alter.rb:79:in `command'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:68:in `format_simple_command'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands/alter.rb:78:in `command'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:31:in `command_safe'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:74:in `translate_hbase_exceptions'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell/commands.rb:31:in `command_safe'
            /usr/lib/hbase/bin/../bin/../lib/ruby/shell.rb:110:in `command'
            (eval):2:in `alter'
 {noformat}
 This trace translates to the line:
 {code}
   @admin.modifyColumn(table_name, column_name, descriptor)
 {code}
 which is calling one of the removed methods.





[jira] [Updated] (HBASE-3465) Hbase should use a HADOOP_HOME environment variable if available.

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3465:
--

Fix Version/s: 0.92.0

 Hbase should use a HADOOP_HOME environment variable if available.
 -

 Key: HBASE-3465
 URL: https://issues.apache.org/jira/browse/HBASE-3465
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Ted Dunning
Assignee: Alejandro Abdelnur
 Fix For: 0.92.0

 Attachments: a1-HBASE-3465.patch


 I have been burned a few times lately while developing code by having to 
 make sure that the hadoop jar in hbase/lib is exactly correct. In my own 
 deployment, there are actually 3 jars and a native library to keep in sync 
 that hbase shouldn't have to know about explicitly.  A similar problem arises 
 when using stock hbase with CDH3 because of the security patches changing the 
 wire protocol.
 All of these problems could be avoided by not assuming that the hadoop 
 library is in the local directory.  Moreover, I think it might be possible to 
 assemble the distribution such that the compile time hadoop dependency is in 
 a cognate directory to lib and is referenced using a default value for 
 HADOOP_HOME.
 Does anybody have any violent antipathies to such a change?
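The proposal boils down to a lookup order: honour HADOOP_HOME when it is set, otherwise fall back to a default directory shipped next to lib. A minimal sketch of that order follows. The class name, the method name, and the `hadoop` directory name are hypothetical, and in practice this logic would probably live in the bin/hbase launch script rather than in Java.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper illustrating the proposed HADOOP_HOME lookup order. */
final class HadoopHomeResolver {
  private HadoopHomeResolver() {}

  /**
   * Returns HADOOP_HOME from the given environment if it is set and non-blank;
   * otherwise returns a default "hadoop" directory inside the HBase install.
   */
  static Path resolve(Map<String, String> env, Path hbaseHome) {
    String fromEnv = env.get("HADOOP_HOME");
    if (fromEnv != null && !fromEnv.trim().isEmpty()) {
      return Paths.get(fromEnv.trim());
    }
    // Assumed layout: <hbase-home>/lib holds HBase jars and
    // <hbase-home>/hadoop holds the Hadoop build HBase was compiled against.
    return hbaseHome.resolve("hadoop");
  }
}
```

A caller would pass `System.getenv()` and the HBase install directory; passing the environment in as a map keeps the lookup easy to test.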

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4065) TableOutputFormat ignores failure to create table instance

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4065:
--

Fix Version/s: 0.92.0

 TableOutputFormat ignores failure to create table instance
 --

 Key: HBASE-4065
 URL: https://issues.apache.org/jira/browse/HBASE-4065
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Todd Lipcon
Assignee: Brock Noland
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4065.1.patch, HBASE-4065.2.patch


 If TableOutputFormat in the new API fails to create a table, it simply logs 
 this at ERROR level and then continues on its way. Then, the first write() to 
 the table will throw an NPE since the table hasn't been set.
 Instead, it should probably rethrow the exception as a RuntimeException in 
 setConf, or do what the old-API TOF does and not create the HTable instance 
 until getRecordWriter, where it can throw an IOE.
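The first option, failing fast in setConf, can be sketched as follows. This is not HBase's actual TableOutputFormat: `TableOpener` and `FailFastOutputFormat` are hypothetical stand-ins, with `TableOpener` playing the role of whatever creates the HTable (and may throw IOException), so that the control flow can be shown without a cluster.

```java
import java.io.IOException;

/** Hypothetical stand-in for the code that creates the table instance. */
interface TableOpener<T> {
  T open(String tableName) throws IOException;
}

/** Sketch of an output format that fails fast instead of logging and continuing. */
final class FailFastOutputFormat<T> {
  private final TableOpener<T> opener;
  private T table; // null until setConf succeeds

  FailFastOutputFormat(TableOpener<T> opener) {
    this.opener = opener;
  }

  /** Mirrors the setConf hook: rethrow instead of swallowing the failure. */
  void setConf(String tableName) {
    try {
      table = opener.open(tableName);
    } catch (IOException e) {
      // Surfacing the error here fails the job at configuration time,
      // rather than with a NullPointerException on the first write().
      throw new RuntimeException("Could not create table instance for " + tableName, e);
    }
  }

  T table() {
    if (table == null) {
      throw new IllegalStateException("setConf was not called successfully");
    }
    return table;
  }
}
```

The other option in the report, deferring creation to getRecordWriter, would let the IOException propagate directly, since that method already declares it.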

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4225) NoSuchColumnFamilyException in multi doesn't say which family is bad

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4225:
--

Fix Version/s: 0.92.0

 NoSuchColumnFamilyException in multi doesn't say which family is bad
 

 Key: HBASE-4225
 URL: https://issues.apache.org/jira/browse/HBASE-4225
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.90.5

 Attachments: 4225.trunk, HBASE-4225_0.90.patch, 
 HBASE-4225_0.90_1.patch, HBASE-4225_0.90_2.patch, HBASE-4225_0.90_3.patch


 It's kind of a dumb one: in HRegion.doMiniBatchPut we do:
 {code}
 LOG.warn("No such column family in batch put", nscf);
 batchOp.retCodes[lastIndexExclusive] = OperationStatusCode.BAD_FAMILY;
 {code}
 So we lose the family here; all we know is that there's a bad one. This is 
 what HRS.multi does with it:
 {code}
 } else if (code == OperationStatusCode.BAD_FAMILY) {
   result = new NoSuchColumnFamilyException();
 {code}
 We can't just throw the exception like that; we need to say which family is bad, 
 even if that requires testing all the passed MultiActions.
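One way to report which family is bad is to check each requested family against the table's families and put the first unknown one in the exception message. A minimal sketch, assuming family names are plain strings; `FamilyCheck`, `findBadFamily` and `describeBadFamily` are hypothetical helpers, not HBase methods.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helpers for naming the offending column family. */
final class FamilyCheck {
  private FamilyCheck() {}

  /** Returns the first requested family that the table does not define, if any. */
  static Optional<String> findBadFamily(Set<String> tableFamilies, List<String> requested) {
    for (String family : requested) {
      if (!tableFamilies.contains(family)) {
        return Optional.of(family);
      }
    }
    return Optional.empty();
  }

  /** Builds a message that names the family instead of only signalling BAD_FAMILY. */
  static String describeBadFamily(String tableName, String family) {
    return "Column family " + family + " does not exist in table " + tableName;
  }
}
```

In HBase itself family names are byte arrays, so a real fix would need byte-wise comparison rather than String equality.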

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4253) Intermittent test failure because of missing config parameter in new HTable(tablename)

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4253:
--

Fix Version/s: 0.92.0

 Intermittent test failure because of missing config parameter in new 
 HTable(tablename)
 --

 Key: HBASE-4253
 URL: https://issues.apache.org/jira/browse/HBASE-4253
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4253.patch, HBASE_4253_0.90.patch


 As described in HBASE-4138, this issue was raised to fix a random test case 
 failure.
 Consider the log from the failed build #2132 for the test case TestScannerTimeOut:
 2011-08-23 04:30:11,195 INFO  [main] zookeeper.MiniZooKeeperCluster(141): Failed binding ZK Server to client port: 21818
 2011-08-23 04:30:11,226 INFO  [main] zookeeper.MiniZooKeeperCluster(164): Started MiniZK Cluster and connect 1 ZK server on client port: 21819
 By default we try to bind the ZK server to 21818, but as that port was not 
 bindable (maybe it was busy) we used 21819.
 After starting the MiniZooKeeperCluster we set this port in the config object:
 this.conf.set("hbase.zookeeper.property.clientPort",
   Integer.toString(clientPort));
 So for the RS and Master the ZooKeeper client port will be 21819.
 When the test class runs, no test before TestScannerTimeout#test3686a needs a 
 new client connection.
 As part of test3686a we create new HTable(tableName), which calls
 {code}
 this(HBaseConfiguration.create(), tableName);
 {code}
 This creates a new configuration object, so the ZooKeeper client port is 
 taken to be the default, 21818.
 Presumably, because some previous ZK cluster running on 21818 had not been 
 shut down properly, the test case was able to connect to it, but since that 
 was a different ZK server it could not find the /hbase node.
 Hence the failure.
 The remaining two test cases in TestHTablePool that failed have a similar 
 problem, and the failure in build #2119 is exactly the same.
 There should be a mechanism for the test to tell the client code which ZK 
 ensemble it should connect to.
 Another interesting thing: all other test cases use new HTable(conf, tablename).
 Only these 3 test cases use new HTable(tablename), hence the 
 problem.
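The failure mode can be modelled without a cluster: the mini cluster falls back to the next free port and records it only in the configuration it was started with, while a configuration built from defaults still says 21818. The sketch below uses plain maps instead of HBase's Configuration, models "port is busy" as membership in a set rather than a real bind attempt, and uses hypothetical class and method names; only the property name and port numbers come from the report.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the client-port mismatch described above. */
final class ZkPortModel {
  static final String CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort";
  static final int DEFAULT_TEST_PORT = 21818;

  private ZkPortModel() {}

  /** Stand-in for a configuration built from defaults only, as new HTable(tableName) does. */
  static Map<String, String> freshConf() {
    Map<String, String> conf = new HashMap<>();
    conf.put(CLIENT_PORT_KEY, Integer.toString(DEFAULT_TEST_PORT));
    return conf;
  }

  /** Mimics the mini cluster: try ports upward from the default until one is free. */
  static int startMiniZk(Map<String, String> clusterConf, Set<Integer> busyPorts) {
    int port = DEFAULT_TEST_PORT;
    while (busyPorts.contains(port)) {
      port++;
    }
    // The chosen port is written only into the cluster's own configuration.
    clusterConf.put(CLIENT_PORT_KEY, Integer.toString(port));
    return port;
  }
}
```

A client that reuses the cluster's map (the new HTable(conf, tablename) case) sees 21819, while one that builds a fresh map still gets 21818.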

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4297) TableMapReduceUtil overwrites user supplied options

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4297:
--

Fix Version/s: 0.90.5

 TableMapReduceUtil overwrites user supplied options
 ---

 Key: HBASE-4297
 URL: https://issues.apache.org/jira/browse/HBASE-4297
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Jan Lukavsky
 Fix For: 0.90.5

 Attachments: HBASE-4297.patch


 Job configuration is overwritten by hbase-default and hbase-site in 
 TableMapReduceUtil.initTable(Mapper|Reducer)Job, causing unexpected behavior 
 in the following code:
 {noformat}
 Configuration conf = HBaseConfiguration.create();
 // change keyvalue size
 conf.setInt("hbase.client.keyvalue.maxsize", 20971520);
 Job job = new Job(conf, ...);
 TableMapReduceUtil.initTableMapperJob(...);
 // the job doesn't have the option changed, uses it from hbase-site or
 // hbase-default
 job.submit();
 {noformat}
 Although in this case it could be fixed by moving the set() after 
 initTableMapperJob(), when the user wants to change an option using 
 GenericOptionsParser and -D this is impossible, making this cool feature 
 useless.
 In the 0.20.x era this code behaved as expected. The solution to this problem 
 should be to not overwrite the options, but just read them if they are 
 missing.
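The behaviour asked for here, reading HBase settings only when the job has not already set them, is a put-if-absent merge. A minimal sketch using plain maps in place of Hadoop's Configuration; `ConfMerge.mergeMissing` is a hypothetical helper, not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: layer HBase defaults under job settings without clobbering them. */
final class ConfMerge {
  private ConfMerge() {}

  /**
   * Returns a copy of jobConf in which every key from hbaseDefaults is present,
   * but any value the user already set (in code or via -D) is kept.
   */
  static Map<String, String> mergeMissing(Map<String, String> jobConf,
                                          Map<String, String> hbaseDefaults) {
    Map<String, String> merged = new HashMap<>(jobConf);
    for (Map.Entry<String, String> e : hbaseDefaults.entrySet()) {
      merged.putIfAbsent(e.getKey(), e.getValue());
    }
    return merged;
  }
}
```

With Hadoop's Configuration the same idea could be expressed by copying an entry only when conf.get(key) returns null.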

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves

2011-11-16 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151323#comment-13151323
 ] 

Lars Hofhansl commented on HBASE-4654:
--

I was thinking about that; it would certainly be more user friendly. That means, 
though, that we would already have to get the peerClusterId at that time, which 
in turn means that we have to connect to the peer cluster's ZK right 
away. 

On the other hand, I don't expect that to be a common scenario, just a 
safeguard against user error.

 [replication] Add a check to make sure we don't replicate to ourselves
 --

 Key: HBASE-4654
 URL: https://issues.apache.org/jira/browse/HBASE-4654
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.5

 Attachments: 4654-trunk.txt


 It's currently possible to add a peer for replication and point it to the 
 local cluster, which I believe could very well happen for those like us that 
 use only one ZK ensemble per DC so that only the root znode changes when you 
 want to set up replication intra-DC.
 I don't think comparing just the cluster ID would be enough because you would 
 normally use a different one for another cluster and nothing will block you 
 from pointing elsewhere.
 Comparing the ZK ensemble address doesn't work either when you have multiple 
 DNS entries that point at the same place.
 I think this could be resolved by looking up the master address in the 
 relevant znode as it should be exactly the same thing in the case where you 
 have the same cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: (was: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch)

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can 
 clean up one of the SPLIT records but leave the other in META. When the other 
 daughter finishes, the janitor cleans up the remaining SPLIT record, but the 
 parent region is never removed from the FS and META is not cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One of the daughters, A, has finished (it has no reference storefiles), but 
 the other, B, has not.
 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but 
 sees that SPLITB has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record 
 for A and does nothing again.
 Result: the parent region is left behind forever.
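The sequence above can be reproduced with a small in-memory model. `ParentRow`, `JanitorModel`, `buggyScan` and `safeScan` are hypothetical names; the model only tracks whether each SPLITA/SPLITB column is still present and whether each daughter still has reference files. `safeScan` shows one way to avoid the leak (treat an already-removed column as clean, and only touch the row once both daughters are clean); it is not necessarily what the attached patches do.

```java
import java.util.Map;

/** Hypothetical model of a split parent's row in META. */
final class ParentRow {
  String splitA;   // daughter name in the SPLITA column, or null once removed
  String splitB;   // daughter name in the SPLITB column, or null once removed
  boolean removed; // true once the parent is deleted from META and the FS

  ParentRow(String splitA, String splitB) {
    this.splitA = splitA;
    this.splitB = splitB;
  }
}

final class JanitorModel {
  private JanitorModel() {}

  /** Reproduces the reported behaviour: each daughter column is handled on its own. */
  static void buggyScan(ParentRow p, Map<String, Boolean> hasReferences) {
    if (p.removed) {
      return;
    }
    // A missing column counts as "not clean", so a column removed in an
    // earlier pass blocks removal of the parent forever.
    boolean aClean = p.splitA != null && !hasReferences.get(p.splitA);
    boolean bClean = p.splitB != null && !hasReferences.get(p.splitB);
    if (aClean) {
      p.splitA = null;
    }
    if (bClean) {
      p.splitB = null;
    }
    if (aClean && bClean) {
      p.removed = true;
    }
  }

  /** Leaves the row untouched until both daughters are free of references. */
  static void safeScan(ParentRow p, Map<String, Boolean> hasReferences) {
    if (p.removed) {
      return;
    }
    boolean aClean = p.splitA == null || !hasReferences.get(p.splitA);
    boolean bClean = p.splitB == null || !hasReferences.get(p.splitB);
    if (aClean && bClean) {
      p.splitA = null;
      p.splitB = null;
      p.removed = true;
    }
  }
}
```

Running buggyScan for steps 3 and 5 leaves the parent with both columns gone but never removed, while safeScan removes it on the second pass.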

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: (was: 0002-Temporary-fix-to-remove-leaked-regions.patch)

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch
0002-Temporary-fix-to-remove-leaked-regions.patch

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: (was: 0002-Temporary-fix-to-remove-leaked-regions.patch)

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: (was: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch)

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: 0002-Temporary-fix-to-remove-leaked-regions.patch

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Attachment: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151346#comment-13151346
 ] 

stack commented on HBASE-4611:
--

Thank you Marek.  Let me try again later today...

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray
Assignee: Nicolas Spiegelberg
 Fix For: 0.92.0, 0.94.0

 Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, 
 D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, 
 D207.1.patch, D21.1.patch, D21.1.patch, HBASE-4611.D423.1.patch, 
 HBASE-4611.D423.2.patch


 From http://phabricator.org/ : Phabricator is a open source collection of 
 web applications which make it easier to write, review, and share source 
 code. It is currently available as an early release. Phabricator was 
 developed at Facebook.
 It's open source so pretty much anyone could host an instance of this 
 software.
 To begin with, there will be a public-facing instance located at 
 http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
 http://osuosl.org).
 We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
 support that will allow us to do code reviews with Phabricator for HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HBASE-4799:
-

Status: Patch Available  (was: Open)

Resolves Janitor race conditions.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151350#comment-13151350
 ] 

Jean-Daniel Cryans commented on HBASE-4799:
---

At first glance, this sounds a whole lot like HBASE-4238.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151362#comment-13151362
 ] 

Max Lapan commented on HBASE-4799:
--

This bug differs from HBASE-4238. 4238 is about regions splitting too fast, but 
4799 is about regions splitting too slowly.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When region split takes a significant amount of time, CatalogJanitor can 
 cleanup one of SPLIT records, but left another in META. When another split 
 finish, janitor cleans left SPLIT record, but parent regions haven't removed 
 from FS and META not cleared.
 The race condition is follows:
 1. region split started
 2. one of regions splitted, i.e. A (have no reference storefiles) but other 
 (B) doesn't
 3. janitor started and in routine checkDaughter removes SPLITA from meta, but 
 see that SPLITB has references and does nothing.
 4. region B completes split
 5. janitor wakes up, removes SPLITB, but see that there is no records for A 
 and does nothing again.
 Result - parent region hangs forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151363#comment-13151363
 ] 

Ted Yu commented on HBASE-4799:
---

@Max:
HadoopQA picks the latest patch. Looks like it is running through test suite:
https://builds.apache.org/job/PreCommit-HBASE-Build/265/console

Since your fix is marked against 0.90.4 and HBASE-4238 was integrated into 
0.90.5 (absent from 
http://archive.cloudera.com/cdh/3/hbase-0.90.4+49.1.CHANGES.txt), I wonder if 
you could try Stack's fix out.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can
 clean up one of the SPLIT records but leave the other in META. When the other
 split finishes, the janitor cleans up the remaining SPLIT record, but the
 parent region is never removed from the FS and its META entry is never
 cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One of the daughters, A, no longer has reference storefiles, but the other
 (B) still does.
 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees
 that SPLITB still has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record
 for A and again does nothing.
 Result: the parent region hangs around forever.





[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151366#comment-13151366
 ] 

Ted Yu commented on HBASE-4799:
---

@Max:
I didn't see your comment @ 16/Nov/11 17:37 when I typed the response above.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch






[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Max Lapan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151370#comment-13151370
 ] 

Max Lapan commented on HBASE-4799:
--

I tried it; it didn't help. HBASE-4238 is a different issue, which appears when
a region is split twice in a row.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch






[jira] [Created] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Lars Hofhansl (Created) (JIRA)
Result.compareResults is incorrect
--

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl


A coworker of mine (James Taylor) found a bug in Result.compareResults(...).
This condition:
{code}
      if (!ourKVs[i].equals(replicatedKVs[i]) &&
          !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
        throw new Exception("This result was different: "
{code}
should be
{code}
      if (!ourKVs[i].equals(replicatedKVs[i]) ||
          !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
        throw new Exception("This result was different: "
{code}

Just checked, this is wrong in all branches.
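A self-contained illustration of why `&&` lets a value-only mismatch through. `Cell` is a hypothetical stand-in for `KeyValue`, and its `keyEquals` is assumed to compare only the key, which is why the condition checks the value separately; nothing here is the HBase `Result` API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of the && vs || bug; not HBase code. */
class CompareResultsSketch {
  /** Stand-in for KeyValue: key and value kept separately. */
  static final class Cell {
    final byte[] key;
    final byte[] value;
    Cell(String key, String value) {
      this.key = key.getBytes(StandardCharsets.UTF_8);
      this.value = value.getBytes(StandardCharsets.UTF_8);
    }
    /** Assumption: equality covers the key only, not the value. */
    boolean keyEquals(Cell other) { return Arrays.equals(key, other.key); }
  }

  /** Original condition: flags a difference only when key AND value both differ. */
  static boolean differsBuggy(Cell a, Cell b) {
    return !a.keyEquals(b) && !Arrays.equals(a.value, b.value);
  }

  /** Corrected condition: flags a difference when key OR value differs. */
  static boolean differsFixed(Cell a, Cell b) {
    return !a.keyEquals(b) || !Arrays.equals(a.value, b.value);
  }

  /** Mirrors the shape of compareResults: throws on the first mismatching cell. */
  static void compare(Cell[] ours, Cell[] replicated) throws Exception {
    if (ours.length != replicated.length) {
      throw new Exception("Different number of cells: " + ours.length + " vs " + replicated.length);
    }
    for (int i = 0; i < ours.length; i++) {
      if (differsFixed(ours[i], replicated[i])) {
        throw new Exception("This result was different at cell " + i);
      }
    }
  }

  public static void main(String[] args) {
    Cell ours = new Cell("row1/cf:q", "v1");
    Cell replicated = new Cell("row1/cf:q", "v2"); // same key, different value
    System.out.println("buggy check sees a difference: " + differsBuggy(ours, replicated)); // false
    System.out.println("fixed check sees a difference: " + differsFixed(ours, replicated)); // true
  }
}
```

With equal keys, `!keyEquals` is false, so the `&&` form is false no matter what the values are; only the `||` form reports the replicated value as different.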





[jira] [Updated] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4800:
-

Attachment: 4800.txt

Simple patch with test

 Result.compareResults is incorrect
 --

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl
 Attachments: 4800.txt






[jira] [Updated] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4800:
-

Fix Version/s: 0.90.5
   0.94.0
   0.92.0
 Assignee: Lars Hofhansl

 Result.compareResults is incorrect
 --

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4800.txt






[jira] [Commented] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151402#comment-13151402
 ] 

stack commented on HBASE-4800:
--

+1

 Result.compareResults is incorrect
 --

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4800.txt






[jira] [Commented] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region h

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151411#comment-13151411
 ] 

stack commented on HBASE-4797:
--

Thinking some more on this, we don't need to rename recovered.edits files. The
files are named for the first sequenceid in the file, so we could just list the
files and sort the result. Then we'd have the range of sequenceids per file, and
we could simply skip files whose edits are all older than the region's current
seqid.
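The idea above can be sketched as follows, assuming each recovered.edits file name is the decimal value of the first sequence id in that file and that files do not overlap, so a file's edits end just before the next file's first id. `RecoveredEditsSketch` and `filesToReplay` are hypothetical, not HBase code, and the zero-padded names are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick recovered.edits files that may still hold needed edits. */
class RecoveredEditsSketch {
  static List<String> filesToReplay(List<String> fileNames, long regionSeqId) {
    List<String> sorted = new ArrayList<>(fileNames);
    // Sort numerically by the first sequence id encoded in each name.
    sorted.sort((x, y) -> Long.compare(Long.parseLong(x), Long.parseLong(y)));
    List<String> toReplay = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      boolean isNewest = i == sorted.size() - 1;
      if (!isNewest) {
        // Assumed upper bound: this file's edits end just before the next file's first id.
        long maxSeqIdInFile = Long.parseLong(sorted.get(i + 1)) - 1;
        if (maxSeqIdInFile < regionSeqId) {
          continue; // every edit is older than the region's current seqid: skip the file
        }
      }
      // Keep: the file may hold newer edits (the newest file's upper bound is unknown).
      toReplay.add(sorted.get(i));
    }
    return toReplay;
  }

  public static void main(String[] args) {
    List<String> names = List.of("0000000000000000200", "0000000000000000100", "0000000000000000300");
    System.out.println(filesToReplay(names, 250)); // prints [0000000000000000200, 0000000000000000300]
  }
}
```

The comparison is deliberately strict (`<`), matching "smaller than the region's current seqid"; the newest file is always replayed because nothing bounds its last sequence id.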

 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
  Labels: noob

 Testing 0.92, I crashed all the servers. Another bug means WALs are not
 getting cleaned up, so I had 7000 regions to replay. The distributed split
 code did a nice job and the cluster came back, but interestingly some hot
 regions ended up with loads of recovered.edits files (tens if not hundreds)
 to replay against the region (can we bulk load recovered.edits instead of
 replaying them?). Each recovered.edits file takes about a second to process,
 though each seems to hold only about 30 edits. The region is unavailable
 during this time.





[jira] [Updated] (HBASE-4797) [availability] Give recovered.edits files better names, ones that include first and last sequence id so we can skip files with edits we know older than current region has

2011-11-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4797:
-

Priority: Critical  (was: Major)
Tags: noob
  Labels: noob  (was: )

 [availability] Give recovered.edits files better names, ones that include 
 first and last sequence id so we can skip files with edits we know older than 
 current region has
 --

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Priority: Critical
  Labels: noob





[jira] [Resolved] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Lars Hofhansl (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-4800.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90, 0.92, and trunk.

 Result.compareResults is incorrect
 --

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4800.txt






[jira] [Updated] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4796:
--

Attachment: 4796.txt

Patch from Ramkrishna.

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4796.txt


 I just saw that multiple SplitRegionHandlers can be created for the same
 region because of the RS tickling, but it becomes deadly when more than one
 of them tries to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
   at 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Stack and I concluded that we just need to handle that exception, because
 handleSplitReport only operates on in-memory state.
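The agreed approach (handle the exception instead of letting it kill the master) can be sketched with an in-memory map standing in for ZooKeeper. Everything here (`SplitHandlerSketch`, `deleteNode`, `processSplit` and the local `NoNodeException`) is hypothetical; in the real handler this would presumably mean catching `KeeperException.NoNodeException` around the `ZKAssign.deleteNode` call made from `SplitRegionHandler.process`, as seen in the stack trace.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of tolerating a duplicate SPLIT handler; not HBase code. */
class SplitHandlerSketch {
  /** Local stand-in for org.apache.zookeeper.KeeperException.NoNodeException. */
  static final class NoNodeException extends Exception {
    NoNodeException(String path) { super("NoNode for " + path); }
  }

  /** In-memory stand-in for the /hbase/unassigned znodes, keyed by path. */
  static final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

  /** Like a ZooKeeper delete: fails if the node does not exist. */
  static void deleteNode(String path) throws NoNodeException {
    if (znodes.remove(path) == null) {
      throw new NoNodeException(path);
    }
  }

  /**
   * Handles a SPLIT event. If a concurrent handler for the same region already
   * deleted the znode, log and carry on instead of treating it as fatal.
   * Returns true if this call deleted the node.
   */
  static boolean processSplit(String encodedRegionName) {
    String path = "/hbase/unassigned/" + encodedRegionName;
    try {
      deleteNode(path);
      return true;
    } catch (NoNodeException e) {
      System.out.println(path + " already deleted by another handler; ignoring");
      return false;
    }
  }

  public static void main(String[] args) {
    String region = "f80b6a904048a99ce88d61420b8906d1";
    znodes.put("/hbase/unassigned/" + region, "RS_ZK_REGION_SPLIT");
    System.out.println(processSplit(region)); // first handler deletes the node
    System.out.println(processSplit(region)); // second handler finds it gone
  }
}
```

`ConcurrentHashMap.remove` is atomic, so exactly one of two racing handlers gets a non-null result; the other takes the exception path, which here is logged rather than escalated.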





[jira] [Updated] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4796:
--

Status: Patch Available  (was: Open)

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4796.txt






[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Fix Version/s: (was: 0.92.0)
   0.94.0

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This JIRA takes a slightly different approach from what is being done as part
 of HBASE-1730 (https://issues.apache.org/jira/browse/HBASE-1730).
 It supports instant schema updates, such as the Modify Table, Add Column and
 Modify Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassigning/assigning regions.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151433#comment-13151433
 ] 

jirapos...@reviews.apache.org commented on HBASE-4213:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1786/
---

(Updated 2011-11-16 19:12:03.308667)


Review request for Todd Lipcon, Andrew Purtell and Subbu Iyer.


Changes
---

Patch addresses Lars' comments


Summary
---

bq. From Subbu:
Here is the latest patch that supports alter_instant, an instant schema change
command that supports the Add, Modify and Delete column and Modify table
actions through ZK.

1. This pattern capitalizes on the fact that HRIs are now in HDFS and need not
be sent again from the master to the RS cloud on every schema change event.

2. Offers real-time, instant schema changes, as we bypass the explicit bulk
reassign (unassign + assign) of regions from the master to the RSs.

3. Offers fault-tolerant schema change support, as schema changes now go
through ZK. A secondary master taking over a failed schema change will be
addressed in a separate JIRA.


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 1202381
  /src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/MasterServices.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/handler/ModifyTableHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/handler/TableAddFamilyHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/handler/TableDeleteFamilyHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java 1202523
  /src/main/java/org/apache/hadoop/hbase/master/handler/TableModifyFamilyHandler.java 1202381
  /src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java 1202381
  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1202381
  /src/main/java/org/apache/hadoop/hbase/regionserver/OnlineRegions.java 1202381
  /src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java PRE-CREATION
  /src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 1202381
  /src/main/resources/hbase-default.xml 1202381
  /src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeFailover.java PRE-CREATION
  /src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 1202381
  /src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java 1202381

Diff: https://reviews.apache.org/r/1786/diff


Testing
---

Unit tests pass.


Thanks,

Ted



 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch



[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151434#comment-13151434
 ] 

Hadoop QA commented on HBASE-4799:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503909/0001-Fix-of-Regions-Leaks-problem-in-janitor.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -163 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 51 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/265//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/265//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/265//console

This message is automatically generated.

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can 
 clean up one of the SPLIT records but leave the other in META. When the other 
 split finishes, the janitor cleans up the remaining SPLIT record, but the 
 parent region is never removed from the FS and its META entry is not cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One daughter (A) has no reference storefiles left, but the other (B) still 
 does.
 3. The janitor runs; in checkDaughter it removes SPLITA from META, but sees 
 that SPLITB still has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record 
 for A and again does nothing.
 Result: the parent region hangs around forever.
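The interleaving above can be replayed against a small in-memory model. This is a hypothetical sketch, not CatalogJanitor code: `ParentRow`, `JanitorModel`, `racyScan` and `guardedScan` are illustrative names, META is reduced to a map of SPLIT columns, and the set `daughtersWithReferences` stands in for checking a daughter for reference storefiles. `racyScan` reproduces steps 3 and 5; `guardedScan` shows one possible guard (leave the row alone until both daughters are clean in the same pass), which is not necessarily what the attached patch does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a split parent's META row (not an HBase class). */
final class ParentRow {
    // SPLIT column ("splitA" / "splitB") -> daughter region name.
    final Map<String, String> splitColumns = new HashMap<>();
    boolean parentRemoved = false;

    ParentRow(String daughterA, String daughterB) {
        splitColumns.put("splitA", daughterA);
        splitColumns.put("splitB", daughterB);
    }
}

/** Hypothetical janitor; daughtersWithReferences stands in for the storefile check. */
final class JanitorModel {
    final Set<String> daughtersWithReferences = new HashSet<>();

    /** A daughter is clean if its column is present and it has no references. */
    boolean isClean(String daughter) {
        return daughter != null && !daughtersWithReferences.contains(daughter);
    }

    /** Racy pass: each SPLIT column is dropped as soon as its daughter is clean. */
    void racyScan(ParentRow row) {
        boolean aClean = isClean(row.splitColumns.get("splitA"));
        boolean bClean = isClean(row.splitColumns.get("splitB"));
        if (aClean) row.splitColumns.remove("splitA");
        if (bClean) row.splitColumns.remove("splitB");
        // A missing column reads as "not clean", so once A's column is gone
        // the parent can never satisfy this condition again.
        if (aClean && bClean) row.parentRemoved = true;
    }

    /** Guarded pass: touch META only when both daughters are clean together. */
    void guardedScan(ParentRow row) {
        boolean aClean = isClean(row.splitColumns.get("splitA"));
        boolean bClean = isClean(row.splitColumns.get("splitB"));
        if (aClean && bClean) {
            row.splitColumns.clear();
            row.parentRemoved = true;
        }
    }
}
```

Driving `racyScan` through steps 2 to 5 leaves the row with no SPLIT columns and `parentRemoved` still false, which is the leak; `guardedScan` waits until B is clean and then removes both columns and the parent at once.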

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4761) Add Developer Debug Options to HBase Config

2011-11-16 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151440#comment-13151440
 ] 

Kannan Muthukkaruppan commented on HBASE-4761:
--

Going to sit down with Nicolas tomorrow and take care of this!

 Add Developer Debug Options to HBase Config
 ---

 Key: HBASE-4761
 URL: https://issues.apache.org/jira/browse/HBASE-4761
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Attachments: HBASE-4761.patch


 Add in optional HBase configuration options that core developers will 
 commonly use: an option to enable JDWP debugging and an option to use a 
 separate logfile for GC information. (Part of the effort to move 89-fb 
 features over to trunk.)





[jira] [Updated] (HBASE-4199) blockCache summary - backend

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4199:
--

Fix Version/s: 0.92.0

 blockCache summary - backend
 

 Key: HBASE-4199
 URL: https://issues.apache.org/jira/browse/HBASE-4199
 Project: HBase
  Issue Type: Sub-task
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4199.v5, java_HBASE_4199.patch, 
 java_HBASE_4199_v2.patch, java_HBASE_4199_v3.patch, java_HBASE_4199_v4.patch


 This is the backend work for the blockCache summary: changes to the BlockCache 
 interface, summarization in LruBlockCache, BlockCacheSummaryEntry, and 
 additions to HRegionInterface and HRegionServer.
 This will NOT include any of the web UI or anything else like that. That is 
 for another sub-task.
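A rough sketch of what summarizing the cache could look like, assuming each cached block can report its table, column family and heap size. `CachedBlockInfo`, `SummaryEntry` and `BlockCacheSummarizer` are hypothetical names; the real `BlockCacheSummaryEntry` and the `LruBlockCache` changes in the attached patches may be shaped differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical view of one cached block: owning table, column family, heap size. */
final class CachedBlockInfo {
    final String table;
    final String family;
    final long heapSize;

    CachedBlockInfo(String table, String family, long heapSize) {
        this.table = table;
        this.family = family;
        this.heapSize = heapSize;
    }
}

/** Hypothetical per-(table, family) aggregate. */
final class SummaryEntry {
    final String table;
    final String family;
    long blockCount;
    long heapSize;

    SummaryEntry(String table, String family) {
        this.table = table;
        this.family = family;
    }
}

final class BlockCacheSummarizer {
    /** Groups cached blocks by "table/family", sorted by key for stable output. */
    static Map<String, SummaryEntry> summarize(List<CachedBlockInfo> blocks) {
        Map<String, SummaryEntry> byKey = new TreeMap<>();
        for (CachedBlockInfo b : blocks) {
            SummaryEntry e = byKey.computeIfAbsent(
                b.table + "/" + b.family, k -> new SummaryEntry(b.table, b.family));
            e.blockCount++;
            e.heapSize += b.heapSize;
        }
        return byKey;
    }
}
```

Keeping the aggregation as a plain function over a snapshot of blocks means the web UI sub-task could consume the same result without touching the cache internals.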





[jira] [Commented] (HBASE-4628) Enhance Table Create Presplit Functionality within the HBase Shell

2011-11-16 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151448#comment-13151448
 ] 

Phabricator commented on HBASE-4628:


Kannan has accepted the revision HBASE-4628 [jira] Enhance Table Create 
Presplit Functionality within the HBase Shell.

  Looks great! Sweet feature.

  Nicolas will walk me through the commit flow tomorrow, so it should get 
committed then.

REVISION DETAIL
  https://reviews.facebook.net/D417


 Enhance Table Create Presplit Functionality within the HBase Shell
 --

 Key: HBASE-4628
 URL: https://issues.apache.org/jira/browse/HBASE-4628
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
 Attachments: HBASE-4628.D411.1.patch, HBASE-4628.D417.1.patch, 
 HBASE-4628.D417.2.patch, HBASE-4628.D429.1.patch


 Currently, we allow the user to presplit in the HBase shell by explicitly 
 listing the startkey of all the region shards that they want.  Instead, we 
 should provide the RegionSplitter functionality of choosing a split 
 algorithm, followed by the number of splits that they want.
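To illustrate the "split algorithm plus number of splits" idea, here is a hypothetical sketch of a uniform split over 8-character hex row keys: for N regions it emits N-1 evenly spaced boundaries in the range 00000000 to ffffffff. `PresplitAlgorithm` and `UniformHexSplit` are illustrative names, not RegionSplitter's API, and the approach only makes sense if row keys are roughly uniformly distributed hex strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical split-algorithm interface: number of regions in, split keys out. */
interface PresplitAlgorithm {
    /** Returns numRegions - 1 split keys, in ascending order. */
    List<String> splitKeys(int numRegions);
}

final class UniformHexSplit implements PresplitAlgorithm {
    private static final long RANGE = 1L << 32;  // keys span 00000000..ffffffff

    @Override
    public List<String> splitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = RANGE * i / numRegions;
            // Fixed-width, zero-padded hex keeps byte order equal to numeric order.
            keys.add(String.format("%08x", boundary));
        }
        return keys;
    }
}
```

For example, asking for 4 regions yields the boundaries 40000000, 80000000 and c0000000, so the user only has to name the algorithm and the region count instead of listing every start key.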





[jira] [Assigned] (HBASE-4668) List HDFS enhancements to speed up backups for HBase

2011-11-16 Thread Karthik Ranganathan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Ranganathan reassigned HBASE-4668:
--

Assignee: Pritam Damania

 List HDFS enhancements to speed up backups for HBase
 

 Key: HBASE-4668
 URL: https://issues.apache.org/jira/browse/HBASE-4668
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Pritam Damania

 There are a host of improvements that help:
 - HDFS fast copy
 - Various enhancements to fast copy to speed up things
 - File-level hard links, which create ext3 hard links instead of copying 
 blocks, thereby saving a lot of IOPS
 Need to list out the HDFS JIRAs and have patches on them.





[jira] [Commented] (HBASE-4668) List HDFS enhancements to speed up backups for HBase

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151449#comment-13151449
 ] 

Karthik Ranganathan commented on HBASE-4668:


@Andrew - totally, will let Pritam comment on that :)





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Doug Meil (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151452#comment-13151452
 ] 

Doug Meil commented on HBASE-4655:
--

I'll gladly port this to the book, and I'd like to add this in here...
http://hbase.apache.org/book.html#ops.backup
... with the existing backup info.

 Document architecture of backups
 

 Key: HBASE-4655
 URL: https://issues.apache.org/jira/browse/HBASE-4655
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
 Attachments: HBase Backups Architecture.docx


 Basic idea behind the backup architecture for HBase





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151455#comment-13151455
 ] 

Ted Yu commented on HBASE-4739:
---

Trial patch makes sense.

For unassign, the following change to javadoc is inaccurate:
{code}
   * Updates the RegionState and creates a zk node.
{code}
We still send the CLOSE RPC, right?

For sendRegionClose():
{code}
   * sends the CLOSE RPC to a region server.
   * @param region server to be unassigned
{code}
It should read "region server which receives the CLOSE RPC".

For createNodePendingClose(), please replace CLOSING with PEND_CLOSE (not 
PEND_CLOSING).

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday. When 
 the master died, it had just created the RIT znode for a region but hadn't yet 
 told the RS to close it.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151457#comment-13151457
 ] 

Karthik Ranganathan commented on HBASE-4655:


Sounds great Doug! Maybe we make a new section, keep adding stuff in, and 
deprecate the old stuff? Or whatever works...





[jira] [Updated] (HBASE-4280) [replication] ReplicationSink can deadlock itself via handlers

2011-11-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4280:
--

Fix Version/s: 0.94.0
   0.92.0

 [replication] ReplicationSink can deadlock itself via handlers
 --

 Key: HBASE-4280
 URL: https://issues.apache.org/jira/browse/HBASE-4280
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: HBASE-4280-0.90.patch


 I've experienced this problem a few times: ReplicationSink calls are received 
 through the normal handlers and can potentially call back into the same server, 
 which, in certain situations, can fill up all the handlers. For example, 10 
 handlers that are all serving replication calls may all be trying to talk to 
 the local server at the same time.
 HRS.replicateLogEntries should have @QosPriority(priority=HIGH_QOS) to use 
 the other set of handlers.
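The fix proposed above routes replication calls to a separate handler pool based on an annotation on the RPC method. The sketch below shows that dispatch idea with reflection; `HandlerPriority`, `Priorities`, `FakeRegionServer` and `HandlerRouter` are hypothetical stand-ins, not HBase's `@QosPriority` or its RPC server, and pool selection is reduced to returning a pool name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical priority marker, readable at runtime via reflection. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface HandlerPriority {
    int value();
}

final class Priorities {
    static final int NORMAL = 0;
    static final int HIGH = 100;
}

/** Stand-in RPC implementation: only the replication entry point is marked. */
class FakeRegionServer {
    public void put(String row) { }

    @HandlerPriority(Priorities.HIGH)
    public void replicateLogEntries(String[] entries) { }
}

final class HandlerRouter {
    /** Picks a handler pool from the called method's annotation. */
    static String poolFor(Class<?> impl, String methodName, Class<?>... paramTypes) {
        try {
            Method m = impl.getMethod(methodName, paramTypes);
            HandlerPriority p = m.getAnnotation(HandlerPriority.class);
            int priority = (p == null) ? Priorities.NORMAL : p.value();
            // High-priority calls go to a separate pool, so replication calls
            // never occupy the regular handlers.
            return priority >= Priorities.HIGH ? "priority-handlers" : "regular-handlers";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such RPC method: " + methodName, e);
        }
    }
}
```

A separate pool does not stop the sink from calling back into its own server; it only keeps the replication calls from occupying the regular handlers that those nested calls need.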





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: 4213-0.92.v4

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Commented] (HBASE-4611) Add support for Phabricator/Differential as an alternative code review tool

2011-11-16 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151485#comment-13151485
 ] 

Nicolas Spiegelberg commented on HBASE-4611:


@stack:  note that the certificate installation is not a long process.  It just 
has you visit a webpage while logged into FB and give a displayed ID.  Although 
anonymous access for contributors should be fine, I'd recommend secure access 
for all committers since they'll be approving diffs for inclusion.

 Add support for Phabricator/Differential as an alternative code review tool
 ---

 Key: HBASE-4611
 URL: https://issues.apache.org/jira/browse/HBASE-4611
 Project: HBase
  Issue Type: Task
Reporter: Jonathan Gray
Assignee: Nicolas Spiegelberg
 Fix For: 0.92.0, 0.94.0

 Attachments: D153.1.patch, D165.1.patch, D165.2.patch, D171.1.patch, 
 D177.1.patch, D177.2.patch, D183.1.patch, D189.1.patch, D201.1.patch, 
 D207.1.patch, D21.1.patch, D21.1.patch, HBASE-4611.D423.1.patch, 
 HBASE-4611.D423.2.patch


 From http://phabricator.org/ : Phabricator is a open source collection of 
 web applications which make it easier to write, review, and share source 
 code. It is currently available as an early release. Phabricator was 
 developed at Facebook.
 It's open source so pretty much anyone could host an instance of this 
 software.
 To begin with, there will be a public-facing instance located at 
 http://reviews.facebook.net (sponsored by Facebook and hosted by the OSUOSL 
 http://osuosl.org).
 We will use this JIRA to deal with adding (and ensuring) Apache-friendly 
 support that will allow us to do code reviews with Phabricator for HBase.





[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151489#comment-13151489
 ] 

Hadoop QA commented on HBASE-4796:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503920/4796.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -163 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 51 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/266//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/266//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/266//console

This message is automatically generated.

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4796.txt


 I just saw that multiple SplitRegionHandlers can be created for the same 
 region because of the RS tickling, but it becomes deadly when more than one of 
 them tries to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 

[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151492#comment-13151492
 ] 

Ted Yu commented on HBASE-4796:
---

+1 on patch.

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4796.txt


 I just saw that multiple SplitRegionHandlers can be created for the same 
 region because of the RS tickling, but it becomes deadly when more than one of 
 them tries to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
   at 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Stack and I concluded that we just need to handle that exception, because 
 handleSplitReport only deals with in-memory state.
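A minimal sketch of "just handle that exception", assuming an in-memory stand-in for the unassigned znodes. `NoNodeException`, `UnassignedNodes` and `SplitHandlerSketch` are hypothetical (they are not ZooKeeper's `KeeperException.NoNodeException` or HBase's `SplitRegionHandler`). Only the "node already gone" case is tolerated, and the outcome is handled with an explicit if/else rather than a `return` buried in the catch. In this sketch only the handler that wins the delete does the bookkeeping; the attached 4796.txt may order things differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for ZooKeeper's NoNode error (not the real class). */
final class NoNodeException extends Exception {
    NoNodeException(String path) {
        super("NoNode for " + path);
    }
}

/** Hypothetical in-memory model of the /hbase/unassigned znodes. */
final class UnassignedNodes {
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    void create(String region, String state) { nodes.put(region, state); }

    boolean exists(String region) { return nodes.containsKey(region); }

    /** Atomic delete: exactly one of several racing callers succeeds. */
    void delete(String region) throws NoNodeException {
        if (nodes.remove(region) == null) {
            throw new NoNodeException("/hbase/unassigned/" + region);
        }
    }
}

/** Hypothetical SPLIT handler that tolerates losing the delete race. */
final class SplitHandlerSketch {
    private final UnassignedNodes zk;
    int splitReportsHandled = 0;

    SplitHandlerSketch(UnassignedNodes zk) { this.zk = zk; }

    /** Returns true if this handler deleted the node, false if a duplicate already had. */
    boolean process(String region) {
        boolean deletedHere;
        try {
            zk.delete(region);
            deletedHere = true;
        } catch (NoNodeException e) {
            deletedHere = false;  // a duplicate handler for the same event won the race
        }
        // Explicit if/else instead of a 'return' hidden inside the catch block.
        if (deletedHere) {
            splitReportsHandled++;  // stands in for the in-memory split bookkeeping
        } else {
            System.out.println("SPLIT node for " + region + " already deleted; skipping");
        }
        return deletedHere;
    }
}
```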





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151508#comment-13151508
 ] 

Todd Lipcon commented on HBASE-4655:


Two quick notes from looking over the doc:

- the names are a little confusing to me - in-cluster backup is actually two 
clusters, right? I'd call your RBU an in-cluster backup, your CBU an 
in-datacenter backup, and your DBU a cross-datacenter backup, DR 
backup, or BCP backup.

- For RBU, maybe we can get atomicity in a simpler manner by having the region 
server initiate the copy of hfiles? It can hold the lock to block flushes while 
the copies happen (they're hard-link copies, right?) 
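A sketch of the suggestion above, assuming the region server owns a lock that every flush must take. `RegionBackupSketch` is hypothetical, and it uses local-filesystem hard links (`java.nio.file.Files.createLink`) as a stand-in for the HDFS hard links discussed in HBASE-4668; `backupDir` must be outside `storeDir` and on the same filesystem. Because flush and backup share `flushLock`, no flush can add an hfile while the links are being made.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical region-level backup: hard-link hfiles while flushes are blocked. */
final class RegionBackupSketch {
    private final ReentrantLock flushLock = new ReentrantLock();
    private final Path storeDir;

    RegionBackupSketch(Path storeDir) {
        this.storeDir = storeDir;
    }

    /** A flush writes one new hfile, holding the same lock the backup takes. */
    Path flush(String hfileName, byte[] contents) throws IOException {
        flushLock.lock();
        try {
            return Files.write(storeDir.resolve(hfileName), contents);
        } finally {
            flushLock.unlock();
        }
    }

    /** Hard-links every current hfile into backupDir; no flush can interleave. */
    List<Path> backup(Path backupDir) throws IOException {
        flushLock.lock();
        try {
            Files.createDirectories(backupDir);
            List<Path> hfiles;
            try (Stream<Path> listing = Files.list(storeDir)) {
                hfiles = listing.collect(Collectors.toList());
            }
            List<Path> links = new ArrayList<>();
            for (Path hfile : hfiles) {
                // A hard link shares the existing data blocks instead of copying them.
                links.add(Files.createLink(backupDir.resolve(hfile.getFileName()), hfile));
            }
            return links;
        } finally {
            flushLock.unlock();
        }
    }
}
```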






[jira] [Commented] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151517#comment-13151517
 ] 

stack commented on HBASE-4796:
--

+1 on patch (this is what J-D and I discussed yesterday evening).  One change 
I'd make, though: remove that 'return' buried inside the catch and do an 
if/else abort instead. I can imagine someone reading this code and just not 
seeing the 'return'.  Thanks Ram.

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0

 Attachments: 4796.txt


 I just saw that multiple SplitRegionHandlers can be created for the same 
 region because of the RS tickling, but it becomes deadly when more than 1 are 
 trying to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
   at 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Stack and I concluded that we just need to handle that exception, because 
 handleSplitReport is an in-memory thing.
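A minimal sketch of the handling described above, assuming a hypothetical in-memory stand-in for the unassigned znodes (`ZnodeStore`, its `NoNodeException` and `SplitHandlerSketch` are illustrative names, not the ZooKeeper or HBase API). The second handler to reach the delete treats "node already gone" as a normal outcome instead of aborting, and, following the suggestion at the top of this excerpt, the outcome is handled with an explicit if/else rather than a return buried in the catch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the /hbase/unassigned znodes; not the ZooKeeper API. */
final class ZnodeStore {
  /** Thrown when deleting a node that no longer exists (modeled on NoNode). */
  static final class NoNodeException extends Exception {
    NoNodeException(String path) { super("NoNode for " + path); }
  }

  private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  void create(String path, String state) { nodes.put(path, state); }

  boolean exists(String path) { return nodes.containsKey(path); }

  /** Removes the node atomically, so only one concurrent caller can succeed. */
  void delete(String path) throws NoNodeException {
    if (nodes.remove(path) == null) {
      throw new NoNodeException(path);
    }
  }
}

final class SplitHandlerSketch {
  /**
   * Deletes the SPLIT znode for a region. Returns true if this call removed it,
   * false if another handler already had. Neither case aborts, on the assumption
   * (from the report above) that the remaining split handling is in-memory.
   */
  static boolean deleteSplitNode(ZnodeStore zk, String encodedRegionName) {
    String path = "/hbase/unassigned/" + encodedRegionName;
    boolean deletedByUs;
    try {
      zk.delete(path);
      deletedByUs = true;
    } catch (ZnodeStore.NoNodeException e) {
      // A concurrent handler won the race; this is expected, not fatal.
      deletedByUs = false;
    }
    if (deletedByUs) {
      System.out.println("Deleted SPLIT node " + path);
    } else {
      System.out.println("SPLIT node " + path + " already deleted by another handler");
    }
    return deletedByUs;
  }
}
```

With two handlers racing on the same region, exactly one sees `true`; the other logs and returns instead of taking the master down.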

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread Nicolas Spiegelberg (Created) (JIRA)
alter_status shell prints sensible message at completion


 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial


The alter_status command used to print 0/0 once an alter operation had 
completed and its progress was no longer available. Now it instead indicates 
that all regions were updated.
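The behaviour change can be illustrated with a small formatting helper. This is a sketch in Java, not the shell's code: it assumes progress arrives as a hypothetical (regions still to update, total regions) pair where 0/0 means progress is no longer tracked, and the message wording is invented for the example.

```java
final class AlterStatusSketch {
  /**
   * Formats alter progress. A total of zero is read as "progress no longer
   * available", i.e. the alter has already completed, so the helper reports
   * completion instead of printing 0/0.
   */
  static String describe(int yetToUpdate, int totalRegions) {
    if (totalRegions == 0) {
      return "Done. All regions updated.";
    }
    int updated = totalRegions - yetToUpdate;
    return updated + "/" + totalRegions + " regions updated.";
  }
}
```

For example, `describe(3, 10)` reports 7 of 10 regions updated, while `describe(0, 0)` reports completion.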





[jira] [Commented] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151521#comment-13151521
 ] 

Hudson commented on HBASE-4800:
---

Integrated in HBase-TRUNK #2448 (See 
[https://builds.apache.org/job/HBase-TRUNK/2448/])
HBASE-4800  Result.compareResults is incorrect (James Taylor and Lars H)

larsh : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Result.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestResult.java


 Result.compareResults is incorrect
 --

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4800.txt


 A coworker of mine (James Taylor) found a bug in Result.compareResults(...).
 This condition:
 {code}
   if (!ourKVs[i].equals(replicatedKVs[i]) &&
       !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
     throw new Exception("This result was different: "
 {code}
 should be
 {code}
   if (!ourKVs[i].equals(replicatedKVs[i]) ||
       !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
     throw new Exception("This result was different: "
 {code}
 Just checked, this is wrong in all branches.
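Why `&&` is wrong can be shown with a self-contained model. The sketch assumes, as the separate value comparison in the check implies, that `equals()` compares only the key; `Kv` is a hypothetical stand-in for `KeyValue` and `java.util.Arrays.equals` stands in for `Bytes.equals`. Two cells with the same key but different values slip through the `&&` version and are caught by `||`.

```java
import java.util.Arrays;

final class CompareResultsSketch {
  /** Hypothetical stand-in for KeyValue: equals() looks at the key only. */
  static final class Kv {
    final String key;
    final byte[] value;

    Kv(String key, byte[] value) {
      this.key = key;
      this.value = value;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Kv && ((Kv) o).key.equals(key);
    }

    @Override
    public int hashCode() {
      return key.hashCode();
    }
  }

  /** Original check: reports a difference only when key AND value both differ. */
  static boolean differsBuggy(Kv a, Kv b) {
    return !a.equals(b) && !Arrays.equals(a.value, b.value);
  }

  /** Corrected check: reports a difference when key OR value differs. */
  static boolean differsFixed(Kv a, Kv b) {
    return !a.equals(b) || !Arrays.equals(a.value, b.value);
  }
}
```

The `&&` form also misses the opposite case (different keys, identical values), so it only fires when every part of the cell differs.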





[jira] [Commented] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151523#comment-13151523
 ] 

Nicolas Spiegelberg commented on HBASE-4801:


Part of 89-fb to 92 port.  See r1182035 

 alter_status shell prints sensible message at completion
 

 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial

 The alter_status command used to print 0/0 once an alter operation had 
 completed and its progress was no longer available. Now it instead indicates 
 that all regions were updated.





[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4801:
---

Attachment: HBASE-4801.patch

 alter_status shell prints sensible message at completion
 

 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Attachments: HBASE-4801.patch


 The alter_status command used to print 0/0 once an alter operation had 
 completed and its progress was no longer available. Now it instead indicates 
 that all regions were updated.





[jira] [Created] (HBASE-4802) Disable show table metrics in bulk loader

2011-11-16 Thread Nicolas Spiegelberg (Created) (JIRA)
Disable show table metrics in bulk loader
-

 Key: HBASE-4802
 URL: https://issues.apache.org/jira/browse/HBASE-4802
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0


During bulk load, the Configuration object may be set to null.  This caused an 
NPE in per-CF metrics because it consults the Configuration to determine 
whether to show the table name.  We need a simple change to allow the conf to 
be null and to not specify the table name in that instance.
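A minimal sketch of the null-tolerant check, assuming a `Map<String, String>` as a stand-in for Hadoop's `Configuration`. The key name, the default of `true` when the key is absent, and the `tbl.`/`cf.` prefix format are illustrative assumptions, not taken from the patch.

```java
import java.util.Map;

final class MetricsNamingSketch {
  /** Illustrative config key; treat the exact name as an assumption. */
  static final String SHOW_TABLE_NAME_KEY = "hbase.metrics.showTableName";

  /** A null conf (as during bulk load) means: leave the table name out. */
  static boolean showTableName(Map<String, String> conf) {
    if (conf == null) {
      return false;
    }
    return Boolean.parseBoolean(conf.getOrDefault(SHOW_TABLE_NAME_KEY, "true"));
  }

  /** Builds a per-CF metric prefix, adding the table name only when allowed. */
  static String metricPrefix(Map<String, String> conf, String table, String family) {
    return showTableName(conf)
        ? "tbl." + table + ".cf." + family + "."
        : "cf." + family + ".";
  }
}
```

Checking for null once, at the point where the setting is read, keeps every metric-naming call site safe without each one having to guard the conf itself.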





[jira] [Commented] (HBASE-4802) Disable show table metrics in bulk loader

2011-11-16 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151530#comment-13151530
 ] 

Nicolas Spiegelberg commented on HBASE-4802:


Part of 89-fb to 92 port.  See r1182037

 Disable show table metrics in bulk loader
 -

 Key: HBASE-4802
 URL: https://issues.apache.org/jira/browse/HBASE-4802
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4802.patch


 During bulk load, the Configuration object may be set to null.  This caused 
 an NPE in per-CF metrics because it consults the Configuration to determine 
 whether to show the table name.  We need a simple change to allow the conf 
 to be null and to not specify the table name in that instance.





[jira] [Updated] (HBASE-4802) Disable show table metrics in bulk loader

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4802:
---

Attachment: HBASE-4802.patch

 Disable show table metrics in bulk loader
 -

 Key: HBASE-4802
 URL: https://issues.apache.org/jira/browse/HBASE-4802
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4802.patch


 During bulk load, the Configuration object may be set to null.  This caused 
 an NPE in per-CF metrics because it consults the Configuration to determine 
 whether to show the table name.  We need a simple change to allow the conf 
 to be null and to not specify the table name in that instance.





[jira] [Updated] (HBASE-4802) Disable show table metrics in bulk loader

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4802:
---

Assignee: Liyin Tang  (was: Nicolas Spiegelberg)
  Status: Patch Available  (was: Open)

 Disable show table metrics in bulk loader
 -

 Key: HBASE-4802
 URL: https://issues.apache.org/jira/browse/HBASE-4802
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Liyin Tang
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4802.patch


 During bulk load, the Configuration object may be set to null.  This caused 
 an NPE in per-CF metrics because it consults the Configuration to determine 
 whether to show the table name.  We need a simple change to allow the conf 
 to be null and to not specify the table name in that instance.





[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4801:
---

Status: Patch Available  (was: Open)

patch created by one of our interns: Charles Gist.

 alter_status shell prints sensible message at completion
 

 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Attachments: HBASE-4801.patch


 The alter_status command used to print 0/0 once an alter operation had 
 completed and its progress was no longer available. Now it instead indicates 
 that all regions were updated.





[jira] [Issue Comment Edited] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread Nicolas Spiegelberg (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151531#comment-13151531
 ] 

Nicolas Spiegelberg edited comment on HBASE-4801 at 11/16/11 9:29 PM:
--

patch created by one of our interns: Christopher Gist.

  was (Author: nspiegelberg):
patch created by one of our interns: Charles Gist.
  
 alter_status shell prints sensible message at completion
 

 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Attachments: HBASE-4801.patch


 The alter_status command used to print 0/0 once an alter operation had 
 completed and its progress was no longer available. Now it instead indicates 
 that all regions were updated.





[jira] [Commented] (HBASE-4543) major_compact '.META.' has no effect

2011-11-16 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151538#comment-13151538
 ] 

Nicolas Spiegelberg commented on HBASE-4543:


Does this occur in trunk?  Can we get a patch for this?

 major_compact '.META.' has no effect
 

 Key: HBASE-4543
 URL: https://issues.apache.org/jira/browse/HBASE-4543
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621, 0.89.20100924
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Attachments: 0001-fix-the-issue-of-.META.-getting-ignored.patch


 major_compact '.META.' has no effect, although major_compact 
 'any_other_table' works fine from the shell.
 This issue seems to only affect 0.89. The apache-trunk seems to handle this 
 case properly.
 The issue is that getTableRegions() in HMaster.java only works if the 
 tableName given is a normal table.
 The methodology (using a MetaScanner to look through the .META. table for the 
 tableName) does not work if
 the tableName is .META.
 The fix modifies getTableRegions() to check whether the tableName is .META. 
 and, if so, handle it accordingly.
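The special case can be sketched with a toy catalog. In this model (an assumption for illustration, not HMaster's code) `.META.` rows are keyed by region name `table,startKey,id`, and the `.META.` region is known separately because `.META.` does not describe itself, so a prefix scan for `.META.` finds nothing. `TableRegionsSketch` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

final class TableRegionsSketch {
  static final String META_TABLE = ".META.";

  /**
   * metaRows: region name ("table,startKey,id") -> hosting server.
   * metaRegionName: the .META. region, tracked outside of .META. itself.
   */
  static List<String> getTableRegions(String tableName,
      NavigableMap<String, String> metaRows, String metaRegionName) {
    List<String> regions = new ArrayList<>();
    if (META_TABLE.equals(tableName)) {
      // Scanning .META. for ".META.," rows would return nothing, so answer
      // from the separately tracked .META. region instead.
      regions.add(metaRegionName);
      return regions;
    }
    String prefix = tableName + ",";
    // Rows are sorted, so one table's regions form a contiguous run.
    for (Map.Entry<String, String> e : metaRows.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break;
      }
      regions.add(e.getKey());
    }
    return regions;
  }
}
```

The trailing comma in the prefix keeps table `t1` from matching rows of table `t10`.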





[jira] [Commented] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151545#comment-13151545
 ] 

stack commented on HBASE-4799:
--

Thanks for digging in here Max.

This comment is now wrong?

{code}
+  // Remove daughters from the parent IFF the daughter region exists in FS.
+  // If there is no daughter region in the filesystem, must be because of
+  // a failed split.  The ServerShutdownHandler will do the fixup.  Don't
+  // do any deletes in here that could interfere with ServerShutdownHandler
+  // fixup
{code}

hasNoReferences will return true if there is no daughter dir or if there are no 
references (so if there is no daughter dir, we'll delete the parent).

Otherwise, I'm fine w/ tying together the removals of splitA and splitB... all 
in the one go; it reduces the number of possible states, which is usually a 
good thing.

One thing though: rather than removeDaughterFromParent, shouldn't we clear both 
splitA and splitB in the one go, since it's the same row? (We could have the 
strange case where splitA was removed but then we crash before splitB is 
removed.)  Don't we need a removeDaughter*s*FromParent; i.e. plural?

Good stuff Max.



 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can 
 clean up one of the SPLIT records but leave the other in META. When the other 
 daughter finishes, the janitor cleans up the remaining SPLIT record, but the 
 parent region is never removed from the FS and META is not cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One of the daughters, A, is done (it has no reference storefiles) but the 
 other, B, is not.
 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees 
 that SPLITB still has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record 
 for A and again does nothing.
 Result: the parent region hangs forever.
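Steps 1 to 5 can be replayed in a toy model. The sketch assumes, as step 5 describes, that the janitor's check treats a missing SPLIT column as "may still have references". `JanitorSketch`, its fields and both passes are hypothetical; `fixedPass` follows the approach discussed in the comment above (check both daughters first, then clear splitA and splitB in one go) and is not necessarily the committed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class JanitorSketch {
  /** Parent's META row: qualifier ("splitA"/"splitB") -> daughter name. */
  final Map<String, String> parentRow = new HashMap<>();
  /** Daughters that still hold reference files pointing at the parent. */
  final Set<String> daughtersWithReferences = new HashSet<>();
  boolean parentDeleted = false;

  JanitorSketch() {
    parentRow.put("splitA", "A");
    parentRow.put("splitB", "B");
    daughtersWithReferences.add("A");
    daughtersWithReferences.add("B");
  }

  /** Buggy pass: each daughter column is checked and removed independently. */
  void buggyPass() {
    boolean refsA = checkDaughterAndMaybeRemove("splitA");
    boolean refsB = checkDaughterAndMaybeRemove("splitB");
    if (!refsA && !refsB) {
      parentDeleted = true;
    }
  }

  /** True if the daughter may still reference the parent; a missing column counts as true. */
  private boolean checkDaughterAndMaybeRemove(String qualifier) {
    String daughter = parentRow.get(qualifier);
    if (daughter == null) {
      return true; // no record left, so the janitor does nothing (step 5)
    }
    if (daughtersWithReferences.contains(daughter)) {
      return true;
    }
    parentRow.remove(qualifier); // removed even if the sibling still blocks cleanup (step 3)
    return false;
  }

  /** Fixed pass: act only when both daughters are clean, then clear both columns together. */
  void fixedPass() {
    String a = parentRow.get("splitA");
    String b = parentRow.get("splitB");
    boolean refsA = a != null && daughtersWithReferences.contains(a);
    boolean refsB = b != null && daughtersWithReferences.contains(b);
    if (!refsA && !refsB) {
      parentRow.clear(); // stands in for one single-row delete of splitA and splitB
      parentDeleted = true;
    }
  }
}
```

Replaying the scenario, the buggy janitor ends with both SPLIT columns gone but the parent never deleted, while the fixed pass deletes the parent on the first run after B is done.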





[jira] [Updated] (HBASE-4799) Catalog Janitor logic bug causes region leackage

2011-11-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4799:
-

Fix Version/s: 0.90.5
   0.92.0

 Catalog Janitor logic bug causes region leackage
 

 Key: HBASE-4799
 URL: https://issues.apache.org/jira/browse/HBASE-4799
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Max Lapan
Assignee: Max Lapan
Priority: Critical
 Fix For: 0.92.0, 0.90.5

 Attachments: 0001-Fix-of-Regions-Leaks-problem-in-janitor.patch, 
 0002-Temporary-fix-to-remove-leaked-regions.patch


 When a region split takes a significant amount of time, CatalogJanitor can 
 clean up one of the SPLIT records but leave the other in META. When the other 
 daughter finishes, the janitor cleans up the remaining SPLIT record, but the 
 parent region is never removed from the FS and META is not cleared.
 The race condition is as follows:
 1. A region split starts.
 2. One of the daughters, A, is done (it has no reference storefiles) but the 
 other, B, is not.
 3. The janitor runs and, in checkDaughter, removes SPLITA from META, but sees 
 that SPLITB still has references and does nothing.
 4. Region B completes its split.
 5. The janitor wakes up and removes SPLITB, but sees that there is no record 
 for A and again does nothing.
 Result: the parent region hangs forever.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151554#comment-13151554
 ] 

Hadoop QA commented on HBASE-4213:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503931/4213-0.92.v4
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 58 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/267//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/267//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/267//console

This message is automatically generated.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread Karthik Ranganathan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151564#comment-13151564
 ] 

Karthik Ranganathan commented on HBASE-4655:


For #1, totally :) Internally, we use the term "cluster" to denote a section of 
the data center (as opposed to the HBase cluster); a data center is composed of 
a number of clusters, hence the name. In-DC and cross-DC work.

For #2, this makes the running cluster stall and not take updates for the 
duration of the copy. It is a fast copy with hard links underneath, but nothing 
in the current design would stop it from being used against a remote cluster or 
a DFS version without hard links. Also, if the hard link fails for some reason, 
it falls back to a deep copy, so it could have longer stalls.
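The link-then-copy fallback from #2 can be illustrated on a local file system with `java.nio`: `Files.createLink` makes a hard link, and if that fails (for example, no hard-link support, or source and target on different file systems) the sketch falls back to a full `Files.copy`. This is only a local analogue under that assumption; `LinkOrCopySketch` is a hypothetical name, and it is not the DFS fast-copy path the comment refers to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class LinkOrCopySketch {
  /**
   * Tries a hard link first (no data is copied); if the file system refuses,
   * falls back to a deep copy. Returns true if a hard link was created.
   */
  static boolean linkOrCopy(Path source, Path target) throws IOException {
    try {
      Files.createLink(target, source); // argument order: (newLink, existingFile)
      return true;
    } catch (UnsupportedOperationException | IOException e) {
      // Slow path: copying the bytes takes time proportional to the file size,
      // which is why a failed link can mean a longer stall.
      Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
      return false;
    }
  }
}
```

Either way the target ends up with the same contents; the return value only tells the caller which path was taken.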

 Document architecture of backups
 

 Key: HBASE-4655
 URL: https://issues.apache.org/jira/browse/HBASE-4655
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
 Attachments: HBase Backups Architecture.docx


 Basic idea behind the backup architecture for HBase





[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management

2011-11-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151566#comment-13151566
 ] 

nkeywal commented on HBASE-4763:


Surefire revision for SUREFIRE-791 and SUREFIRE-785: r1202059

 Integrate surefire and junit for category management
 

 Key: HBASE-4763
 URL: https://issues.apache.org/jira/browse/HBASE-4763
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: surefire_hbase.v2.patch


 As of today, Surefire integrates categories on the trunk of the 2.11 version: 
 http://jira.codehaus.org/browse/SUREFIRE-329 . It may require private 
 patches as well.
 It may impact JUnit: https://github.com/KentBeck/junit/issues/359
 This jira is about this integration. We will need a repo for this.
 For the naming of the versions to be created, I don't know if there is a 
 convention. If not, I would propose: 2.10-patched-HBASE
 
 Obviously, it's important to get our changes integrated in the main release: 
 we're not forking surefire and junit!





[jira] [Created] (HBASE-4803) Split log worker should terminate properly when waiting for znode

2011-11-16 Thread Nicolas Spiegelberg (Created) (JIRA)
Split log worker should terminate properly when waiting for znode
-

 Key: HBASE-4803
 URL: https://issues.apache.org/jira/browse/HBASE-4803
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Prakash Khemani
Priority: Minor
 Fix For: 0.94.0


This is an attempt to fix the fact that SplitLogWorker threads were not being 
terminated properly in some multi-master unit tests.
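A minimal sketch of the usual pattern for a worker that blocks waiting for a znode but must still exit on shutdown. `StoppableWaiterSketch` and `taskArrived()` (standing in for a znode watcher callback) are hypothetical; this is not the `SplitLogWorker` code or the r1188420 change. The wait loop re-checks a volatile stop flag, `stop()` notifies the monitor, and the timed wait means a missed notification delays exit by at most one timeout instead of forever.

```java
final class StoppableWaiterSketch implements Runnable {
  private final Object taskMonitor = new Object();
  private volatile boolean stopped = false;
  private boolean taskAvailable = false; // guarded by taskMonitor

  /** Called by a hypothetical znode watcher when a task appears. */
  void taskArrived() {
    synchronized (taskMonitor) {
      taskAvailable = true;
      taskMonitor.notifyAll();
    }
  }

  /** Asks the worker to exit, even if it is blocked waiting for a task. */
  void stop() {
    stopped = true;
    synchronized (taskMonitor) {
      taskMonitor.notifyAll();
    }
  }

  /** Waits for a task; returns false if the worker was stopped instead. */
  boolean waitForTask() throws InterruptedException {
    synchronized (taskMonitor) {
      while (!taskAvailable && !stopped) {
        taskMonitor.wait(100); // bounded wait: re-check the flags periodically
      }
      return taskAvailable && !stopped;
    }
  }

  @Override
  public void run() {
    try {
      while (!stopped) {
        if (!waitForTask()) {
          break;
        }
        synchronized (taskMonitor) {
          taskAvailable = false; // consume the task (actual work omitted)
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt status and exit
    }
  }
}
```

A test harness can then call `stop()` and `join()` the thread with a timeout, failing if the worker is still alive.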





[jira] [Commented] (HBASE-4803) Split log worker should terminate properly when waiting for znode

2011-11-16 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151569#comment-13151569
 ] 

Nicolas Spiegelberg commented on HBASE-4803:


Part of 89-fb to 92 port.  See r1188420

 Split log worker should terminate properly when waiting for znode
 -

 Key: HBASE-4803
 URL: https://issues.apache.org/jira/browse/HBASE-4803
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Prakash Khemani
Priority: Minor
 Fix For: 0.94.0


 This is an attempt to fix the fact that SplitLogWorker threads were not being 
 terminated properly in some multi-master unit tests.





[jira] [Updated] (HBASE-4803) Split log worker should terminate properly when waiting for znode

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4803:
---

Status: Patch Available  (was: Open)

 Split log worker should terminate properly when waiting for znode
 -

 Key: HBASE-4803
 URL: https://issues.apache.org/jira/browse/HBASE-4803
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Prakash Khemani
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4803.patch


 This is an attempt to fix the fact that SplitLogWorker threads were not being 
 terminated properly in some multi-master unit tests.





[jira] [Updated] (HBASE-4803) Split log worker should terminate properly when waiting for znode

2011-11-16 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4803:
---

Attachment: HBASE-4803.patch

 Split log worker should terminate properly when waiting for znode
 -

 Key: HBASE-4803
 URL: https://issues.apache.org/jira/browse/HBASE-4803
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Prakash Khemani
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4803.patch


 This is an attempt to fix the fact that SplitLogWorker threads were not being 
 terminated properly in some multi-master unit tests.





[jira] [Commented] (HBASE-4543) major_compact '.META.' has no effect

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151574#comment-13151574
 ] 

stack commented on HBASE-4543:
--

It's working in trunk.  I see these messages when I tried it just now:

{code}
2011-11-16 22:20:53,676 DEBUG 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Small Compaction 
requested: regionName=.META.,,1.1028785192, storeName=info, fileCount=1, 
fileSize=582.8k (582.8k), priority=1, time=9000612904273038; Because: 
User-triggered major compaction; compaction_queue=(0:0), split_queue=0
{code}

(Notice the 'User-triggered...')

 major_compact '.META.' has no effect
 

 Key: HBASE-4543
 URL: https://issues.apache.org/jira/browse/HBASE-4543
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621, 0.89.20100924
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Attachments: 0001-fix-the-issue-of-.META.-getting-ignored.patch


 major_compact '.META.' has no effect, although major_compact 
 'any_other_table' works fine from the shell.
 This issue seems to affect only 0.89; apache-trunk appears to handle this 
 case properly.
 The issue is that getTableRegions() in HMaster.java only works if the 
 tableName given is a normal table. The methodology (using a MetaScanner to 
 look through the .META. table for the tableName) does not work if the 
 tableName is .META. itself.
 The fix modifies getTableRegions() to check whether the tableName is .META. 
 and, if so, handle it accordingly.
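The attached patch is not reproduced here. Below is a minimal sketch of the idea under a simplified in-memory model. In this era of HBase, the regions of a user table are listed in .META., while the regions of .META. itself are listed one level up, in -ROOT-. A lookup for .META. therefore has to consult -ROOT- rather than scan .META. for its own name. `CatalogLookup`, its `getTableRegions`, and the map-based catalogs are hypothetical stand-ins, not the HMaster API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the two catalog levels. */
public class CatalogLookup {
    public static final String META = ".META.";

    // Each catalog maps a region name to the name of the table owning that region.
    private final Map<String, String> rootRows; // -ROOT-: describes .META. regions
    private final Map<String, String> metaRows; // .META.: describes user-table regions

    public CatalogLookup(Map<String, String> rootRows, Map<String, String> metaRows) {
        this.rootRows = rootRows;
        this.metaRows = metaRows;
    }

    /** Returns the region names of tableName, special-casing .META. */
    public List<String> getTableRegions(String tableName) {
        // Scanning .META. for ".META." finds nothing, because .META. regions are
        // recorded in -ROOT-; pick the catalog one level up for that case.
        Map<String, String> catalog = META.equals(tableName) ? rootRows : metaRows;
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, String> row : catalog.entrySet()) {
            if (row.getValue().equals(tableName)) {
                regions.add(row.getKey());
            }
        }
        return regions;
    }
}
```

Without the special case, the lookup for .META. would return an empty list, which matches the reported symptom: the major compaction request silently has nothing to act on.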





[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151577#comment-13151577
 ] 

stack commented on HBASE-4763:
--

You want to give us plugins to host, N? Seems like Gary is volunteering to do 
the hosting.

 Integrate surefire and junit for category management
 

 Key: HBASE-4763
 URL: https://issues.apache.org/jira/browse/HBASE-4763
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: surefire_hbase.v2.patch


 As of today, Surefire integrates categories in the trunk of version 2.11: 
 http://jira.codehaus.org/browse/SUREFIRE-329 . It may require private 
 patches as well.
 It may impact JUnit: https://github.com/KentBeck/junit/issues/359
 This jira is about this integration. We will need a repo for this.
 For the naming of the versions to be created, I don't know if there is a 
 convention. If not, I would propose: 2.10-patched-HBASE
  
 Obviously, it's important to get our changes integrated in the main release: 
 we're not forking surefire and junit!
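For readers unfamiliar with test categories, here is a minimal, dependency-free sketch of the mechanism. JUnit 4.8+ provides a real `org.junit.experimental.categories.Category` annotation that tags test classes with marker types; a runner or build plugin then selects tests by category. To run without JUnit on the classpath, this sketch hand-rolls an equivalent annotation and a selection filter. The category names `SmallTests`/`LargeTests` and the test class names are illustrative assumptions. The sketch does not show how the patched Surefire does the filtering.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CategoryFilterSketch {
    // Marker interfaces naming the categories (illustrative names).
    public interface SmallTests {}
    public interface LargeTests {}

    /** Hand-rolled stand-in for JUnit's @Category annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Category {
        Class<?>[] value();
    }

    @Category(SmallTests.class)
    public static class TestFastThing {}

    @Category(LargeTests.class)
    public static class TestMiniCluster {}

    public static class TestUncategorized {}

    /** Keeps only the test classes tagged with the wanted category. */
    public static List<Class<?>> select(List<Class<?>> testClasses, Class<?> wanted) {
        List<Class<?>> selected = new ArrayList<>();
        for (Class<?> c : testClasses) {
            Category cat = c.getAnnotation(Category.class);
            if (cat != null && Arrays.asList(cat.value()).contains(wanted)) {
                selected.add(c);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Class<?>> all = Arrays.<Class<?>>asList(
            TestFastThing.class, TestMiniCluster.class, TestUncategorized.class);
        System.out.println(select(all, SmallTests.class));
    }
}
```

Using marker types rather than strings means a typo in a category name fails at compile time. This is one reason JUnit chose classes for categories.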





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: 4213-trunk.txt

Latest patch adds test category.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.94.0

 Attachments: 4213-0.92.txt, 4213-0.92.v2, 4213-0.92.v4, 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Fixed_NPE_in_RS_during_alter_.patch, 
 4213-Instant_Schema_change_through_ZK.patch, 4213-Nov-2-2011_patch_.patch, 
 4213-Nov072011-Patch_to_support_concurrent_split_and_alter__.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-trunk.txt, 
 4213-v9.txt, 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Status: Open  (was: Patch Available)





[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Status: Patch Available  (was: Open)





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151581#comment-13151581
 ] 

Ted Yu commented on HBASE-4213:
---

TestDistributedLogSplitting#testWorkerAbort passed on my MacBook.





[jira] [Commented] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.

2011-11-16 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151583#comment-13151583
 ] 

Ted Yu commented on HBASE-4213:
---

HBASE-4370 has mostly been covered by the latest patch.





[jira] [Commented] (HBASE-4763) Integrate surefire and junit for category management

2011-11-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151586#comment-13151586
 ] 

nkeywal commented on HBASE-4763:


Yes, I am working with him; he's building surefire and junit today.





[jira] [Commented] (HBASE-4655) Document architecture of backups

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151589#comment-13151589
 ] 

stack commented on HBASE-4655:
--

I echo Todd's remark #1.

For '...incremental backups at the Stage 1 (RBU) level', won't the time between 
steps b and d be 'large'? During the copy, the list of files could change on 
you; i.e. when you go to copy a file, it may have been removed because it was 
compacted. What do you do in this case? (Your list may not include the 
compacted file.)

For (a), 'The backups rely on the clocks across the various region-servers for 
determining the point in time to which the edits are re-played': what if a 
server is lagging the others by a good bit? When replaying the edits, would 
you replay edits from when this lagging server said the backup began?

How will you know which hlogs to replay? Will you open each one and look at the 
first and last edits in the file? Or should we write out metadata files for 
hlogs? Or is it enough to rely on the HDFS modtime?
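Here is one possible way to frame the hlog-selection question, as a sketch only and not the design in the attached document. Suppose the first and last edit timestamps of each hlog were known, for example from metadata written alongside it, which is one of the options raised above. Choosing logs to replay then becomes an interval-overlap test. Widening the replay window by an assumed maximum clock skew is one way to handle the lagging-server concern. `HLogInfo`, `selectLogs`, and the `maxSkewMs` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class HLogSelection {
    /** Hypothetical summary of one hlog: path plus first/last edit timestamps in ms. */
    public static final class HLogInfo {
        final String path;
        final long firstEditTs;
        final long lastEditTs;

        public HLogInfo(String path, long firstEditTs, long lastEditTs) {
            this.path = path;
            this.firstEditTs = firstEditTs;
            this.lastEditTs = lastEditTs;
        }
    }

    /**
     * Returns the logs whose edit range overlaps
     * [windowStart - maxSkewMs, windowEnd + maxSkewMs]. Widening the window by the
     * assumed maximum clock skew avoids dropping edits from a lagging region server.
     */
    public static List<String> selectLogs(List<HLogInfo> logs, long windowStart,
                                          long windowEnd, long maxSkewMs) {
        long lo = windowStart - maxSkewMs;
        long hi = windowEnd + maxSkewMs;
        List<String> selected = new ArrayList<>();
        for (HLogInfo log : logs) {
            // Two closed intervals overlap iff each one starts before the other ends.
            if (log.firstEditTs <= hi && log.lastEditTs >= lo) {
                selected.add(log.path);
            }
        }
        return selected;
    }
}
```

The trade-off is that a larger skew allowance replays more logs, so edits outside the true window must still be filtered out during replay by their own timestamps.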

Looks great, K.



 Document architecture of backups
 

 Key: HBASE-4655
 URL: https://issues.apache.org/jira/browse/HBASE-4655
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan
 Attachments: HBase Backups Architecture.docx


 Basic idea behind the backup architecture for HBase





[jira] [Updated] (HBASE-4801) alter_status shell prints sensible message at completion

2011-11-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4801:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Thanks Nicolas and Christopher.

 alter_status shell prints sensible message at completion
 

 Key: HBASE-4801
 URL: https://issues.apache.org/jira/browse/HBASE-4801
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-4801.patch


 The alter_status command used to print 0/0 once an alter operation had 
 completed and its progress was no longer available. Now it instead indicates 
 that all regions were updated.
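The actual change lives in the JRuby shell. The following is only a Java sketch of the decision described above, with an assumed method name, assumed argument meanings, and assumed message text. It treats a total of 0 as "progress no longer tracked, so the alter has finished" and reports completion instead of printing 0/0.

```java
public class AlterStatusMessage {
    /**
     * Hypothetical formatter. pendingRegions = regions still to be updated;
     * totalRegions = regions tracked for the alter (both 0 once tracking ends).
     */
    public static String format(int pendingRegions, int totalRegions) {
        if (totalRegions == 0) {
            // Progress is no longer available, which means the alter has completed.
            return "Done. All regions updated.";
        }
        int updated = totalRegions - pendingRegions;
        return updated + "/" + totalRegions + " regions updated.";
    }
}
```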





[jira] [Commented] (HBASE-4802) Disable show table metrics in bulk loader

2011-11-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13151592#comment-13151592
 ] 

stack commented on HBASE-4802:
--

Is this right, Nicolas? You add a null check but at the same time remove a 
null check.

 Disable show table metrics in bulk loader
 -

 Key: HBASE-4802
 URL: https://issues.apache.org/jira/browse/HBASE-4802
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Liyin Tang
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4802.patch


 During bulk load, the Configuration object may be set to null. This caused 
 an NPE in per-CF metrics because they consult the Configuration to determine 
 whether to show the table name. A simple change is needed to allow the conf 
 to be null and, in that case, not specify the table name.
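A minimal sketch of the null-safe check described above, not the code in HBASE-4802.patch. In the sketch, a plain `Map<String, String>` stands in for Hadoop's `Configuration`. The key name `hbase.metrics.showTableName` and the `tbl.`/`cf.` prefix format are assumptions for illustration; check the HBase version in use for the real names.

```java
import java.util.Map;

public class MetricNameSketch {
    // Assumed configuration key, for illustration only.
    public static final String SHOW_TABLE_NAME_KEY = "hbase.metrics.showTableName";

    /** Builds a per-CF metric prefix; conf may be null (e.g. during bulk load). */
    public static String metricPrefix(Map<String, String> conf, String table, String family) {
        // A null conf simply means "do not show the table name" instead of an NPE.
        // Boolean.parseBoolean(null) is false, so a missing key behaves the same way.
        boolean showTable = conf != null && Boolean.parseBoolean(conf.get(SHOW_TABLE_NAME_KEY));
        return (showTable ? "tbl." + table + "." : "") + "cf." + family + ".";
    }
}
```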




