[jira] [Updated] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5840:
--

Fix Version/s: 0.96.0

 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing 
 the old status
 --

 Key: HBASE-5840
 URL: https://issues.apache.org/jira/browse/HBASE-5840
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
 Fix For: 0.96.0, 0.94.1


 The TaskMonitor status is not cleared when a region goes to FAILED_OPEN, so it 
 keeps showing the old status.
 This misleads the user.
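 For reference, a minimal sketch of the kind of cleanup being asked for, assuming the 
 MonitoredTask API from org.apache.hadoop.hbase.monitoring is used around region opening 
 (the exact open-region code path and variable names are assumptions, not the committed fix):
 {code}
 MonitoredTask status = TaskMonitor.get().createStatus(
     "Opening region " + regionNameStr);            // regionNameStr is assumed
 try {
   // ... open the region, calling status.setStatus(...) along the way ...
 } catch (IOException e) {
   // On FAILED_OPEN, terminate the task so the stale status no longer
   // shows up on the task monitor page.
   status.abort("Failed open of region " + regionNameStr + ": " + e.getMessage());
   throw e;
 }
 {code}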

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5809:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Avoid move api to take the destination server same as the source server.
 

 Key: HBASE-5809
 URL: https://issues.apache.org/jira/browse/HBASE-5809
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Priority: Minor
  Labels: patch
 Fix For: 0.96.0

 Attachments: HBASE-5809.patch


 Currently, move accepts any destination specified, and even if the destination is 
 the same as the source we still do an unassign and assign.  This can cause 
 problems due to RegionAlreadyInTransitionException and thus hang the 
 region in RIT for a long time.  We can avoid this by not allowing the 
 move to happen in that scenario.
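 As an illustration, a minimal sketch of the guard described above (variable names and the 
 surrounding method are assumptions, not necessarily the committed patch):
 {code}
 // In the master's move handling: bail out early when the requested
 // destination is the server already hosting the region.
 ServerName current = regionState.getServerName();   // current location (assumed accessor)
 if (dest != null && dest.equals(current)) {
   LOG.info("Destination " + dest + " is same as current location of "
       + hri.getEncodedName() + ", skipping move");
   return;   // avoid the needless unassign/assign and the RIT hang-up
 }
 {code}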

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5809:
--

Fix Version/s: (was: 0.94.1)
   Issue Type: Improvement  (was: Bug)

Committed to trunk.  Made it an improvement.
Thanks for the patch, Rajesh.
Thanks for the review, Ted and Uma.

 Avoid move api to take the destination server same as the source server.
 

 Key: HBASE-5809
 URL: https://issues.apache.org/jira/browse/HBASE-5809
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Priority: Minor
  Labels: patch
 Fix For: 0.96.0

 Attachments: HBASE-5809.patch


 Currently, move accepts any destination specified, and even if the destination is 
 the same as the source we still do an unassign and assign.  This can cause 
 problems due to RegionAlreadyInTransitionException and thus hang the 
 region in RIT for a long time.  We can avoid this by not allowing the 
 move to happen in that scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5809:
--

Attachment: HBASE-5809.patch

This is what I finally committed: a small formatting change and a corrected typo.

 Avoid move api to take the destination server same as the source server.
 

 Key: HBASE-5809
 URL: https://issues.apache.org/jira/browse/HBASE-5809
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
Priority: Minor
  Labels: patch
 Fix For: 0.96.0

 Attachments: HBASE-5809.patch, HBASE-5809.patch


 Currently, move accepts any destination specified, and even if the destination is 
 the same as the source we still do an unassign and assign.  This can cause 
 problems due to RegionAlreadyInTransitionException and thus hang the 
 region in RIT for a long time.  We can avoid this by not allowing the 
 move to happen in that scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5635:
--

Attachment: HBASE-5635._trunk.patch

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, 
 HBASE-5635._trunk.patch, HBASE-5635.patch


 During the hlog split operation, if all the ZooKeepers are down, the paths will 
 be returned as null and the splitworker thread will exit.
 This regionserver will then not be able to acquire any other tasks, since the 
 splitworker thread has exited.
 Please find the attached code for more details:
 {code}
 private List<String> getTaskList() {
   for (int i = 0; i < zkretries; i++) {
     try {
       return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
           this.watcher.splitLogZNode));
     } catch (KeeperException e) {
       LOG.warn("Could not get children of znode " +
           this.watcher.splitLogZNode, e);
       try {
         Thread.sleep(1000);
       } catch (InterruptedException e1) {
         LOG.warn("Interrupted while trying to get task list ...", e1);
         Thread.currentThread().interrupt();
         return null;
       }
     }
   }
 {code}
 in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
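 A minimal sketch of the behaviour the report argues for, i.e. retrying instead of letting 
 the worker thread exit when getTaskList() returns null (the exitWorker flag and the 
 surrounding loop are assumptions, not the committed fix):
 {code}
 List<String> paths = null;
 while (!exitWorker) {                  // assumed stop flag for the worker
   paths = getTaskList();
   if (paths != null) {
     break;                             // got a task list; go on to grab tasks
   }
   LOG.warn("Could not get task list from ZooKeeper, retrying ...");
   try {
     Thread.sleep(1000);                // back off before asking again
   } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     return;                            // stop only if the worker is interrupted
   }
 }
 {code}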
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-04-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5635:
--

Attachment: HBASE-5635_0.94.patch

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, 
 HBASE-5635._trunk.patch, HBASE-5635.patch, HBASE-5635_0.94.patch


 During the hlog split operation, if all the ZooKeepers are down, the paths will 
 be returned as null and the splitworker thread will exit.
 This regionserver will then not be able to acquire any other tasks, since the 
 splitworker thread has exited.
 Please find the attached code for more details:
 {code}
 private List<String> getTaskList() {
   for (int i = 0; i < zkretries; i++) {
     try {
       return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
           this.watcher.splitLogZNode));
     } catch (KeeperException e) {
       LOG.warn("Could not get children of znode " +
           this.watcher.splitLogZNode, e);
       try {
         Thread.sleep(1000);
       } catch (InterruptedException e1) {
         LOG.warn("Interrupted while trying to get task list ...", e1);
         Thread.currentThread().interrupt();
         return null;
       }
     }
   }
 {code}
 in the org.apache.hadoop.hbase.regionserver.SplitLogWorker 
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-19 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Fix Version/s: 0.94.1
   Issue Type: Bug  (was: Improvement)

@Stack
Yes, I have a configured LB.  But since we provide the option to use master services in 
the LB, if I try to use the 'balancer' object there, it is a new one.  I 
am updating this to a bug, Stack.

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, 
 HBASE-5737_2.patch, HBASE-5737_3.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.
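 A minimal sketch of the proposed change (local names are illustrative and the surrounding 
 method is simplified): build the result with a sorted map so iteration order is 
 deterministic, like MetaReader.fullScan.
 {code}
 // Sorted by ServerName instead of hash order.
 Map<ServerName, List<HRegionInfo>> assignments =
     new TreeMap<ServerName, List<HRegionInfo>>();   // was: new HashMap<...>()
 for (Map.Entry<HRegionInfo, ServerName> e : regionAssignments.entrySet()) {
   List<HRegionInfo> regions = assignments.get(e.getValue());
   if (regions == null) {
     regions = new ArrayList<HRegionInfo>();
     assignments.put(e.getValue(), regions);
   }
   regions.add(e.getKey());
 }
 {code}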

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-19 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5809:
--

Affects Version/s: (was: 0.94.0)
   0.92.1
Fix Version/s: 0.94.1
   0.96.0

Will commit tomorrow unless there are objections.

 Avoid move api to take the destination server same as the source server.
 

 Key: HBASE-5809
 URL: https://issues.apache.org/jira/browse/HBASE-5809
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Priority: Minor
  Labels: patch
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5809.patch


 Currently, move accepts any destination specified, and even if the destination is 
 the same as the source we still do an unassign and assign.  This can cause 
 problems due to RegionAlreadyInTransitionException and thus hang the 
 region in RIT for a long time.  We can avoid this by not allowing the 
 move to happen in that scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-18 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Fix Version/s: 0.96.0

Can this be an improvement or a bug?  I think the 'balancer' usage in HMaster and 
AM was a bug, right?

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, 
 HBASE-5737_2.patch, HBASE-5737_3.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-17 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5545:
--

Fix Version/s: 0.94.1

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.7, 0.92.2, 0.94.1


 Scenario:
 
 1. The file is created.
 2. But while writing data, all the datanodes might have crashed, so writing the 
 data will fail.
 3. Now even if close is called in the finally block, close will also fail and 
 throw an exception, because writing the data failed.
 4. After this, if the RS tries to create the same file again, an 
 AlreadyBeingCreatedException is thrown.
 Suggestion to handle this scenario (a minimal sketch follows below):
 ---
 1. Check for the existence of the file; if it exists, delete the file and create 
 a new one.
 The delete call for the file will not check whether the file is open or 
 closed.
 Overwrite option:
 
 1. The overwrite option is applicable only if you are trying to overwrite a 
 closed file.
 2. If the file is not closed, the same AlreadyBeingCreatedException will be 
 thrown even with the overwrite option.
 This is the expected behaviour, to avoid multiple clients writing to the same 
 file.
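 A minimal sketch of suggestion 1 above (paths and variables are assumptions; the real 
 checkRegioninfoOnFilesystem may differ): delete any stale .regioninfo left behind by a 
 failed earlier attempt before creating it again.
 {code}
 Path regioninfoPath = new Path(regionDir, ".regioninfo");  // regionDir is assumed
 if (fs.exists(regioninfoPath)) {
   // delete() does not care whether the stale file was ever closed, so the
   // re-create no longer trips over AlreadyBeingCreatedException.
   fs.delete(regioninfoPath, false);
 }
 FSDataOutputStream out = fs.create(regioninfoPath);
 {code}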
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-17 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5545:
--

Status: Patch Available  (was: Open)

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5545.patch


 Scenario:
 
 1. The file is created.
 2. But while writing data, all the datanodes might have crashed, so writing the 
 data will fail.
 3. Now even if close is called in the finally block, close will also fail and 
 throw an exception, because writing the data failed.
 4. After this, if the RS tries to create the same file again, an 
 AlreadyBeingCreatedException is thrown.
 Suggestion to handle this scenario:
 ---
 1. Check for the existence of the file; if it exists, delete the file and create 
 a new one.
 The delete call for the file will not check whether the file is open or 
 closed.
 Overwrite option:
 
 1. The overwrite option is applicable only if you are trying to overwrite a 
 closed file.
 2. If the file is not closed, the same AlreadyBeingCreatedException will be 
 thrown even with the overwrite option.
 This is the expected behaviour, to avoid multiple clients writing to the same 
 file.
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-17 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5545:
--

Attachment: HBASE-5545.patch

Patch for 0.94.

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.7, 0.92.2, 0.94.1

 Attachments: HBASE-5545.patch


 Scenario:
 
 1. The file is created.
 2. But while writing data, all the datanodes might have crashed, so writing the 
 data will fail.
 3. Now even if close is called in the finally block, close will also fail and 
 throw an exception, because writing the data failed.
 4. After this, if the RS tries to create the same file again, an 
 AlreadyBeingCreatedException is thrown.
 Suggestion to handle this scenario:
 ---
 1. Check for the existence of the file; if it exists, delete the file and create 
 a new one.
 The delete call for the file will not check whether the file is open or 
 closed.
 Overwrite option:
 
 1. The overwrite option is applicable only if you are trying to overwrite a 
 closed file.
 2. If the file is not closed, the same AlreadyBeingCreatedException will be 
 thrown even with the overwrite option.
 This is the expected behaviour, to avoid multiple clients writing to the same 
 file.
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-17 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5545:
--

Attachment: HBASE-5545.patch

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.7, 0.92.2, 0.94.0

 Attachments: HBASE-5545.patch, HBASE-5545.patch


 Scenario:
 
 1. The file is created.
 2. But while writing data, all the datanodes might have crashed, so writing the 
 data will fail.
 3. Now even if close is called in the finally block, close will also fail and 
 throw an exception, because writing the data failed.
 4. After this, if the RS tries to create the same file again, an 
 AlreadyBeingCreatedException is thrown.
 Suggestion to handle this scenario:
 ---
 1. Check for the existence of the file; if it exists, delete the file and create 
 a new one.
 The delete call for the file will not check whether the file is open or 
 closed.
 Overwrite option:
 
 1. The overwrite option is applicable only if you are trying to overwrite a 
 closed file.
 2. If the file is not closed, the same AlreadyBeingCreatedException will be 
 thrown even with the overwrite option.
 This is the expected behaviour, to avoid multiple clients writing to the same 
 file.
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Updated] (HBASE-5782) Not all the regions are getting assigned after the log splitting.

2012-04-16 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5782:
--

Attachment: HBASE-5782.patch

 Not all the regions are getting assigned after the log splitting.
 -

 Key: HBASE-5782
 URL: https://issues.apache.org/jira/browse/HBASE-5782
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0

 Attachments: HBASE-5782.patch


 Create a table with 1000 splits; after the region assignment, kill the 
 regionserver which contains the META table.
 A few regions are missing after the log splitting and region assignment, and the 
 HBCK report shows that multiple region holes have been created.
 The same scenario was verified multiple times on 0.92.1 with no issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-15 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Attachment: HBASE-5737_3.patch

Latest patch as per Stack's suggestion.

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, 
 HBASE-5737_2.patch, HBASE-5737_3.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-15 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Open  (was: Patch Available)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, 
 HBASE-5737_2.patch, HBASE-5737_3.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-15 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Patch Available  (was: Open)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, 
 HBASE-5737_2.patch, HBASE-5737_3.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Attachment: HBASE-5737_2.patch

Updated patch.

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Open  (was: Patch Available)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-11 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Patch Available  (was: Open)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Summary: Minor Improvements related to balancer.  (was: Use a treemap 
instead of hashmap in in AM.getAssignmentByTable used in balancer)

There are a few more changes that we can do here (see the sketch after this list):
- Currently two different balancer objects are created in AssignmentManager 
and HMaster.  Unify them.
- balancer.randomAssignment() is called only once, but based on whether the 
existing plan is null or is a forced plan, we either use the random plan or 
not.
Ideally, if we are extending a new balancer, randomAssignment() will be called 
but the plan it returns may not be used.
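A minimal sketch of the first point (the constructor shape is illustrative, not the 
committed patch): create the balancer once in HMaster and hand the same instance to the 
AssignmentManager instead of letting each build its own.
{code}
// One shared balancer instance for HMaster and AssignmentManager.
this.balancer = LoadBalancerFactory.getLoadBalancer(conf);
this.assignmentManager = new AssignmentManager(this, serverManager,
    this.catalogTracker, this.balancer, this.executorService);  // balancer passed in
{code}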

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Patch Available  (was: Open)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Open  (was: Patch Available)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Attachment: HBASE-5737_1.patch

Please review the latest one.  Even if you feel the TreeMap change is not needed, I 
think the other changes are.

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.

2012-04-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5737:
--

Status: Patch Available  (was: Open)

 Minor Improvements related to balancer.
 ---

 Key: HBASE-5737
 URL: https://issues.apache.org/jira/browse/HBASE-5737
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Attachments: HBASE-5737.patch, HBASE-5737_1.patch


 Currently in Am.getAssignmentByTable() we use a result map which is currently a 
 HashMap.  It could be better if we used a TreeMap.  Even in 
 MetaReader.fullScan we use a TreeMap, so that the naming order is 
 maintained.  I feel this change could be very useful in cases where we are 
 extending the DefaultLoadBalancer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-03-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5689:
--

Priority: Critical  (was: Major)

Making it critical as it is data loss related.

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Attachments: 5689-testcase.patch, HBASE-5689.patch


 Let's see the following scenario:
 1. The region is on server A.
 2. Put KV(r1-v1) to the region.
 3. Move the region from server A to server B.
 4. Put KV(r2-v2) to the region.
 5. Move the region from server B to server A.
 6. Put KV(r3-v3) to the region.
 7. kill -9 server B and start it.
 8. kill -9 server A and start it.
 9. Scan the region: we get only two KVs (r1-v1, r2-v2); the third 
 KV(r3-v3) is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in the process of ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in the process of ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region in
 HRegion#replayRecoveredEditsIfAny:
 {code}
 for (Path edits: files) {
   if (edits == null || !this.fs.exists(edits)) {
     LOG.warn("Null or non-existent edits file: " + edits);
     continue;
   }
   if (isZeroLengthThenDelete(this.fs, edits)) continue;
   if (checkSafeToSkip) {
     Path higher = files.higher(edits);
     long maxSeqId = Long.MAX_VALUE;
     if (higher != null) {
       // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
       String fileName = higher.getName();
       maxSeqId = Math.abs(Long.parseLong(fileName));
     }
     if (maxSeqId <= minSeqId) {
       String msg = "Maximum possible sequenceid for this log is " + maxSeqId
           + ", skipped the whole file, path=" + edits;
       LOG.debug(msg);
       continue;
     } else {
       checkSafeToSkip = false;
     }
   }
 {code}
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-29 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Status: Open  (was: Patch Available)

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5617_1.patch


 With coprocessor hooks, while a put happens we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set to 
 false.
 In 0.94 and above the puts are first written to the memstore and then to the WAL.  If 
 there is any failure in the WAL append or sync, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the 
 puts that happened in the prePut.
 We can add coprocessor hooks like pre/postRollBackMemStore.  Is any one 
 hook enough here?
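 For illustration, one possible shape for such RegionObserver hooks (names follow the 
 proposal above; this is a sketch, not a committed API):
 {code}
 // Called just before / just after the memstore rollback that follows a
 // failed WAL append or sync.
 void preRollBackMemStore(final ObserverContext<RegionCoprocessorEnvironment> c,
     final Put put) throws IOException;

 void postRollBackMemStore(final ObserverContext<RegionCoprocessorEnvironment> c,
     final Put put) throws IOException;
 {code}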

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-29 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Status: Patch Available  (was: Open)

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch


 With coprocessor hooks, while a put happens we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set to 
 false.
 In 0.94 and above the puts are first written to the memstore and then to the WAL.  If 
 there is any failure in the WAL append or sync, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the 
 puts that happened in the prePut.
 We can add coprocessor hooks like pre/postRollBackMemStore.  Is any one 
 hook enough here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-29 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Attachment: HBASE-5617_2.patch

@Andy
This patch deals only with what is intended in the title of the JIRA.
Can I raise another JIRA for adding the other hooks?


 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch


 With coprocessor hooks, while a put happens we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set to 
 false.
 In 0.94 and above the puts are first written to the memstore and then to the WAL.  If 
 there is any failure in the WAL append or sync, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the 
 puts that happened in the prePut.
 We can add coprocessor hooks like pre/postRollBackMemStore.  Is any one 
 hook enough here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Attachment: HBASE-5617_1.patch

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5617_1.patch


 With coprocessor hooks in the put flow we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set 
 to false.
 In 0.94 and above, puts are first written to the memstore and then to the WAL.  
 If the WAL append or sync fails, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the puts that happened in prePut.
 We can add coprocessor hooks like pre/postRollbackMemStore.  Is one hook 
 enough here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Status: Patch Available  (was: Open)

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5617_1.patch


 With coprocessor hooks in the put flow we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set 
 to false.
 In 0.94 and above, puts are first written to the memstore and then to the WAL.  
 If the WAL append or sync fails, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the puts that happened in prePut.
 We can add coprocessor hooks like pre/postRollbackMemStore.  Is one hook 
 enough here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-03-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5097:
--

Fix Version/s: 0.94.1
   0.96.0
   0.92.2

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we don't have an implementation for postScannerOpen, the RegionScanner is 
 null and a NullPointerException is thrown 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker.  Please feel free to change the priority if I 
 am wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  I am just a learner.
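 One possible defensive change, sketched against the snippet above (illustrative 
 only, not the committed patch), is to fail fast with a clear error when a 
 coprocessor hands back a null scanner instead of letting the null reach 
 addScanner():
 {code}
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
     s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
     s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
     s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
   if (s == null) {
     // A RegionObserver returned null from postScannerOpen; report it clearly
     // instead of failing later with a NullPointerException in addScanner().
     throw new IOException("Coprocessor postScannerOpen returned a null RegionScanner");
   }
 {code}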

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5510) Pass region info in LoadBalancer.randomAssignment(List<ServerName> servers)

2012-03-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5510:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Pass region info in LoadBalancer.randomAssignment(List<ServerName> servers)
 ---

 Key: HBASE-5510
 URL: https://issues.apache.org/jira/browse/HBASE-5510
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.96.0

 Attachments: HBase-5010_3.patch, HBase-5510.patch, HBase-5510_2.patch


  In the LB there is a randomAssignment(List<ServerName> servers) API which is 
 used by the AM to assign
  a region from a down RS. [This will also be used in other cases, like a call 
 to the assign() API from the client.]
  I feel it would be better to also pass the HRegionInfo into this method. 
 When the LB is making a choice for a region
  assignment because one RS is down, it would be nice if the LB knew for 
 which region it is doing this server selection.
 +Scenario+
  While one RS is down, we want the regions to get moved to other RSs, but a set 
 of regions should stay together. We have a custom load balancer, but with the 
 current LB interface this is not possible. Another way is to allow 
 a random assignment of the regions at RS-down time and later, with a cluster 
 balance, rebalance the regions as I need. But this might make regions get 
 assigned first to one RS and then move again to another. Also, for some time 
 period my business use case cannot be satisfied.
 I have also seen an issue in JIRA which speaks about making sure that the Root 
 and META regions always sit on some specific RSs. With the current LB API 
 this won't be possible in the future.
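 For illustration, the API shape being asked for could look like the sketch below 
 (signatures are illustrative only; the exact method is what this JIRA decides):
 {code}
 // Existing method (generics restored):
 ServerName randomAssignment(List<ServerName> servers);

 // Proposed shape: pass the region being assigned so a custom LoadBalancer can
 // keep related regions together or pin ROOT/META to specific servers.
 ServerName randomAssignment(HRegionInfo regionInfo, List<ServerName> servers);
 {code}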

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Attachment: HBASE-5584-2.patch

@Andrew
Instead of a sleep I have added a CountDownLatch for the create-table assertion. 
Please review.
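For reference, the general shape of that sleep-free pattern is sketched below 
(stand-alone illustration with made-up class and method names, not the test code 
in the attached patch):
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// The test observer counts the latch down when the post-create hook fires in
// the handler, and the test waits on the latch instead of sleeping.
class CreateTableLatch {
  static final CountDownLatch POST_CREATE_CALLED = new CountDownLatch(1);

  // Imagine this being called from the observer's postCreateHandler hook.
  static void onPostCreateHandler() {
    POST_CREATE_CALLED.countDown();
  }

  // Called from the test after the createTable() call returns.
  static void assertPostCreateHandlerFired() throws InterruptedException {
    if (!POST_CREATE_CALLED.await(30, TimeUnit.SECONDS)) {
      throw new AssertionError("postCreateHandler hook was never invoked");
    }
  }
}
{code}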

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584-1.patch, HBASE-5584-2.patch, HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.

2012-03-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5617:
--

Component/s: coprocessors

 Provide coprocessor hooks in put flow while rollbackMemstore.
 -

 Key: HBASE-5617
 URL: https://issues.apache.org/jira/browse/HBASE-5617
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0


 With coprocessor hooks in the put flow we have the provision to create new 
 puts to other tables or regions.  These puts can be done with writeToWal set 
 to false.
 In 0.94 and above, puts are first written to the memstore and then to the WAL.  
 If the WAL append or sync fails, the memstore is rolled back.
 Now the problem is that if the put that happens in the main flow fails, there 
 is no way to roll back the puts that happened in prePut.
 We can add coprocessor hooks like pre/postRollbackMemStore.  Is one hook 
 enough here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-03-21 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5323:
--

Fix Version/s: (was: 0.94.0)
   0.94.1

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.94.1

 Attachments: HBASE-5323.patch, HBASE-5323.patch


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread and so further assignment does not 
 happen.  If ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such exceptions.
 Please do suggest.
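 Roughly the handling being proposed, as a sketch (identifiers illustrative, not 
 the committed patch):
 {code}
 // Wrap the log split in ServerShutdownHandler so that an unexpected error,
 // such as the AssertionError above, aborts the master instead of just killing
 // the handler thread and silently stopping further assignments.
 try {
   splitLog(serverName);            // the existing SSH split step
 } catch (Throwable t) {            // Throwable also covers AssertionError
   master.abort("Failed splitting logs for " + serverName, t);
   return;
 }
 {code}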

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Attachment: HBASE-5584.patch

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Status: Patch Available  (was: Open)

Patch for review.

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5520:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.96.0

 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, 
 HBASE-5520_3.patch, HBASE-5520_4.patch


 reseek() is currently not supported at the RegionScanner level. We can 
 support the same.
 This was created following the discussion under HBASE-2038
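 For illustration, the kind of method this would add (a sketch only; the final 
 signature is decided by the patch):
 {code}
 // Illustrative addition to RegionScanner: reposition the already-open scanner
 // at (or after) the given row instead of closing it and opening a new one.
 boolean reseek(byte[] row) throws IOException;
 {code}
 A coprocessor holding a RegionScanner could then skip forward over rows it does 
 not need without paying for a fresh openScanner call.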

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Attachment: HBASE-5584-1.patch

Updated patch.  It was passing locally.  Now added some sleep after create 
so that we can ensure that the postCreateHandler is called.  Please review.

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584-1.patch, HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Status: Open  (was: Patch Available)

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584-1.patch, HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers

2012-03-20 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5584:
--

Status: Patch Available  (was: Open)

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584-1.patch, HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
  - Call preCreate, postCreate, preEnable, postEnable, etc. in their 
  respective handlers
  - Currently they are called in the HMaster, which makes the post APIs async 
  w.r.t. the handlers
  - Similar is the case with the balancer.
 With the current behaviour, once we are in postEnable (for example) we anyway 
 need to wait for the main enable handler to be completed.
 We should ensure that we don't wait in the main thread, so again we need to 
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called from the handlers, then 
 only that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-18 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5206:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving as committed.

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-03-18 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5153:
--

   Resolution: Fixed
Fix Version/s: (was: 0.90.7)
   0.90.6
   Status: Resolved  (was: Patch Available)

This got committed to 0.90.6.
Sorry for not resolving it at that time.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 
 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. From that issue we know that if multiple 
 threads share the same connection, once the connection gets aborted in one 
 thread, the other threads will get a 
 HConnectionManager$HConnectionImplementation@18fb1f7 closed exception.
 It solves the problem of the stale connection not being removed, but the 
 original HTable instance cannot continue to be used. The connection in HTable 
 should be recreated.
 Actually, there are two approaches to solve this:
 1. In user code, once an IOE is caught, close the connection and re-create the 
 HTable instance. We can use this as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
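 A rough sketch of approach 1 in user code (0.90-era client API assumed; the 
 connection-cleanup details vary by version and are omitted here):
 {code}
 HTable table = new HTable(conf, "mytable");
 try {
   table.put(put);
 } catch (IOException ioe) {
   // The shared connection behind this HTable may have been aborted.  Discard
   // the instance (and, depending on the client version, the cached
   // connection) and retry once with a freshly created HTable.
   table = new HTable(conf, "mytable");
   table.put(put);
 }
 {code}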

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-14 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5568:
--

Fix Version/s: 0.90.7

Added 0.90.7 to the fix versions as well.

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch


 We can now call HRegion#flushcache() concurrently through 
 HRegionServer#splitRegion or HRegionServer#flushRegion via HBaseAdmin.
 However, we find that if HRegion#internalFlushcache() is called concurrently by 
 multiple threads, HRegion.memstoreSize will be calculated wrongly.
 At the end of HRegion#internalFlushcache() we do 
 this.addAndGetGlobalMemstoreSize(-flushsize), but the flushsize may not be the 
 actual memsize that was flushed to HDFS. This causes HRegion.memstoreSize to go 
 negative and prevents the next flush when we close this region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, 
 memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, 
 memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~128.1m for region 
 writetest1,,1331454657410.e9d827913a
 056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction 
 requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, 
 memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, 
 memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~134.8m for region 
 writetest1,,1331454657410.e9d827913a
 056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction 
 requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, 
 memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, 
 memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~6.8m for region 
 writetest1,,1331454657410.e9d827913a05
 6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction 
 requested=true
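 A stand-alone toy illustration of the symptom (plain Java, not HBase code): if 
 two overlapping flushes each subtract the size they saw at snapshot time, the 
 counter goes negative, which is why the flushes must be serialized or the 
 subtraction based on what was actually flushed.
 {code}
 import java.util.concurrent.atomic.AtomicLong;

 class MemstoreSizeRace {
   static final AtomicLong memstoreSize = new AtomicLong(128);

   // Each flush subtracts the size it believed it flushed.
   static void finishFlush(long sizeSeenAtSnapshot) {
     memstoreSize.addAndGet(-sizeSeenAtSnapshot);
   }

   public static void main(String[] args) {
     finishFlush(128);   // first flush: fine, size drops to 0
     finishFlush(128);   // overlapping second flush: size goes to -128
     System.out.println(memstoreSize.get());
   }
 }
 {code}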

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner

2012-03-12 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5520:
--

Attachment: HBASE-5520_3.patch

 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, 
 HBASE-5520_3.patch


 reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner

2012-03-09 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5520:
--

Attachment: HBASE-5520_2.patch

Updated the patch with requestSeek.

 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch


 reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner

2012-03-09 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5520:
--

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch


 reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5516) GZip leading to memory leak in 0.90. Fix similar to HBASE-5387 needed for 0.90.

2012-03-08 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5516:
--

Attachment: HBASE-5516_3_0.90.patch

Updated patch addressing comments.

 GZip leading to memory leak in 0.90.  Fix similar to HBASE-5387 needed for 
 0.90.
 

 Key: HBASE-5516
 URL: https://issues.apache.org/jira/browse/HBASE-5516
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: HBASE-5516_2_0.90.patch, HBASE-5516_3_0.90.patch


 Usage of GZip is leading to a resident memory leak in 0.90.
 We need something similar to HBASE-5387 in 0.90. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-03-08 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5545:
--

Fix Version/s: 0.92.2

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: gaojinchao
 Fix For: 0.90.7, 0.92.2


 Scenario:
 
 1. A file is created.
 2. While writing data, all datanodes might have crashed, so writing the data 
 will fail.
 3. Now even if close is called in the finally block, close will also fail and 
 throw an exception because writing the data failed.
 4. After this, if the RS tries to create the same file again, an 
 AlreadyBeingCreatedException will be thrown.
 Suggestion to handle this scenario:
 ---
 1. Check for the existence of the file; if it exists, delete the file and 
 create a new one. 
 The delete call for the file will not check whether the file is open or 
 closed.
 Overwrite Option:
 
 1. The overwrite option is applicable only if you are trying to overwrite a 
 closed file.
 2. If the file is not closed, then even with the overwrite option the same 
 AlreadyBeingCreatedException will be thrown.
 This is the expected behaviour, to avoid multiple clients writing to the same 
 file.
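 A sketch of suggestion 1 (illustrative only, not the committed patch; fs and 
 regionDir come from the surrounding method) using the HDFS FileSystem API:
 {code}
 // Delete any leftover .regioninfo from a previously failed attempt before
 // creating it again, so the NameNode does not reject the create with
 // AlreadyBeingCreatedException.  delete() does not care whether the previous
 // writer managed to close the file.
 Path regioninfoTmp = new Path(regionDir, ".tmp/.regioninfo");
 if (fs.exists(regioninfoTmp)) {
   fs.delete(regioninfoTmp, true);
 }
 FSDataOutputStream out = fs.create(regioninfoTmp);
 {code}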
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Updated] (HBASE-5520) Support seek() reseek() at RegionScanner

2012-03-08 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5520:
--

Attachment: HBASE-5520_1.patch

Patch for trunk.

 Support seek() reseek() at RegionScanner
 

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
 Attachments: HBASE-5520_1.patch


 seek() reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5531:
--

   Resolution: Fixed
Fix Version/s: 0.94.0
   Status: Resolved  (was: Patch Available)

 Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
 -

 Key: HBASE-5531
 URL: https://issues.apache.org/jira/browse/HBASE-5531
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.2
Reporter: Laxman
  Labels: build
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch


 The current profile still points to 0.23.1-SNAPSHOT. 
 The build is failing because 0.23.1 has already been released and its snapshot 
 is no longer available.
 We can update this to 0.23.2-SNAPSHOT.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.

2012-03-01 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5482:
--

Attachment: HBASE-5482_2.patch

Patch addressing Ted's comments.  
@Ted
If the patch is fine, can we commit it for 0.90.7?

 In 0.90, balancer algo leading to same region balanced twice and picking same 
 region with Src and Destination as same RS.
 -

 Key: HBASE-5482
 URL: https://issues.apache.org/jira/browse/HBASE-5482
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: 5482-v2.txt, HBASE-5482_1.patch, HBASE-5482_2.patch


 There is a possibility of 2 problems:
 - When we populate regionsToMove while iterating the serverinfo in 
 descending order, there is a chance that the same region can be added twice, 
 because in the first loop we do a randomization of the regions,
 whereas when neededRegions != 0 we just get the region at the 
 index and add it again.  This may lead to the same region being in the 
 regionsToMove list twice.
 - Another problem is that 
 when the problem in the first point happens, there is a chance that
 the regionToMove entry can have the same src and destination, and the same 
 region can be picked every 5 mins.
 {code}
 for (Map.Entry<HServerInfo, List<HRegionInfo>> server :
     serversByLoad.descendingMap().entrySet()) {
   BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
   int idx =
     balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
   if (idx >= server.getValue().size()) break;
   HRegionInfo region = server.getValue().get(idx);
   if (region.isMetaRegion()) continue; // Don't move meta regions.
   regionsToMove.add(new RegionPlan(region, server.getKey(), null));
   if (--neededRegions == 0) {
     // No more regions needed, done shedding
     break;
   }
 }
 {code}
 If I have meta and root on the top two loaded region servers (3 RS in total), we 
 just skip the regions on those region servers and pick the region from the 
 least loaded RS.
 Then in the next loop we iterate from the least loaded server and populate 
 the destination as the same server.
 This leads to a condition where balancing happens every 5 minutes and the 
 server is the same for src and dest.
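 An illustrative tweak to the quoted loop (not the committed patch): remember 
 which regions have already been queued so the neededRegions path cannot add the 
 same region a second time, which in turn removes the plans whose src and dest 
 end up identical.
 {code}
 Set<HRegionInfo> alreadyQueued = new HashSet<HRegionInfo>();
 ...
 HRegionInfo region = server.getValue().get(idx);
 if (region.isMetaRegion()) continue;        // Don't move meta regions.
 if (!alreadyQueued.add(region)) continue;   // this region was queued already
 regionsToMove.add(new RegionPlan(region, server.getKey(), null));
 {code}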

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler

2012-02-29 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5490:
--

Attachment: HBASE-5490.patch

Patch for 0.90

 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 
 EventHandler
 

 Key: HBASE-5490
 URL: https://issues.apache.org/jira/browse/HBASE-5490
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5490.patch


 The new state that was added, RS_ZK_REGION_FAILED_OPEN, was failing the 
 rolling restart.
 So move the new enum value to the end of the list.
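 A stand-alone illustration of why the new value has to go last (cut-down enums, 
 not the real EventType definition): the state is exchanged by ordinal, so 
 inserting a value in the middle shifts every ordinal after it and breaks a 
 rolling restart against servers still running the old code.
 {code}
 enum OldEventType  { RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED }
 // Inserting in the middle shifts RS_ZK_REGION_OPENED from ordinal 1 to 2:
 enum BrokenNewType { RS_ZK_REGION_OPENING, RS_ZK_REGION_FAILED_OPEN, RS_ZK_REGION_OPENED }
 // Appending keeps the old ordinals intact; the new value simply gets 2:
 enum SafeNewType   { RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED, RS_ZK_REGION_FAILED_OPEN }
 {code}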

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler

2012-02-29 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5490:
--

Fix Version/s: 0.90.6

 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 
 EventHandler
 

 Key: HBASE-5490
 URL: https://issues.apache.org/jira/browse/HBASE-5490
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5490.patch


 The new state that was added, RS_ZK_REGION_FAILED_OPEN, was failing the 
 rolling restart.
 So move the new enum value to the end of the list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.

2012-02-28 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5482:
--

Attachment: HBASE-5482_1.patch

Patch for 0.90.

 In 0.90, balancer algo leading to same region balanced twice and picking same 
 region with Src and Destination as same RS.
 -

 Key: HBASE-5482
 URL: https://issues.apache.org/jira/browse/HBASE-5482
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: HBASE-5482_1.patch


 There is a possibility of 2 problems:
 - When we populate regionsToMove while iterating the serverinfo in 
 descending order, there is a chance that the same region can be added twice, 
 because in the first loop we do a randomization of the regions,
 whereas when neededRegions != 0 we just get the region at the 
 index and add it again.  This may lead to the same region being in the 
 regionsToMove list twice.
 - Another problem is that 
 when the problem in the first point happens, there is a chance that
 the regionToMove entry can have the same src and destination, and the same 
 region can be picked every 5 mins.
 {code}
 for (Map.Entry<HServerInfo, List<HRegionInfo>> server :
     serversByLoad.descendingMap().entrySet()) {
   BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
   int idx =
     balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
   if (idx >= server.getValue().size()) break;
   HRegionInfo region = server.getValue().get(idx);
   if (region.isMetaRegion()) continue; // Don't move meta regions.
   regionsToMove.add(new RegionPlan(region, server.getKey(), null));
   if (--neededRegions == 0) {
     // No more regions needed, done shedding
     break;
   }
 }
 {code}
 If I have meta and root on the top two loaded region servers (3 RS in total), we 
 just skip the regions on those region servers and pick the region from the 
 least loaded RS.
 Then in the next loop we iterate from the least loaded server and populate 
 the destination as the same server.
 This leads to a condition where balancing happens every 5 minutes and the 
 server is the same for src and dest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.

2012-02-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5482:
--

Summary: In 0.90, balancer algo leading to same region balanced twice and 
picking same region with Src and Destination as same RS.  (was: Balancer in 
0.90 algo leading to same region balanced twice and picking same region with 
Src and Destination as same RS.)

 In 0.90, balancer algo leading to same region balanced twice and picking same 
 region with Src and Destination as same RS.
 -

 Key: HBASE-5482
 URL: https://issues.apache.org/jira/browse/HBASE-5482
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7


 There is a possibility of 2 problems:
 - When we populate regionsToMove while iterating the serverinfo in 
 descending order, there is a chance that the same region can be added twice, 
 because in the first loop we do a randomization of the regions,
 whereas when neededRegions != 0 we just get the region at the 
 index and add it again.  This may lead to the same region being in the 
 regionsToMove list twice.
 - Another problem is that 
 when the problem in the first point happens, there is a chance that
 the regionToMove entry can have the same src and destination, and the same 
 region can be picked every 5 mins.
 {code}
 for (Map.Entry<HServerInfo, List<HRegionInfo>> server :
     serversByLoad.descendingMap().entrySet()) {
   BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
   int idx =
     balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
   if (idx >= server.getValue().size()) break;
   HRegionInfo region = server.getValue().get(idx);
   if (region.isMetaRegion()) continue; // Don't move meta regions.
   regionsToMove.add(new RegionPlan(region, server.getKey(), null));
   if (--neededRegions == 0) {
     // No more regions needed, done shedding
     break;
   }
 }
 {code}
 If I have meta and root on the top two loaded region servers (3 RS in total), we 
 just skip the regions on those region servers and pick the region from the 
 least loaded RS.
 Then in the next loop we iterate from the least loaded server and populate 
 the destination as the same server.
 This leads to a condition where balancing happens every 5 minutes and the 
 server is the same for src and dest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Attachment: HBASE-5200_trunk_latest_with_test_2.patch

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario:
 Consider a case where the balancer is running and is trying to close regions 
 on an RS.
 Before we can close, a master switch happens.  
 On the master switch, the set of nodes that are in RIT is collected; we first 
 get the data and start watching the node,
 and only after that is the node data added into RIT.
 Now, by this time (before adding to RIT), if the RS to which the close was sent 
 does a transition in AM.handleRegion(), we miss the handling, saying the RIT 
 state was null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to more 
 inconsistency.
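 Conceptually, the ordering that avoids the race looks like the sketch below 
 (names illustrative, not the committed patch): record the region in the 
 in-memory RIT map first, and only then watch and process the znode, so a 
 concurrent handleRegion() never finds a null state.
 {code}
 synchronized (regionsInTransition) {
   // Step 1: make the in-memory state visible to AM.handleRegion() first.
   regionsInTransition.put(encodedRegionName, stateFromZkData);
 }
 // Step 2: only now start watching and processing the znode, so any CLOSED
 // transition that races in will find a non-null RIT state.
 watchAndProcessNode(path);
 {code}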

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Status: Open  (was: Patch Available)

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent

2012-02-14 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Status: Patch Available  (was: Open)

Will submit the 0.90 patch tomorrow.

 AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
 region assignment inconsistent
 -

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
 HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
 hbase-5200_90_latest.patch


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-06 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5323:
--

Attachment: HBASE-5323.patch

Patch for 0.90

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7

 Attachments: HBASE-5323.patch, HBASE-5323.patch


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We try to bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread, so no further assignment happens; 
 if ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such errors 
 (see the sketch below).
 Please do suggest.
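 A hedged sketch of the direction proposed above (abort the master rather than letting the error silently kill the handler thread); the Server interface and method names here are illustrative stand-ins, not the attached patch.
 {code}
public class ShutdownHandlerSketch {
  // Minimal stand-in for the master's abort hook.
  interface Server { void abort(String why, Throwable cause); }

  private final Server master;

  ShutdownHandlerSketch(Server master) { this.master = master; }

  public void process() {
    try {
      splitLog(); // may trip assert(realLength >= this.length) deep in the WAL reader
    } catch (Throwable t) {
      // An AssertionError is an Error, not an IOException, so catching only
      // IOException would miss it and the handler thread would die silently.
      // Aborting the master makes the failure visible and lets a backup
      // master retry the log split and the ROOT/META assignment.
      master.abort("Log split failed in ServerShutdownHandler", t);
    }
  }

  private void splitLog() { /* placeholder for the HLog split */ }
}
 {code}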

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-03 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5323:
--

Attachment: HBASE-5323.patch

Patch for 0.90.

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7

 Attachments: HBASE-5323.patch


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We try to bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread, so no further assignment happens; 
 if ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such errors.
 Please do suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.

2012-02-02 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5321:
--

Affects Version/s: 0.90.5
Fix Version/s: 0.90.6

 this.allRegionServersOffline  not set to false after one RS comes online and 
 assignment is done in 0.90.
 

 Key: HBASE-5321
 URL: https://issues.apache.org/jira/browse/HBASE-5321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6


 In HBASE-5160 we no longer wait for the TM to assign the regions after the 
 first RS comes online.
 After doing this, the variable this.allRegionServersOffline needs to be 
 reset, which is not done in 0.90 (see the sketch below).
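 A minimal sketch of the missing reset; the field and method names are illustrative, not the exact 0.90 AssignmentManager members.
 {code}
public class OfflineFlagSketch {
  private volatile boolean allRegionServersOffline = true;

  // Called once the first RS is back online and the regions have been assigned.
  void onRegionsAssignedAfterRestart() {
    // Without this reset, later logic keeps treating the cluster as if every
    // region server were still offline.
    allRegionServersOffline = false;
  }
}
 {code}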

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.

2012-02-02 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5321:
--

Attachment: HBASE-5321.patch

Please review the patch.

 this.allRegionServersOffline  not set to false after one RS comes online and 
 assignment is done in 0.90.
 

 Key: HBASE-5321
 URL: https://issues.apache.org/jira/browse/HBASE-5321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6

 Attachments: HBASE-5321.patch


 In HBASE-5160 we no longer wait for the TM to assign the regions after the 
 first RS comes online.
 After doing this, the variable this.allRegionServersOffline needs to be 
 reset, which is not done in 0.90.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-02 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5323:
--

Summary: Need to handle assertion error while splitting log through 
ServerShutDownHandler by shutting down the master  (was: Need to handle 
assertion error if split log through ServerShutDownHandler by shutting down the 
master)

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream
 {code}
   assert(realLength >= this.length);
 {code}
 We try to bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread, so no further assignment happens; 
 if ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such errors.
 Please do suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5299) CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite wait in region assignment flow.

2012-02-02 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5299:
--

Summary: CatalogTracker.getMetaServerConnection() checks for root server 
connection and makes waitForMeta to go into infinite wait in region assignment 
flow.  (was: CatalogTracker.getMetaServerConnection() checks for root server 
connection and makes waitForMeta to go into infinite loop in region assignment 
flow.)

 CatalogTracker.getMetaServerConnection() checks for root server connection 
 and makes waitForMeta to go into infinite wait in region assignment flow.
 

 Key: HBASE-5299
 URL: https://issues.apache.org/jira/browse/HBASE-5299
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor

 RS A, RS B and RS C are 3 region servers:
 RS A - META
 RS B - ROOT
 RS C - NON META and NON ROOT
 Kill RS B and wait for the server shutdown handler to start.
 Start RS B again before ROOT is assigned to RS C.
 Now the cluster will try to assign new regions to RS B.
 But as ROOT is not yet assigned, the OpenRegionHandler.updateMeta will fail 
 to update the regions simply because ROOT is not online.
 {code}
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:25,126 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:25,159 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:35,385 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:23:35,449 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:16,666 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Attempting to transition node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:16,701 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:60020-0x1352e27539c0009 Successfully transitioned node 
 a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to 
 RS_ZK_REGION_OPENING
 2012-01-30 16:24:20,788 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting 
 thread Thread[PostOpenDeployTasks:a87109263ed53e67158377a149c5a7be,5,main]
 2012-01-30 16:24:30,699 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception 
 running postOpenDeployTasks; region=a87109263ed53e67158377a149c5a7be
 org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted
   at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:439)
   at 
 org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1382)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:221)
 {code}
 So we need to wait for TM to assign the regions again. 
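 One way to picture the fix direction is a bounded wait instead of an unbounded one. The sketch below loosely mirrors CatalogTracker.waitForMetaServerConnectionDefault but is illustrative only, not the actual change.
 {code}
public class BoundedMetaWaitSketch {
  private volatile Object metaConnection; // stand-in for the META server proxy

  // Wait for the META connection, but only up to timeoutMs.
  Object waitForMetaServerConnection(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (this) {
      while (metaConnection == null) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          // Give up instead of waiting forever while ROOT is still offline;
          // the caller (postOpenDeployTasks) can retry, or the TimeoutMonitor
          // can reassign the region.
          return null;
        }
        wait(remaining);
      }
      return metaConnection;
    }
  }

  // Called when the META location becomes available.
  synchronized void setMetaConnection(Object connection) {
    this.metaConnection = connection;
    notifyAll();
  }
}
 {code}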

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.

2012-01-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Status: Patch Available  (was: Open)

 AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the 
 region assignment inconsistent.
 ---

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.1

 Attachments: HBASE-5200.patch, HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.

2012-01-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Status: Open  (was: Patch Available)

 AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the 
 region assignment inconsistent.
 ---

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.1

 Attachments: HBASE-5200.patch, HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.

2012-01-31 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Attachment: HBASE-5200_1.patch

Updated patch.  Test case passes with this.

 AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the 
 region assignment inconsistent.
 ---

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.1

 Attachments: HBASE-5200.patch, HBASE-5200_1.patch, 
 TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.

2012-01-30 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Status: Patch Available  (was: Open)

 AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the 
 region assignment inconsistent.
 ---

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.1

 Attachments: HBASE-5200.patch


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.

2012-01-30 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5200:
--

Attachment: HBASE-5200.patch

Patch for trunk.


 AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the 
 region assignment inconsistent.
 ---

 Key: HBASE-5200
 URL: https://issues.apache.org/jira/browse/HBASE-5200
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.1

 Attachments: HBASE-5200.patch


 This is the scenario:
 Consider a case where the balancer is running and is therefore trying to 
 close regions on a RS.
 Before the close completes, a master switch happens.
 On master switch, the set of nodes that are in RIT is collected; we first 
 get the node data and start watching the node, and only after that is the 
 node data added into RIT.
 If, in that window (before adding to RIT), the RS that was asked to close 
 does a transition handled by AM.handleRegion(), we miss the handling because 
 the RIT state is still null.
 {code}
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 a66d281d231dfcaea97c270698b26b6f from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 c12e53bfd48ddc5eec507d66821c4d23 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,358 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 59ae13de8c1eb325a0dd51f4902d2052 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 f45bc9614d7575f35244849af85aa078 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 cc3ecd7054fe6cd4a1159ed92fd62641 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 3af40478a17fee96b4a192b22c90d5a2 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 e6096a8466e730463e10d3d61f809b92 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 4806781a1a23066f7baed22b4d237e24 from server 
 HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 2012-01-13 10:50:46,359 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
 d69e104131accaefe21dcc01fddc7629 from server 
 HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
 not in expected PENDING_CLOSE or CLOSING states
 {code}
 In the branch, the CLOSING node is created by the RS, thus leading to even 
 more inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-27 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5153:
--

Attachment: HBASE-5153_addendum_0.90_1.patch

@Lars
Addressing your comments.
Based on your feedback, I will commit the patch if it is OK.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 
 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. From that issue we know that if multiple 
 threads share the same connection, then once the connection is aborted in one 
 thread, the other threads will get a 
 HConnectionManager$HConnectionImplementation@18fb1f7 closed exception.
 That fix solves the problem of the stale connection not being removable, but 
 the original HTable instance cannot continue to be used; the connection 
 inside the HTable has to be recreated.
 Actually, there are two approaches to solve this:
 1. In user code, once an IOE is caught, close the connection and re-create 
 the HTable instance. We can use this as a workaround (sketched below).
 2. On the HBase client side, catch this exception and re-create the connection.
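 A hedged illustration of workaround (1), written from memory of the 0.90-era client API (HTable(Configuration, String), HConnectionManager.deleteConnection(conf, stopProxy)); treat the exact signatures as an assumption rather than a reference.
 {code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class RecreateTableOnAbort {
  private final Configuration conf;
  private HTable table;

  public RecreateTableOnAbort(Configuration conf, String tableName) throws IOException {
    this.conf = conf;
    this.table = new HTable(conf, tableName);
  }

  public void putWithRecreate(Put put, String tableName) throws IOException {
    try {
      table.put(put);
    } catch (IOException e) {
      // The shared connection was aborted in another thread ("... closed"
      // exception): drop the stale connection and build a fresh HTable.
      // deleteConnection(conf, stopProxy) -- 0.90-era signature, assumed here.
      HConnectionManager.deleteConnection(conf, true);
      table = new HTable(conf, tableName);
      table.put(put); // single retry with the new connection
    }
  }
}
 {code}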

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4654:
--

Fix Version/s: (was: 0.90.6)
   0.92.1
   0.90.7

Moving to 0.90.7 and 0.92.1. Please pull back if you think differently.

 [replication] Add a check to make sure we don't replicate to ourselves
 --

 Key: HBASE-4654
 URL: https://issues.apache.org/jira/browse/HBASE-4654
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.7, 0.92.1

 Attachments: 4654-trunk.txt


 It's currently possible to add a replication peer and point it at the local 
 cluster. This could very well happen for those like us who use only one ZK 
 ensemble per DC, so that only the root znode changes when setting up 
 replication intra-DC.
 I don't think comparing just the cluster ID would be enough, because you 
 would normally use a different one for another cluster and nothing blocks you 
 from pointing elsewhere.
 Comparing the ZK ensemble address doesn't work either when you have multiple 
 DNS entries that point at the same place.
 I think this could be resolved by looking up the master address in the 
 relevant znode, as it should be exactly the same thing in the case where you 
 have the same cluster (see the sketch below).
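 A small sketch of that guard; readMasterAddress() is a hypothetical helper standing in for the ZooKeeper lookup of the peer's master znode, and nothing here is the actual patch.
 {code}
public class ReplicationPeerGuard {
  private final String ownMasterAddress;

  public ReplicationPeerGuard(String ownMasterAddress) {
    this.ownMasterAddress = ownMasterAddress;
  }

  public void addPeer(String peerQuorum, String peerZnodeParent) {
    String peerMaster = readMasterAddress(peerQuorum, peerZnodeParent);
    if (ownMasterAddress.equals(peerMaster)) {
      // Same master znode content means same cluster, even if the quorum
      // string or cluster id looks different; refuse the peer instead of
      // looping edits back to ourselves.
      throw new IllegalArgumentException("Replication peer points back at this cluster");
    }
    // ... register the peer as usual ...
  }

  private String readMasterAddress(String quorum, String znodeParent) {
    return ""; // placeholder for the ZooKeeper read of the peer's master znode
  }
}
 {code}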

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4762) ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4762:
--

Fix Version/s: (was: 0.90.6)
   0.90.7
 Hadoop Flags: Reviewed

Moving to 0.90.7

 ROOT and META region never be assigned if IOE throws in 
 verifyRootRegionLocation
 

 Key: HBASE-4762
 URL: https://issues.apache.org/jira/browse/HBASE-4762
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: mingjian
Assignee: mingjian
 Fix For: 0.90.7


 The patch in HBASE-3914 fixed ROOT being assigned to two regionservers.  But 
 it seems the ROOT region will never be assigned if verifyRootRegionLocation 
 throws an IOE, as in the following master logs:
 {noformat}
 2011-10-19 19:13:34,873 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_META_SERVER_S
 HUTDOWN
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
 yet
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1090)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:256)
 at $Proxy7.getRegionInfo(Unknown Source)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 After this, -ROOT-'s region won't be assigned, like this:
 {noformat}
 2011-10-19 19:18:40,000 DEBUG 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
 locateRegionInMeta parent
 Table=-ROOT-, metaLocation=address: dw79.kgb.sqa.cm4:60020, regioninfo: 
 -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after s
 leep of 1000 because: org.apache.hadoop.hbase.NotServingRegionException: 
 Region is not online: -ROOT-,,0
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2771)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1802)
 at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:569)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1091)
 {noformat}
 So we should rewrite the verifyRootRegionLocation method.
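 A hedged sketch of that rewrite direction: swallow the "server is not running yet" IOE and report the location as unverified, so verifyAndAssignRoot() can reassign -ROOT- instead of the shutdown handler dying. getRegionInfo() is a placeholder for the real RPC.
 {code}
import java.io.IOException;

public class VerifyRootSketch {
  boolean verifyRootRegionLocation(long timeoutMs) {
    try {
      return getRegionInfo() != null; // RPC to the server currently holding -ROOT-
    } catch (IOException e) {
      // ServerNotRunningException and friends: the location could not be
      // verified, so report false and let verifyAndAssignRoot() reassign
      // -ROOT- rather than aborting the M_META_SERVER_SHUTDOWN handler.
      return false;
    }
  }

  private Object getRegionInfo() throws IOException {
    throw new IOException("Server is not running yet"); // placeholder RPC
  }
}
 {code}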

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5004) Better manage standalone setups on Ubuntu, the 127.0.1.1 issue

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5004:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving to 0.90.7

 Better manage standalone setups on Ubuntu, the 127.0.1.1 issue
 --

 Key: HBASE-5004
 URL: https://issues.apache.org/jira/browse/HBASE-5004
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.90.7


 Numerous times users have come to us with issues setting up HBase on Ubuntu, 
 because the 127.0.1.1 line in /etc/hosts messes everything up. Here's an 
 example:
 {quote}
 2011-12-10 00:18:24,312 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as 
 localhost,33371,1323476299775, RPC listening on /127.0.1.1:33371, 
 sessionid=0x1342555adc90002
 ...
 2011-12-10 00:18:27,135 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 -ROOT-,,0.70236052 to localhost,33371,1323476299775
 2011-12-10 00:18:27,135 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
 New connection to localhost,33371,1323476299775
 2011-12-10 00:18:27,155 INFO org.apache.hadoop.ipc.HbaseRPC: Server at 
 /127.0.0.1:33371 could not be reached after 1 tries, giving up.
 2011-12-10 00:18:27,156 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 -ROOT-,,0.70236052 to serverName=localhost,33371,1323476299775, 
 load=(requests=0, regions=0, usedHeap=23, maxHeap=983), trying to assign 
 elsewhere instead; retry=0
 org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up 
 proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to 
 /127.0.0.1:33371 after attempts=1
 {quote}
 We should have a special check in standalone mode to make sure we won't fall 
 into that trap and then print a useful error message that would hopefully 
 appear on the command line.
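 A minimal sketch of such a check, using plain JDK calls only; the message text is illustrative.
 {code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCheck {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    if (local.getHostAddress().startsWith("127.0.1.")) {
      // Detect the Ubuntu 127.0.1.1 mapping up front and fail with an
      // actionable message instead of a later RetriesExhaustedException.
      System.err.println("Your hostname " + local.getHostName()
          + " resolves to " + local.getHostAddress()
          + "; fix the 127.0.1.1 entry in /etc/hosts before starting HBase"
          + " in standalone mode, otherwise region assignment will fail.");
    }
  }
}
 {code}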

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4462) Properly treating SocketTimeoutException

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4462:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

 Properly treating SocketTimeoutException
 

 Key: HBASE-4462
 URL: https://issues.apache.org/jira/browse/HBASE-4462
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: HBASE-4462_0.90.x.patch


 SocketTimeoutException is currently treated like any IOE inside of 
 HCM.getRegionServerWithRetries and I think this is a problem. This method 
 should only do retries in cases where we are pretty sure the operation will 
 complete, but with STE we already waited for (by default) 60 seconds and 
 nothing happened.
 I found this while debugging Douglas Campbell's problem on the mailing list 
 where it seemed like he was using the same scanner from multiple threads, but 
 actually it was just the same client doing retries while the first run didn't 
 even finish yet (that's another problem). You could see the first scanner, 
 then up to two other handlers waiting for it to finish in order to run 
 (because of the synchronization on RegionScanner).
 So what should we do? We could treat STE as a DoNotRetryException and let the 
 client deal with it, or we could retry only once.
 There's also the option of having a different behavior for get/put/icv/scan; 
 the issue with operations that modify a cell is that you don't know whether 
 the operation completed or not (the same applies when a RS dies hard after 
 completing, say, a Put but just before returning to the client).
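 As a sketch of the "retry STE at most once" option, with a plain Callable standing in for HCM.getRegionServerWithRetries (illustrative only, not the attached patch):
 {code}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryPolicySketch {
  <T> T callWithRetries(Callable<T> call, int numRetries) throws Exception {
    int steAttempts = 0;
    for (int attempt = 0; attempt < numRetries; attempt++) {
      try {
        return call.call();
      } catch (SocketTimeoutException ste) {
        // We already waited the full socket timeout (60s by default); either
        // rethrow like a DoNotRetryIOException or allow at most one more try.
        if (++steAttempts > 1) throw ste;
      } catch (IOException ioe) {
        // Other IOEs keep the normal retry behaviour.
      }
    }
    throw new IOException("Retries exhausted");
  }
}
 {code}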

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4064) Two concurrent unassigning of the same region caused the endless loop of Region has been PENDING_CLOSE for too long...

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4064:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

 Two concurrent unassigning of the same region caused the endless loop of 
 Region has been PENDING_CLOSE for too long...
 

 Key: HBASE-4064
 URL: https://issues.apache.org/jira/browse/HBASE-4064
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Jieshan Bean
 Fix For: 0.90.7

 Attachments: HBASE-4064-v1.patch, HBASE-4064_branch90V2.patch, 
 disableflow.png


 1. If there is a rubbish RegionState object with PENDING_CLOSE in 
 regionsInTransition (the RegionState was left behind by some exception and 
 should have been removed, which is why I call it a rubbish object), but the 
 region is not currently assigned anywhere, the TimeoutMonitor will fall into 
 an endless loop:
 2011-06-27 10:32:21,326 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 state=PENDING_CLOSE, ts=1309141555301
 2011-06-27 10:32:21,326 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
 2011-06-27 10:32:21,438 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 (offlining)
 2011-06-27 10:32:21,441 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is 
 not currently assigned anywhere
 2011-06-27 10:32:31,207 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 state=PENDING_CLOSE, ts=1309141555301
 2011-06-27 10:32:31,207 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
 2011-06-27 10:32:31,215 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 (offlining)
 2011-06-27 10:32:31,215 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is 
 not currently assigned anywhere
 2011-06-27 10:32:41,164 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 state=PENDING_CLOSE, ts=1309141555301
 2011-06-27 10:32:41,164 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f.
 2011-06-27 10:32:41,172 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
 (offlining)
 2011-06-27 10:32:41,172 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign 
 region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is 
 not currently assigned anywhere
 .
 2. In the following scenario, two concurrent unassign calls for the same 
 region can lead to the above problem:
 The first unassign call sends its RPC successfully; the master watches the 
 RS_ZK_REGION_CLOSED event and, while processing it, creates a 
 ClosedRegionHandler to remove the state of the region in the master, e.g.
 while the ClosedRegionHandler is running in an 
 hbase.master.executor.closeregion.threads thread (A), another unassign call 
 for the same region runs in another thread (B).
 When thread B evaluates if (!regions.containsKey(region)), this.regions still 
 contains the region info; at that point the CPU switches to thread A.
 Thread A removes the region from the sets this.regions and 
 regionsInTransition, then execution switches back to thread B. Thread B 
 continues and throws an exception with the message "Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 9a6e26d40293663a79523c58315b930f", but without removing the newly added 
 RegionState from regionsInTransition, so it can never be removed.
  public void unassign(HRegionInfo region, boolean force) {
    LOG.debug("Starting unassignment of region " +
      region.getRegionNameAsString() + " (offlining)");
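 A minimal, self-contained sketch of the cleanup idea behind this report: if a 
 forced unassign finds the region is not assigned anywhere, the stale 
 PENDING_CLOSE entry should be dropped from regionsInTransition instead of 
 being retried forever. Class, field and method names below are illustrative 
 assumptions, not the actual AssignmentManager code.
 {code}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;

 public class StaleRitCleanupSketch {
   private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
   private final Map<String, String> regionAssignments = new ConcurrentHashMap<>();

   /** Forced unassign: if the region is not assigned anywhere, drop the stale RIT entry. */
   void unassign(String encodedRegionName) {
     String server = regionAssignments.get(encodedRegionName);
     if (server == null) {
       // Cleaning up here breaks the endless PENDING_CLOSE loop described above.
       regionsInTransition.remove(encodedRegionName);
       return;
     }
     regionsInTransition.put(encodedRegionName, "PENDING_CLOSE");
     // ... send the close RPC to 'server' here ...
   }
 }
 {code}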
 

[jira] [Updated] (HBASE-4094) improve hbck tool to fix more hbase problem

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4094:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving to 0.90.7.  HBASE-5128 also is related to improving hbck tool.

 improve hbck tool to fix more hbase problem
 ---

 Key: HBASE-4094
 URL: https://issues.apache.org/jira/browse/HBASE-4094
 Project: HBase
  Issue Type: New Feature
  Components: master
Affects Versions: 0.90.3
Reporter: feng xu
 Fix For: 0.90.7

 Attachments: HbaseFsck_TableChain.patch

   Original Estimate: 12h
  Remaining Estimate: 12h



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4083) If Enable table is not completed and is partial, then scanning of the table is not working

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4083:
--

Fix Version/s: (was: 0.90.6)
   0.92.0
   0.90.7

Not fixed in 0.90, hence not resolving the issue, but it is committed to trunk 
and 0.92.


 If Enable table is not completed and is partial, then scanning of the table 
 is not working 
 ---

 Key: HBASE-4083
 URL: https://issues.apache.org/jira/browse/HBASE-4083
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.0

 Attachments: HBASE-4083-1.patch, HBASE-4083_0.90.patch, 
 HBASE-4083_0.90_1.patch, HBASE-4083_trunk.patch, HBASE-4083_trunk_1.patch


 Consider the following scenario
 Start the Master, Backup master and RegionServer.
 Create a table which in turn creates a region.
 Disable the table.
 Enable the table again. 
 Kill the Active master exactly at the point before the actual region 
 assignment is started.
 Restart or switch master.
 Scan the table.
 NotServingRegionException is thrown.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5197) [replication] Handle socket timeouts in ReplicationSource to prevent DDOS

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5197:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updating affected versions to 0.90.7

 [replication] Handle socket timeouts in ReplicationSource to prevent DDOS
 -

 Key: HBASE-5197
 URL: https://issues.apache.org/jira/browse/HBASE-5197
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.90.7, 0.92.1


 Kind of like HBASE-4462 but for replication. If while replicating you get a 
 socket timeout, the last thing you want to do is to retry it right away. 
 Since we can't fail the replication thread, the best I can think of is to 
 sleep for a really long time.
 Planning to bring this to all branches.
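 A minimal, self-contained sketch of that back-off idea: sleep much longer 
 after a socket timeout than after an ordinary failure. The class, interface 
 and constants below are illustrative assumptions, not the committed 
 ReplicationSource change.
 {code}
 import java.io.IOException;
 import java.net.SocketTimeoutException;

 public class ReplicationBackoffSketch {
   private static final long BASE_SLEEP_MS = 1000;   // assumed base retry sleep
   private static final int TIMEOUT_MULTIPLIER = 10; // assumed multiplier for socket timeouts

   interface Shipper { void ship() throws IOException; }

   /** Ship a batch; on a socket timeout, sleep much longer than for a normal failure. */
   void shipWithBackoff(Shipper shipper) throws InterruptedException {
     try {
       shipper.ship();
     } catch (SocketTimeoutException ste) {
       // A timeout usually means the sink is overloaded; retrying right away
       // would only make it worse, so back off for a long time.
       Thread.sleep(BASE_SLEEP_MS * TIMEOUT_MULTIPLIER);
     } catch (IOException ioe) {
       // Other IO errors can be retried after the normal, short sleep.
       Thread.sleep(BASE_SLEEP_MS);
     }
   }
 }
 {code}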

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3917) Separate the Avro schema definition file from the code

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-3917:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

 Separate the Avro schema definition file from the code
 --

 Key: HBASE-3917
 URL: https://issues.apache.org/jira/browse/HBASE-3917
 Project: HBase
  Issue Type: Improvement
  Components: avro
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Alex Newman
Priority: Trivial
  Labels: noob
 Fix For: 0.90.7

 Attachments: 
 0001-HBASE-3917.-Separate-the-Avro-schema-definition-file.patch


 The Avro schema files are in the src/main/java path, but should be in 
 /src/main/resources, just like Hbase.thrift is. That makes the separation 
 consistent and cleaner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5157) Backport HBASE-4880- Region is on service before openRegionHandler completes, may cause data loss

2012-01-26 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5157:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving to 0.90.7.  Needs some more code rewrite to make this fit in 0.90.

 Backport HBASE-4880- Region is on service before openRegionHandler completes, 
 may cause data loss
 -

 Key: HBASE-5157
 URL: https://issues.apache.org/jira/browse/HBASE-5157
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: HBASE-4880_branch90_1.patch


 Backporting to 0.90.6 considering the importance of the issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5179:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updating the affected version to 0.90.7.  This issue will go into 0.90.7 after 
sufficient testing.

 Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
 region to be assigned before log splitting is completed, causing data loss
 

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 
 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 
 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 
 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
 Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
 hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
 hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch


 If the master's failover processing and ServerShutdownHandler processing 
 happen concurrently, the following case may appear:
 1. The master completes splitLogAfterStartup().
 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
 3. The master starts to rebuildUserRegions, and RegionserverA is considered a 
 dead server.
 4. The master starts to assign the regions of RegionserverA because it is a 
 dead server by step 3.
 However, while doing step 4 (assigning regions), ServerShutdownHandler may 
 still be splitting logs; therefore, it may cause data loss.
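 A minimal sketch of the ordering guard this report calls for, assuming the 
 master tracks which dead servers still have unsplit logs; class and method 
 names are illustrative, not the committed patch.
 {code}
 import java.util.Set;
 import java.util.concurrent.ConcurrentSkipListSet;

 public class DeadServerGuard {
   private final Set<String> deadServersBeingSplit = new ConcurrentSkipListSet<>();

   void startLogSplitting(String serverName) { deadServersBeingSplit.add(serverName); }

   void finishLogSplitting(String serverName) { deadServersBeingSplit.remove(serverName); }

   /** The failover path must not assign regions of a server whose logs are still being split. */
   boolean safeToAssignRegionsOf(String serverName) {
     return !deadServersBeingSplit.contains(serverName);
   }
 }
 {code}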

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5271) Result.getValue and Result.getColumnLatest return the wrong column.

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5271:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated affected version to 0.90.7

 Result.getValue and Result.getColumnLatest return the wrong column.
 ---

 Key: HBASE-5271
 URL: https://issues.apache.org/jira/browse/HBASE-5271
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.5
Reporter: Ghais Issa
Assignee: Ghais Issa
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5271-90.txt, 5271-v2.txt, 
 fixKeyValueMatchingColumn.diff, testGetValue.diff


 In the following example result.getValue returns the wrong column:
 KeyValue kv = new KeyValue(Bytes.toBytes("r"), Bytes.toBytes("24"), 
 Bytes.toBytes("2"), Bytes.toBytes(7L));
 Result result = new Result(new KeyValue[] { kv });
 System.out.println(Bytes.toLong(result.getValue(Bytes.toBytes("2"), 
 Bytes.toBytes("2")))); // prints 7
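 For contrast, a toy, self-contained model of the contract the reporter 
 expects from Result.getValue: the value should come back only when both the 
 family and the qualifier match exactly, so the lookup above should return 
 null. The class below is purely illustrative; it is neither the HBase API nor 
 the committed fix.
 {code}
 import java.util.Arrays;

 public class GetValueContractSketch {
   /** Return the stored value only if both family and qualifier match exactly. */
   static byte[] getValue(byte[][] storedFamQualVal, byte[] family, byte[] qualifier) {
     boolean famMatches = Arrays.equals(storedFamQualVal[0], family);
     boolean qualMatches = Arrays.equals(storedFamQualVal[1], qualifier);
     return (famMatches && qualMatches) ? storedFamQualVal[2] : null;
   }

   public static void main(String[] args) {
     byte[][] stored = { "24".getBytes(), "2".getBytes(), "seven".getBytes() };
     // Family "2" does not match the stored family "24", so this must be null,
     // unlike the buggy behavior shown in the report above.
     System.out.println(getValue(stored, "2".getBytes(), "2".getBytes())); // prints null
   }
 }
 {code}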

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5243) LogSyncerThread not getting shutdown waiting for the interrupted flag

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5243:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.92, trunk and 0.90

 LogSyncerThread not getting shutdown waiting for the interrupted flag
 -

 Key: HBASE-5243
 URL: https://issues.apache.org/jira/browse/HBASE-5243
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6, 0.92.1

 Attachments: 5243-92.addendum, HBASE-5243_0.90.patch, 
 HBASE-5243_0.90_1.patch, HBASE-5243_trunk.patch


 In the LogSyncer run() we keep looping until the isInterrupted flag is set.
 But in some cases the DFSClient consumes the InterruptedException, so
 we run into an infinite loop in some shutdown cases.
 Since we are the ones who try to close down the LogSyncerThread, I would 
 suggest introducing a variable such as close or shutdown; based on the state 
 of this flag, together with isInterrupted(), we can make the thread stop.
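 A minimal, self-contained sketch of that suggestion, pairing a shutdown flag 
 with the interrupt status; the names are illustrative assumptions, not the 
 committed LogSyncer code.
 {code}
 import java.util.concurrent.atomic.AtomicBoolean;

 public class LogSyncerSketch extends Thread {
   private final AtomicBoolean closeRequested = new AtomicBoolean(false);
   private final long optionalFlushIntervalMs = 1000;

   @Override
   public void run() {
     // Terminates when either the interrupt status or the explicit flag is set.
     while (!isInterrupted() && !closeRequested.get()) {
       try {
         Thread.sleep(optionalFlushIntervalMs);
         // hlog.sync() would go here in the real thread
       } catch (InterruptedException e) {
         // Even if a lower layer swallowed the interrupt, closeRequested
         // still lets the loop exit; restore the status for good measure.
         Thread.currentThread().interrupt();
       }
     }
   }

   /** Called by the region server on shutdown before joining the thread. */
   public void requestClose() {
     closeRequested.set(true);
     interrupt();
   }
 }
 {code}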

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5237:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.90, trunk and 0.92

 Addendum for HBASE-5160 and HBASE-4397
 --

 Key: HBASE-5237
 URL: https://issues.apache.org/jira/browse/HBASE-5237
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6, 0.92.1

 Attachments: HBASE-5237_0.90.patch, HBASE-5237_trunk.patch


 As part of HBASE-4397 there is one more scenario where the patch has to be 
 applied.
 {code}
 RegionPlan plan = getRegionPlan(state, forceNewPlan);
 if (plan == null) {
   debugLog(state.getRegion(),
       "Unable to determine a plan to assign " + state);
   return; // Should get reassigned later when RIT times out.
 }
 {code}
 I think that in this scenario
 {code}
 this.timeoutMonitor.setAllRegionServersOffline(true);
 {code}
 should also be done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4951:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated the affected version to 0.90.7.

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.7

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 step1: Start the master process (do not start a regionserver process in the 
 cluster). The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
 checkin
 step2: Stop the master with the shell command bin/hbase master stop.
 result: The master process will never die because the 
 catalogTracker.waitForRoot() method blocks until the root region is assigned.
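 A minimal, self-contained sketch of one way out: make the wait for the root 
 location bounded and stoppable, so a stop request can break the block. Names 
 are illustrative assumptions, not the actual CatalogTracker code.
 {code}
 import java.util.concurrent.atomic.AtomicReference;

 public class StoppableWaitSketch {
   private volatile boolean stopped = false;
   private final AtomicReference<String> rootLocation = new AtomicReference<>();

   /** Wait for the root location, but give up on stop() or after timeoutMs. */
   public String waitForRoot(long timeoutMs) throws InterruptedException {
     long deadline = System.currentTimeMillis() + timeoutMs;
     synchronized (rootLocation) {
       while (rootLocation.get() == null && !stopped
           && System.currentTimeMillis() < deadline) {
         rootLocation.wait(200); // re-check the stop flag periodically
       }
     }
     return rootLocation.get(); // may be null if stopped or timed out
   }

   public void setRootLocation(String location) {
     rootLocation.set(location);
     synchronized (rootLocation) { rootLocation.notifyAll(); }
   }

   public void stop() {
     stopped = true;
     synchronized (rootLocation) { rootLocation.notifyAll(); }
   }
 }
 {code}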

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3855) Performance degradation of memstore because reseek is linear

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-3855:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Not fixed in 0.90.6.  Hence moving it to 0.90.7.

 Performance degradation of memstore because reseek is linear
 

 Key: HBASE-3855
 URL: https://issues.apache.org/jira/browse/HBASE-3855
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Priority: Blocker
 Fix For: 0.90.7

 Attachments: memstoreReseek.txt, memstoreReseek2.txt


 The scanner uses reseek to find the next row (or next column) as part of a 
 scan. The reseek code iterates over a Set to position itself at the right 
 place. If there are many thousands of kvs that need to be skipped over, the 
 time cost is very high. In this case, a seek would cost far less than a 
 reseek.
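 A minimal, self-contained sketch of the trade-off: walk forward as a reseek 
 does, but fall back to an O(log n) lookup once too many entries have been 
 skipped. The threshold and names are illustrative assumptions, not the 
 committed memstore change.
 {code}
 import java.util.NavigableSet;
 import java.util.TreeSet;

 public class ReseekVsSeekSketch {
   private static final int MAX_LINEAR_SKIPS = 32; // assumed heuristic threshold

   /** Walk forward like a reseek, but fall back to an O(log n) seek after too many skips. */
   static Long reseekTo(NavigableSet<Long> memstore, long target) {
     int skipped = 0;
     for (Long current : memstore) {       // linear walk, as reseek does today
       if (current >= target) return current;
       if (++skipped >= MAX_LINEAR_SKIPS) {
         return memstore.ceiling(target);  // give up and seek directly
       }
     }
     return null;
   }

   public static void main(String[] args) {
     NavigableSet<Long> memstore = new TreeSet<>();
     for (long i = 0; i < 100_000; i++) memstore.add(i);
     System.out.println(reseekTo(memstore, 99_999L)); // prints 99999
   }
 }
 {code}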

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5003:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated affected version to 0.90.7

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1


 Reported by a new user on IRC who tried to set hbase.rootdir to 
 file:///~/hbase: the master gets stuck and cannot be killed. I tried 
 something similar on my machine and it spins while logging:
 {quote}
 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 {quote}
 The reason it cannot be stopped is that the master's main thread is stuck in 
 there and will never be notified:
 {quote}
 Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 
 nid=0x1137ba000 waiting on condition [1137b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
   at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
   at java.lang.Thread.run(Thread.java:680)
 {quote}
 It seems we should handle the exceptions we get in there better, and die if 
 we need to. That would make for a better user experience.
 Maybe also check hbase.rootdir before even starting the master.
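 A minimal, self-contained sketch of the "die if we need to" suggestion: bound 
 the retries when writing the version file and surface the failure so the 
 master can abort cleanly. Names are illustrative assumptions, not the FSUtils 
 code.
 {code}
 import java.io.IOException;

 public class BoundedVersionFileSketch {
   interface VersionWriter { void writeVersionFile() throws IOException; }

   /** Retry a bounded number of times, then surface the failure instead of spinning. */
   static void setVersion(VersionWriter writer, int maxRetries, long sleepMs)
       throws IOException, InterruptedException {
     IOException last = null;
     for (int attempt = 0; attempt <= maxRetries; attempt++) {
       try {
         writer.writeVersionFile();
         return;
       } catch (IOException e) {
         last = e;
         Thread.sleep(sleepMs);
       }
     }
     // Giving up lets the master fail fast with a clear error instead of hanging.
     throw new IOException("Unable to create version file after "
         + (maxRetries + 1) + " attempts", last);
   }
 }
 {code}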

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3834) Store ignores checksum errors when opening files

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-3834:
--


Moving to 0.90.7

 Store ignores checksum errors when opening files
 

 Key: HBASE-3834
 URL: https://issues.apache.org/jira/browse/HBASE-3834
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.2
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.6


 If you corrupt one of the storefiles in a region (eg using vim to muck up 
 some bytes), the region will still open, but that storefile will just be 
 ignored with a log message. We should probably not do this in general - 
 better to keep that region unassigned and force an admin to make a decision 
 to remove the bad storefile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4470:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 ServerNotRunningException coming out of assignRootAndMeta kills the Master
 --

 Key: HBASE-4470
 URL: https://issues.apache.org/jira/browse/HBASE-4470
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.7


 I'm surprised we still have issues like this, and I didn't get a hit while 
 googling, so forgive me if there's already a jira about it.
 When the master starts it verifies the locations of root and meta before 
 assigning them; if the server is started but not running you'll get this:
 {quote}
 2011-09-23 04:47:44,859 WARN 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
 RemoteException connecting to RS
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
 yet
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
 at $Proxy6.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
 at 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
 {quote}
 I hit that 3-4 times this week while debugging something else. The worst is 
 that when you restart the master it sees that as a failover, but none of the 
 regions are assigned so it takes an eternity to get back fully online.
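 A minimal, self-contained sketch of the hardening this asks for: treat 
 "server is not running yet" as a transient condition and retry the 
 verification rather than letting the exception kill the master. Names are 
 illustrative assumptions, not the committed fix.
 {code}
 import java.io.IOException;

 public class VerifyMetaSketch {
   interface Verifier { boolean verify() throws IOException; }

   /** Retry the verification instead of letting a transient exception kill the caller. */
   static boolean verifyWithRetries(Verifier verifier, int retries, long sleepMs)
       throws InterruptedException {
     for (int i = 0; i < retries; i++) {
       try {
         if (verifier.verify()) return true;
       } catch (IOException e) {
         // e.g. a ServerNotRunningException wrapped in a RemoteException:
         // the region server process is up but its RPC layer is not ready yet.
       }
       Thread.sleep(sleepMs);
     }
     return false; // caller can then reassign meta rather than abort
   }
 }
 {code}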

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4298:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.90.7

 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, 
 drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, 
 trunk_with_test.txt


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.
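 A minimal, self-contained sketch of the znode layout described above, using 
 the plain ZooKeeper client; the path construction and method name are 
 illustrative assumptions, not the committed HBase code.
 {code}
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.KeeperException;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;

 public class DrainingNodeSketch {
   /** Mark a region server as draining by creating a znode named after it under /draining. */
   public static void markDraining(ZooKeeper zk, String baseZNode, String serverName)
       throws KeeperException, InterruptedException {
     String path = baseZNode + "/draining/" + serverName;
     // Persistent so the marker survives the client that created it; the node
     // name mirrors the corresponding /rs entry, as described above.
     zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
   }
 }
 {code}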

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4288:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 Server not running exception during meta verification causes RS abort
 ---

 Key: HBASE-4288
 URL: https://issues.apache.org/jira/browse/HBASE-4288
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.7

 Attachments: 4288-v2.txt, 4288.txt


 The master tried to verify the META location just as that server was shutting 
 down due to an abort. This caused the Server not running exception to get 
 thrown, which wasn't handled properly in the master, causing it to abort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4298:
--

Fix Version/s: 0.92.0

@Stack
This issue has gone into 0.92 and trunk. As it is a new feature, do you want it 
to go into future 0.90 releases? If not, we can remove 0.90 from the fix 
versions.

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.90.7, 0.92.0

 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, 
 drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, 
 trunk_with_test.txt


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4288:
--

Fix Version/s: 0.92.0

Already committed to 0.92 and trunk, but not to 0.90. Hence not resolving, 
just updating the fix version.

 Server not running exception during meta verification causes RS abort
 ---

 Key: HBASE-4288
 URL: https://issues.apache.org/jira/browse/HBASE-4288
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.7, 0.92.0

 Attachments: 4288-v2.txt, 4288.txt


 The master tried to verify the META location just as that server was shutting 
 down due to an abort. This caused the Server not running exception to get 
 thrown, which wasn't handled properly in the master, causing it to abort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4550:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

@Wanbin
If you provide a new patch, it can be integrated into 0.90.7.
Moving to 0.90.7

 When master passed regionserver different address , because regionserver 
 didn't create new zookeeper znode,  as  a result stop-hbase.sh is hang
 ---

 Key: HBASE-4550
 URL: https://issues.apache.org/jira/browse/HBASE-4550
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: wanbin
Assignee: wanbin
 Fix For: 0.90.7

 Attachments: hbase-0.90.3.patch, patch, patch.txt

   Original Estimate: 2h
  Remaining Estimate: 2h

 When the master passed the regionserver a different address, the regionserver 
 did not create a new zookeeper znode, while the master stored the new address 
 in ServerManager. When stop-hbase.sh is called, RegionServerTracker.nodeDeleted 
 receives the path with the old address, so serverManager.expireServer is never 
 called and stop-hbase.sh hangs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4848:
--

Fix Version/s: (was: 0.90.6)
   0.90.5
   0.92.0

Already committed and resolved.  Hence closing the issue as resolved.

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5, 0.92.0

 Attachments: 4848-092.txt, 4848.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.

2012-01-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5235:
--

Attachment: HBASE-5235_0.90_2.patch

Updated patch addressing Ted's comments for 0.90. The trunk patch already 
incorporates Ted's comments.

 HLogSplitter writer thread's streams not getting closed when any of the 
 writer threads has exceptions.
 --

 Key: HBASE-5235
 URL: https://issues.apache.org/jira/browse/HBASE-5235
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, 
 HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch


 Please find the analysis below. Correct me if I am wrong.
 {code}
 2012-01-15 05:14:02,374 FATAL 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got 
 while writing log entry to log
 java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
 {code}
 Here we have an exception in one of the writer threads. If there is any 
 exception, we try to hold it in an atomic variable:
 {code}
 private void writerThreadError(Throwable t) {
   thrown.compareAndSet(null, t);
 }
 {code}
 In the finally block of splitLog we try to close the streams:
 {code}
 for (WriterThread t: writerThreads) {
   try {
     t.join();
   } catch (InterruptedException ie) {
     throw new IOException(ie);
   }
   checkForErrors();
 }
 LOG.info("Split writers finished");
 return closeStreams();
 {code}
 Inside checkForErrors:
 {code}
 private void checkForErrors() throws IOException {
   Throwable thrown = this.thrown.get();
   if (thrown == null) return;
   if (thrown instanceof IOException) {
     throw (IOException)thrown;
   } else {
     throw new RuntimeException(thrown);
   }
 }
 {code}
 So once we throw the exception, the DFSStreamer threads are not getting closed.
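 A minimal, self-contained sketch of the fix direction: close the output 
 streams in a finally block so they are released even when checkForErrors() 
 throws for a failed writer thread. The class and method names are 
 illustrative assumptions, not the committed HLogSplitter patch.
 {code}
 import java.io.Closeable;
 import java.io.IOException;
 import java.util.List;

 public class CloseOnErrorSketch {
   interface ErrorCheck { void check() throws IOException; }

   static void finishWritingAndClose(List<Thread> writerThreads, List<Closeable> outputs,
                                     ErrorCheck checkForErrors) throws IOException {
     try {
       for (Thread t : writerThreads) {
         try {
           t.join();
         } catch (InterruptedException ie) {
           throw new IOException(ie);
         }
       }
       checkForErrors.check(); // throws if any writer thread recorded an error
     } finally {
       // Close the output streams even when checkForErrors() threw, so the
       // underlying DFS streamer threads do not linger.
       for (Closeable out : outputs) {
         try {
           out.close();
         } catch (IOException e) {
           // log and keep closing the remaining streams
         }
       }
     }
   }
 }
 {code}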

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.

2012-01-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5235:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.90 and trunk.  
Thanks for the review Ted.

 HLogSplitter writer thread's streams not getting closed when any of the 
 writer threads has exceptions.
 --

 Key: HBASE-5235
 URL: https://issues.apache.org/jira/browse/HBASE-5235
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, 
 HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch


 Please find the analysis below. Correct me if I am wrong.
 {code}
 2012-01-15 05:14:02,374 FATAL 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got 
 while writing log entry to log
 java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
 {code}
 Here we have an exception in one of the writer threads. If there is any 
 exception, we try to hold it in an atomic variable:
 {code}
 private void writerThreadError(Throwable t) {
   thrown.compareAndSet(null, t);
 }
 {code}
 In the finally block of splitLog we try to close the streams:
 {code}
 for (WriterThread t: writerThreads) {
   try {
     t.join();
   } catch (InterruptedException ie) {
     throw new IOException(ie);
   }
   checkForErrors();
 }
 LOG.info("Split writers finished");
 return closeStreams();
 {code}
 Inside checkForErrors:
 {code}
 private void checkForErrors() throws IOException {
   Throwable thrown = this.thrown.get();
   if (thrown == null) return;
   if (thrown instanceof IOException) {
     throw (IOException)thrown;
   } else {
     throw new RuntimeException(thrown);
   }
 }
 {code}
 So once we throw the exception, the DFSStreamer threads are not getting closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.

2012-01-21 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5235:
--

Attachment: HBASE-5235_0.90.patch

Patch for 0.90.  If this patch is fine, I will prepare a similar patch for 0.92.

 HLogSplitter writer thread's streams not getting closed when any of the 
 writer threads has exceptions.
 --

 Key: HBASE-5235
 URL: https://issues.apache.org/jira/browse/HBASE-5235
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: HBASE-5235_0.90.patch


 Please find the analysis below. Correct me if I am wrong.
 {code}
 2012-01-15 05:14:02,374 FATAL 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got 
 while writing log entry to log
 java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
 {code}
 Here we have an exception in one of the writer threads. If there is any 
 exception, we try to hold it in an atomic variable:
 {code}
 private void writerThreadError(Throwable t) {
   thrown.compareAndSet(null, t);
 }
 {code}
 In the finally block of splitLog we try to close the streams:
 {code}
 for (WriterThread t: writerThreads) {
   try {
     t.join();
   } catch (InterruptedException ie) {
     throw new IOException(ie);
   }
   checkForErrors();
 }
 LOG.info("Split writers finished");
 return closeStreams();
 {code}
 Inside checkForErrors:
 {code}
 private void checkForErrors() throws IOException {
   Throwable thrown = this.thrown.get();
   if (thrown == null) return;
   if (thrown instanceof IOException) {
     throw (IOException)thrown;
   } else {
     throw new RuntimeException(thrown);
   }
 }
 {code}
 So once we throw the exception, the DFSStreamer threads are not getting closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



