[jira] [Updated] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5840:
------------------------------------------
    Fix Version/s: 0.96.0

Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
------------------------------------------
                Key: HBASE-5840
                URL: https://issues.apache.org/jira/browse/HBASE-5840
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.94.0
           Reporter: Gopinathan A
            Fix For: 0.96.0, 0.94.1

The TaskMonitor status is not cleared when a region goes to FAILED_OPEN. It keeps showing the old status, which misleads the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
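The stale-status problem reported for HBASE-5840 can be illustrated with a minimal sketch. Everything here is hypothetical: the Task class and openRegion method are stand-ins invented for illustration, not the HBase TaskMonitor API, and this is not the committed fix. The only point is that the failure path also moves the task to a final state, so the monitor does not keep showing the old status.

```java
import java.util.ArrayList;
import java.util.List;

final class TaskStatusSketch {
    /** Hypothetical stand-in for a monitored task; not the HBase class. */
    static final class Task {
        String status = "";
        boolean finished = false;
    }

    /** Hypothetical list of tasks that a status page would display. */
    static final List<Task> tasks = new ArrayList<>();

    /** Runs the open step and always leaves the task in a final state. */
    static boolean openRegion(String region, Runnable openStep) {
        Task task = new Task();
        tasks.add(task);
        task.status = "Opening region " + region;
        try {
            openStep.run();
            task.status = "Opened region " + region;
            return true;
        } catch (RuntimeException e) {
            // Failure path: replace the old status instead of leaving it behind.
            task.status = "Failed to open region " + region + ": " + e.getMessage();
            return false;
        } finally {
            // Success or failure, the task is no longer reported as running.
            task.finished = true;
        }
    }
}
```

With this shape, a failed open ends with a status that starts with "Failed to open region" and with finished set to true.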
[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.
[ https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5809:
------------------------------------------
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

Avoid move api to take the destination server same as the source server.
------------------------------------------
                Key: HBASE-5809
                URL: https://issues.apache.org/jira/browse/HBASE-5809
            Project: HBase
         Issue Type: Improvement
   Affects Versions: 0.92.1
           Reporter: ramkrishna.s.vasudevan
           Priority: Minor
             Labels: patch
            Fix For: 0.96.0
        Attachments: HBASE-5809.patch

Currently, move accepts any destination that is specified. If the destination is the same as the source, we still unassign and assign the region. This can cause a RegionAlreadyInTransitionException and leave the region hanging in RIT (region in transition) for a long time. We can avoid this by not allowing the move in that case.
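The guard proposed in HBASE-5809 amounts to a comparison before any unassign happens. A minimal sketch follows, assuming server names are plain strings; the class and method names are hypothetical, and treating a null destination as "no destination specified" is an assumption of this sketch rather than something stated in the issue. It is not the committed patch.

```java
final class MoveGuardSketch {
    /**
     * Returns true when the move should proceed. A move whose destination
     * equals the source is skipped, so the region is never unassigned and
     * reassigned to the server that already hosts it.
     */
    static boolean shouldMove(String source, String destination) {
        if (destination == null) {
            // Assumption: no destination given, so there is nothing to compare.
            return true;
        }
        return !destination.equals(source);
    }
}
```

A caller would check shouldMove(current, requested) first and return early, without touching region state, when it is false.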
[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.
[ https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5809:
------------------------------------------
    Fix Version/s: (was: 0.94.1)
       Issue Type: Improvement (was: Bug)

Committed to trunk. Made it as an improvement.
Thanks for the patch Rajesh. Thanks for the review Ted and Uma.

Avoid move api to take the destination server same as the source server.
------------------------------------------
                Key: HBASE-5809
                URL: https://issues.apache.org/jira/browse/HBASE-5809
            Project: HBase
         Issue Type: Improvement
   Affects Versions: 0.92.1
           Reporter: ramkrishna.s.vasudevan
           Priority: Minor
             Labels: patch
            Fix For: 0.96.0
        Attachments: HBASE-5809.patch
[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.
[ https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5809:
------------------------------------------
    Attachment: HBASE-5809.patch

This is what i finally committed. Small formatting change and corrected typo error.

Avoid move api to take the destination server same as the source server.
------------------------------------------
                Key: HBASE-5809
                URL: https://issues.apache.org/jira/browse/HBASE-5809
            Project: HBase
         Issue Type: Improvement
   Affects Versions: 0.92.1
           Reporter: ramkrishna.s.vasudevan
           Assignee: rajeshbabu
           Priority: Minor
             Labels: patch
            Fix For: 0.96.0
        Attachments: HBASE-5809.patch, HBASE-5809.patch
[jira] [Updated] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
[ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5635:
------------------------------------------
    Attachment: HBASE-5635._trunk.patch

If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
------------------------------------------
                Key: HBASE-5635
                URL: https://issues.apache.org/jira/browse/HBASE-5635
            Project: HBase
         Issue Type: Bug
         Components: wal
   Affects Versions: 0.92.1
           Reporter: Kristam Subba Swathi
        Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, HBASE-5635._trunk.patch, HBASE-5635.patch

During the hlog split operation, if all the ZooKeeper servers are down, the paths are returned as null and the SplitLogWorker thread exits.
The region server then cannot acquire any other tasks, since the worker thread has exited.
Please see the code below, from org.apache.hadoop.hbase.regionserver.SplitLogWorker, for more details.
{code}
private List<String> getTaskList() {
  for (int i = 0; i < zkretries; i++) {
    try {
      return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
          this.watcher.splitLogZNode));
    } catch (KeeperException e) {
      LOG.warn("Could not get children of znode " +
          this.watcher.splitLogZNode, e);
      try {
        // Pause before the next retry.
        Thread.sleep(1000);
      } catch (InterruptedException e1) {
        LOG.warn("Interrupted while trying to get task list ...", e1);
        Thread.currentThread().interrupt();
        return null;
      }
    }
  }
  // All retries failed: the caller receives null, which is the case reported here.
  return null;
}
{code}
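One possible way to avoid the null return described in HBASE-5635 is to keep retrying until the listing succeeds or the thread is interrupted. The sketch below is only an illustration under that assumption, not the attached patch: the Callable stands in for the ZooKeeper call, and the class name, method signature, and sleepMillis parameter are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;

final class TaskListRetrySketch {
    /**
     * Keeps calling fetch until it yields a non-null list. Never returns
     * null; an interrupt is reported as an unchecked exception instead.
     */
    static List<String> getTaskList(Callable<List<String>> fetch, long sleepMillis) {
        while (true) {
            try {
                List<String> children = fetch.call();
                if (children != null) {
                    return children;
                }
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while fetching task list", e);
                }
                // Assumed transient (for example, ZooKeeper unreachable): retry after a pause.
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while fetching task list", e);
            }
        }
    }
}
```

The trade-off is that the worker blocks for as long as the failure persists, so a real worker would also need a shutdown check inside the loop.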
[jira] [Updated] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
[ https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5635:
------------------------------------------
    Attachment: HBASE-5635_0.94.patch

If getTaskList() returns null splitlogWorker is down. It wont serve any requests.
------------------------------------------
                Key: HBASE-5635
                URL: https://issues.apache.org/jira/browse/HBASE-5635
            Project: HBase
         Issue Type: Bug
         Components: wal
   Affects Versions: 0.92.1
           Reporter: Kristam Subba Swathi
        Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, HBASE-5635._trunk.patch, HBASE-5635.patch, HBASE-5635_0.94.patch
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5737:
------------------------------------------
    Fix Version/s: 0.94.1
       Issue Type: Bug (was: Improvement)

@Stack
Yes, I have a configured LB. But since we provide the option to use master services in the LB, if I try to use the 'balancer' object there, it is a new one.
I am updating this to a bug, Stack.

Minor Improvements related to balancer.
------------------------------------------
                Key: HBASE-5737
                URL: https://issues.apache.org/jira/browse/HBASE-5737
            Project: HBase
         Issue Type: Bug
         Components: master
           Reporter: ramkrishna.s.vasudevan
           Assignee: ramkrishna.s.vasudevan
           Priority: Minor
            Fix For: 0.96.0, 0.94.1
        Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch

Currently, Am.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better. MetaReader.fullScan already uses a TreeMap so that the naming order is maintained.
This change could be very useful when extending the DefaultLoadBalancer.
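The ordering point in HBASE-5737 can be shown with a small sketch. It assumes servers and regions are plain strings and uses a hypothetical groupByServer helper; the real result type of Am.getAssignmentByTable() is not reproduced here. A TreeMap iterates its keys in sorted order, while a HashMap makes no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SortedAssignmentSketch {
    /**
     * Groups regions by hosting server. Each input pair is {region, server}.
     * The TreeMap keeps the servers in their natural (sorted) order.
     */
    static Map<String, List<String>> groupByServer(String[][] regionToServer) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String[] pair : regionToServer) {
            result.computeIfAbsent(pair[1], k -> new ArrayList<>()).add(pair[0]);
        }
        return result;
    }
}
```

A balancer that iterates this map sees the servers in the same order on every run, which is the property the issue asks for.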
[jira] [Updated] (HBASE-5809) Avoid move api to take the destination server same as the source server.
[ https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5809:
------------------------------------------
    Affects Version/s: (was: 0.94.0) 0.92.1
        Fix Version/s: 0.94.1 0.96.0

Will commit tomorrow unless no objection.

Avoid move api to take the destination server same as the source server.
------------------------------------------
                Key: HBASE-5809
                URL: https://issues.apache.org/jira/browse/HBASE-5809
            Project: HBase
         Issue Type: Bug
   Affects Versions: 0.92.1
           Reporter: ramkrishna.s.vasudevan
           Priority: Minor
             Labels: patch
            Fix For: 0.96.0, 0.94.1
        Attachments: HBASE-5809.patch
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5737:
------------------------------------------
    Fix Version/s: 0.96.0

Can this be an improvement or bug? I think the 'balancer' usage in HMaster and AM was a bug right?

Minor Improvements related to balancer.
------------------------------------------
                Key: HBASE-5737
                URL: https://issues.apache.org/jira/browse/HBASE-5737
            Project: HBase
         Issue Type: Improvement
         Components: master
           Reporter: ramkrishna.s.vasudevan
           Assignee: ramkrishna.s.vasudevan
           Priority: Minor
            Fix For: 0.96.0
        Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch
[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5545:
------------------------------------------
    Fix Version/s: 0.94.1

region can't be opened for a long time. Because the creating File failed.
------------------------------------------
                Key: HBASE-5545
                URL: https://issues.apache.org/jira/browse/HBASE-5545
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.90.6
           Reporter: gaojinchao
           Assignee: gaojinchao
            Fix For: 0.90.7, 0.92.2, 0.94.1

Scenario:
1. A file is created.
2. While data is being written, all datanodes might crash, so the write fails.
3. Even if close is called in the finally block, close also fails and throws an exception because the write failed.
4. If the RS then tries to create the same file again, AlreadyBeingCreatedException is thrown.

Suggestion to handle this scenario:
1. Check whether the file exists; if it does, delete it and create a new file. The delete call does not check whether the file is open or closed.

Overwrite option:
1. The overwrite option applies only when overwriting a closed file.
2. If the file is not closed, the same AlreadyBeingCreatedException is thrown even with the overwrite option. This is the expected behaviour, to avoid multiple clients writing to the same file.

Region server logs:
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
    at org.apache.hadoop.ipc.Client.call(Client.java:961)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
    at $Proxy6.create(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at $Proxy6.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3643)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
[2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the method call: public abstract void
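The "check, delete, then create" suggestion in HBASE-5545 can be sketched as follows. This is only a local-filesystem analogue using java.nio.file, chosen so the example is self-contained; it does not use the Hadoop FileSystem API and cannot reproduce HDFS lease behaviour or AlreadyBeingCreatedException. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RecreateFileSketch {
    /** Deletes any leftover file first, then creates it fresh and writes content. */
    static void writeFresh(Path file, byte[] content) throws IOException {
        // Remove whatever an earlier, failed attempt left behind.
        Files.deleteIfExists(file);
        // CREATE_NEW fails if the file still exists, so the delete above is required.
        try (OutputStream out = Files.newOutputStream(file, StandardOpenOption.CREATE_NEW)) {
            out.write(content);
        }
    }

    /** Small demonstration: a stale file is replaced by a fresh one. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("regioninfo-sketch");
            Path file = dir.resolve(".regioninfo");
            Files.write(file, "stale".getBytes(StandardCharsets.UTF_8));
            writeFresh(file, "fresh".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether deleting a file that is still open under another lease is acceptable on HDFS is exactly the question the issue raises; the sketch only shows the order of the steps.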
[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5545:
------------------------------------------
    Status: Patch Available (was: Open)

region can't be opened for a long time. Because the creating File failed.
------------------------------------------
                Key: HBASE-5545
                URL: https://issues.apache.org/jira/browse/HBASE-5545
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.90.6
           Reporter: gaojinchao
           Assignee: gaojinchao
            Fix For: 0.90.7, 0.92.2, 0.94.1
        Attachments: HBASE-5545.patch
[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5545:
------------------------------------------
    Attachment: HBASE-5545.patch

Patch for 0.94.

region can't be opened for a long time. Because the creating File failed.
------------------------------------------
                Key: HBASE-5545
                URL: https://issues.apache.org/jira/browse/HBASE-5545
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.90.6
           Reporter: gaojinchao
           Assignee: gaojinchao
            Fix For: 0.90.7, 0.92.2, 0.94.1
        Attachments: HBASE-5545.patch
[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5545:
------------------------------------------
    Attachment: HBASE-5545.patch

region can't be opened for a long time. Because the creating File failed.
------------------------------------------
                Key: HBASE-5545
                URL: https://issues.apache.org/jira/browse/HBASE-5545
            Project: HBase
         Issue Type: Bug
         Components: regionserver
   Affects Versions: 0.90.6
           Reporter: gaojinchao
           Assignee: gaojinchao
            Fix For: 0.90.7, 0.92.2, 0.94.0
        Attachments: HBASE-5545.patch, HBASE-5545.patch
[jira] [Updated] (HBASE-5782) Not all the regions are getting assigned after the log splitting.
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5782:
------------------------------------------
    Attachment: HBASE-5782.patch

Not all the regions are getting assigned after the log splitting.
------------------------------------------
                Key: HBASE-5782
                URL: https://issues.apache.org/jira/browse/HBASE-5782
            Project: HBase
         Issue Type: Bug
         Components: wal
   Affects Versions: 0.94.0
           Reporter: Gopinathan A
           Assignee: ramkrishna.s.vasudevan
           Priority: Blocker
            Fix For: 0.94.0
        Attachments: HBASE-5782.patch

Create a table with 1000 splits. After the region assignment, kill the regionserver which hosts the META table.
A few regions are then missing after the log splitting and region assignment.
The HBCK report shows that multiple region holes were created.
The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Attachment: HBASE-5737_3.patch Latest patch as per Stack's suggestion. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
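The ordering argument in HBASE-5737 can be illustrated with a minimal sketch. It is not the attached patch: the class `AssignmentOrderSketch` and the method `groupByTable` are hypothetical, and plain strings stand in for the real HBase types. It only shows that a TreeMap iterates its keys in sorted order, while a HashMap gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentOrderSketch {
    // Groups region names by table name into the supplied result map.
    // Strings are stand-ins for the HBase table, server and region types.
    static Map<String, List<String>> groupByTable(String[][] tableAndRegion,
                                                  Map<String, List<String>> result) {
        for (String[] pair : tableAndRegion) {
            result.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] input = { {"usertable", "r2"}, {"accounts", "r1"}, {"metrics", "r3"} };
        // TreeMap: keys come back in natural (sorted) order.
        System.out.println(groupByTable(input, new TreeMap<>()).keySet());
        // prints [accounts, metrics, usertable]
        // HashMap: iteration order is unspecified and may differ between runs or JVMs.
        System.out.println(groupByTable(input, new HashMap<>()).keySet());
    }
}
```

A balancer subclass that walks the per-table map would then see tables in a stable order, which is the property the description asks for.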
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Open (was: Patch Available) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Patch Available (was: Open) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch, HBASE-5737_3.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Attachment: HBASE-5737_2.patch Updated patch. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Open (was: Patch Available) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Patch Available (was: Open) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch, HBASE-5737_2.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Summary: Minor Improvements related to balancer. (was: Use a treemap instead of hashmap in in AM.getAssignmentByTable used in balancer) There are a few more changes that we can make here: - Currently two different balancer objects are created, one in AssignmentManager and one in HMaster. Unify them. - balancer.randomAssignment() is called once, but whether its random plan is used depends on whether the existing plan is null or is a forced plan. So if we extend a new balancer, randomAssignment will be called, but the plan it returns may later go unused. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Patch Available (was: Open) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Open (was: Patch Available) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Attachment: HBASE-5737_1.patch Please review the latest one. Even if you still feel the TreeMap change is not needed, I think the other changes are. Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5737) Minor Improvements related to balancer.
[ https://issues.apache.org/jira/browse/HBASE-5737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5737: -- Status: Patch Available (was: Open) Minor Improvements related to balancer. --- Key: HBASE-5737 URL: https://issues.apache.org/jira/browse/HBASE-5737 Project: HBase Issue Type: Improvement Components: master Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor Attachments: HBASE-5737.patch, HBASE-5737_1.patch Currently AM.getAssignmentByTable() builds its result in a HashMap. A TreeMap would be better: MetaReader.fullScan already uses a TreeMap so that the naming order is maintained. I feel this change could be very useful when extending the DefaultLoadBalancer.
[jira] [Updated] (HBASE-5689) Skipping RecoveredEdits may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5689: -- Priority: Critical (was: Major) Making it critical as it is data loss related. Skipping RecoveredEdits may cause data loss --- Key: HBASE-5689 URL: https://issues.apache.org/jira/browse/HBASE-5689 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Attachments: 5689-testcase.patch, HBASE-5689.patch
Consider the following scenario:
1. The region is on server A.
2. Put KV(r1-v1) to the region.
3. Move the region from server A to server B.
4. Put KV(r2-v2) to the region.
5. Move the region from server B to server A.
6. Put KV(r3-v3) to the region.
7. kill -9 server B and start it.
8. kill -9 server A and start it.
9. Scan the region: we get only two KVs (r1-v1, r2-v2); the third KV (r3-v3) is lost.
Let's analyse the scenario above from the code:
1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same hlog file on server A.
2. When we split server B's hlog file in ServerShutdownHandler, we create one RecoveredEdits file f1 for the region.
3. When we split server A's hlog file in ServerShutdownHandler, we create another RecoveredEdits file f2 for the region.
4. However, RecoveredEdits file f2 will be skipped when initializing the region in HRegion#replayRecoveredEditsIfAny:
{code}
for (Path edits: files) {
  if (edits == null || !this.fs.exists(edits)) {
    LOG.warn("Null or non-existent edits file: " + edits);
    continue;
  }
  if (isZeroLengthThenDelete(this.fs, edits)) continue;
  if (checkSafeToSkip) {
    Path higher = files.higher(edits);
    long maxSeqId = Long.MAX_VALUE;
    if (higher != null) {
      // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
      String fileName = higher.getName();
      maxSeqId = Math.abs(Long.parseLong(fileName));
    }
    if (maxSeqId <= minSeqId) {
      String msg = "Maximum possible sequenceid for this log is " + maxSeqId
          + ", skipped the whole file, path=" + edits;
      LOG.debug(msg);
      continue;
    } else {
      checkSafeToSkip = false;
    }
  }
{code}
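The skip in HBASE-5689 can be reproduced outside HBase with a small sketch of the same check. Everything here is an assumption made for illustration: the sequence ids 10, 20 and 30, the class name `RecoveredEditsSkipSketch`, and the simplification that a recovered-edits file is represented only by the number in its name (taken to be the id of its first edit), with `minSeqId` being the highest id already flushed to store files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RecoveredEditsSkipSketch {
    // Applies the quoted "safe to skip" check to file names only and
    // returns the names of the files that would be skipped.
    static List<Long> skippedFiles(TreeSet<Long> fileNames, long minSeqId) {
        List<Long> skipped = new ArrayList<>();
        boolean checkSafeToSkip = true;
        for (Long name : fileNames) {
            if (checkSafeToSkip) {
                // The next file's name is used as an upper bound for this file.
                Long higher = fileNames.higher(name);
                long maxSeqId = (higher == null) ? Long.MAX_VALUE : Math.abs(higher);
                if (maxSeqId <= minSeqId) {
                    skipped.add(name);
                    continue;
                } else {
                    checkSafeToSkip = false;
                }
            }
            // A real implementation would replay the file here.
        }
        return skipped;
    }

    public static void main(String[] args) {
        TreeSet<Long> files = new TreeSet<>();
        files.add(10L); // f2, from server A's log: holds r1 (seq 10) and r3 (seq 30)
        files.add(20L); // f1, from server B's log: holds r2 (seq 20)
        // r1 and r2 were flushed when the region moved, so minSeqId is 20.
        System.out.println(skippedFiles(files, 20L)); // prints [10]
    }
}
```

File 10 is skipped because the next file's name (20) is treated as an upper bound on its contents, yet under these assumptions it also holds r3 with sequence id 30, which is never replayed.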
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Status: Open (was: Patch Available) Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5617_1.patch With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
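The flow described in HBASE-5617 can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: the `PutObserver` interface and its `postRollBackMemStore` method are hypothetical names, and in-memory lists stand in for the memstore and for the secondary table written from prePut.

```java
import java.util.ArrayList;
import java.util.List;

public class RollbackHookSketch {
    // Hypothetical observer; these method names are not taken from the HBase API.
    interface PutObserver {
        void prePut(String row);
        void postRollBackMemStore(String row);
    }

    // Writes a secondary entry in prePut and removes it again on rollback.
    static class SecondaryIndexObserver implements PutObserver {
        final List<String> secondary = new ArrayList<>();
        public void prePut(String row) { secondary.add("idx-" + row); }
        public void postRollBackMemStore(String row) { secondary.remove("idx-" + row); }
    }

    // Mimics the described order: memstore first, then WAL; on a WAL failure
    // the memstore is rolled back and the observer is told about it.
    static boolean put(String row, List<String> memstore, PutObserver observer, boolean walFails) {
        observer.prePut(row);
        memstore.add(row);
        if (walFails) {
            memstore.remove(row);
            observer.postRollBackMemStore(row);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> memstore = new ArrayList<>();
        SecondaryIndexObserver observer = new SecondaryIndexObserver();
        put("r1", memstore, observer, false);
        put("r2", memstore, observer, true); // simulated WAL failure
        System.out.println(memstore + " " + observer.secondary);
    }
}
```

Without the rollback callback, the secondary entry for r2 would remain even though the main put failed, which is the inconsistency the issue describes.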
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Status: Patch Available (was: Open) Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Attachment: HBASE-5617_2.patch @Andy This patch deals only with what is intended in the title of the JIRA. Can I raise another JIRA for adding other hooks? Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5617_1.patch, HBASE-5617_2.patch With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Attachment: HBASE-5617_1.patch Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5617_1.patch With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Status: Patch Available (was: Open) Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5617_1.patch With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5097: -- Fix Version/s: 0.94.1 0.96.0 0.92.2 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE --- Key: HBASE-5097 URL: https://issues.apache.org/jira/browse/HBASE-5097 Project: HBase Issue Type: Bug Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.2, 0.96.0, 0.94.1 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch
In HRegionServer.java openScanner():
{code}
r.prepareScanner(scan);
RegionScanner s = null;
if (r.getCoprocessorHost() != null) {
  s = r.getCoprocessorHost().preScannerOpen(scan);
}
if (s == null) {
  s = r.getScanner(scan);
}
if (r.getCoprocessorHost() != null) {
  s = r.getCoprocessorHost().postScannerOpen(scan, s);
}
{code}
If we don't have an implementation for postScannerOpen, the RegionScanner is null, and so a NullPointerException is thrown:
{code}
java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
at org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
{code}
Marking this defect as a blocker. Please feel free to change the priority if I am wrong. Also correct me if my way of trying out coprocessors without implementing postScannerOpen is wrong; I am just a learner.
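One possible guard for the HBASE-5097 case is sketched below; the attached patches may well take a different approach. The `Scanner` and `Hooks` types are hypothetical stand-ins for RegionScanner and the coprocessor host. The sketch relies on one certain fact: `ConcurrentHashMap.put` rejects null values, which is where the reported stack trace originates.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ScannerHookNullSketch {
    interface Scanner {}

    // Hypothetical stand-in for the coprocessor host; only null returns matter here.
    interface Hooks {
        Scanner preScannerOpen();
        Scanner postScannerOpen(Scanner current);
    }

    static Scanner openScanner(Hooks hooks, Scanner regionScanner) {
        Scanner s = hooks.preScannerOpen();
        if (s == null) {
            s = regionScanner;
        }
        Scanner wrapped = hooks.postScannerOpen(s);
        // Guard: keep the scanner we already have when the post hook returns null.
        return wrapped != null ? wrapped : s;
    }

    public static void main(String[] args) {
        Hooks returnsNull = new Hooks() {
            public Scanner preScannerOpen() { return null; }
            public Scanner postScannerOpen(Scanner current) { return null; }
        };
        Scanner regionScanner = new Scanner() {};
        ConcurrentHashMap<String, Scanner> scanners = new ConcurrentHashMap<>();
        // With the guard the value is never null, so put() does not throw.
        scanners.put("scanner-1", openScanner(returnsNull, regionScanner));
        System.out.println(scanners.get("scanner-1") == regionScanner);
    }
}
```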
[jira] [Updated] (HBASE-5510) Pass region info in LoadBalancer.randomAssignment(List<ServerName> servers)
[ https://issues.apache.org/jira/browse/HBASE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5510: -- Resolution: Fixed Status: Resolved (was: Patch Available) Pass region info in LoadBalancer.randomAssignment(List<ServerName> servers) --- Key: HBASE-5510 URL: https://issues.apache.org/jira/browse/HBASE-5510 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.96.0 Attachments: HBase-5010_3.patch, HBase-5510.patch, HBase-5510_2.patch In the LB there is a randomAssignment(List<ServerName> servers) API, which is used by the AM to assign a region from a down RS. [It is also used in other cases, such as a call to the assign() API from a client.] I feel it would be better to pass the HRegionInfo into this method as well. When the LB chooses a server for a region because an RS is down, it would be nice for the LB to know which region it is making the selection for. +Scenario+ When one RS goes down, we want its regions to move to other RSs, but with a certain set of regions staying together. We have a custom load balancer, but with the current LB interface this is not possible. An alternative is to allow random assignment of the regions at RS-down time and rebalance later with a cluster balance. But this may assign regions first to one RS and then move them again to another, and for some period my business use case cannot be satisfied. I have also seen a JIRA issue about making sure that the ROOT and META regions always sit on specific RSs. With the current LB API this won't be possible in future.
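To make the HBASE-5510 scenario concrete, here is a minimal sketch of what a region-aware selection could look like. It is not the committed change: the method signature, the `group:region` naming convention and the class `RegionAwareAssignmentSketch` are all assumptions, and strings stand in for HRegionInfo and ServerName.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RegionAwareAssignmentSketch {
    // Hypothetical region-aware variant of randomAssignment: regions of the
    // same group are sent to the server already chosen for that group.
    static String randomAssignment(String region, List<String> servers,
                                   Map<String, String> groupToServer, Random random) {
        // Assumed naming convention "group:region", used only for this sketch.
        String group = region.substring(0, region.indexOf(':'));
        String preferred = groupToServer.get(group);
        if (preferred != null && servers.contains(preferred)) {
            return preferred;
        }
        String chosen = servers.get(random.nextInt(servers.size()));
        groupToServer.put(group, chosen);
        return chosen;
    }

    public static void main(String[] args) {
        List<String> liveServers = Arrays.asList("rs1", "rs2", "rs3");
        Map<String, String> groupToServer = new HashMap<>();
        Random random = new Random();
        String first = randomAssignment("orders:region-a", liveServers, groupToServer, random);
        String second = randomAssignment("orders:region-b", liveServers, groupToServer, random);
        System.out.println(first.equals(second)); // both regions land on the same server
    }
}
```

With only the server list as input, as in the API the issue describes, the balancer has no way to tell that the two regions belong together.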
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Attachment: HBASE-5584-2.patch @Andrew Instead of a sleep I have added a CountDownLatch for the create-table assertion. Please review. Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584-1.patch, HBASE-5584-2.patch, HBASE-5584.patch The following points can be changed w.r.t. coprocessors: - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers. - Currently they are called in the HMaster, which makes the post APIs asynchronous w.r.t. the handlers. - The case is similar with the balancer. With the current behaviour, once we are in postEnable (for example) we still need to wait for the main enable handler to complete. We should ensure that we don't wait in the main thread, so we again need to spawn a thread and wait on that. On the other hand, if the pre and post APIs are called in the handlers, only that handler thread is used by the pre/post APIs. If this plan is OK, I can prepare a patch for all such related changes.
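The sleep-to-latch change mentioned for the HBASE-5584 test can be sketched like this. It is an assumed shape, not the code in HBASE-5584-2.patch: the class and method names are hypothetical, and a single-thread executor stands in for the handler that eventually invokes the post hook.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchInsteadOfSleepSketch {
    // Runs a simulated handler on another thread and waits until the
    // simulated post hook has fired, or until the timeout expires.
    static boolean runHandlerAndWait(long timeoutMillis) {
        CountDownLatch postCreateCalled = new CountDownLatch(1);
        ExecutorService handlerPool = Executors.newSingleThreadExecutor();
        try {
            handlerPool.submit(() -> {
                // ... the handler's real work would happen here ...
                postCreateCalled.countDown(); // stands in for the postCreate hook
            });
            // Returns as soon as the hook fires instead of sleeping a fixed time.
            return postCreateCalled.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            handlerPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runHandlerAndWait(5000));
    }
}
```

Compared with a fixed sleep, the assertion runs as soon as the hook has been called and fails with a clear timeout if it never is.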
[jira] [Updated] (HBASE-5617) Provide coprocessor hooks in put flow while rollbackMemstore.
[ https://issues.apache.org/jira/browse/HBASE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5617: -- Component/s: coprocessors Provide coprocessor hooks in put flow while rollbackMemstore. - Key: HBASE-5617 URL: https://issues.apache.org/jira/browse/HBASE-5617 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 With coprocessor hooks in the put path we can create new puts to other tables or regions. These puts can be done with writeToWal set to false. In 0.94 and above, puts are first written to the memstore and then to the WAL. If the WAL append or sync fails, the memstore is rolled back. The problem is that if the put in the main flow fails, there is no way to roll back the puts that happened in prePut. We can add coprocessor hooks such as pre/postRollBackMemStore. Is one of the two hooks enough here?
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Fix Version/s: (was: 0.94.0) 0.94.1 Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.94.1 Attachments: HBASE-5323.patch, HBASE-5323.patch While parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream: {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, the error propagates to the run method of EventHandler. This kills the SSH thread, so no further assignment happens. If ROOT and META need to be assigned, they cannot be. I think in this condition we should abort the master by catching such exceptions. Please suggest.
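The failure mode in HBASE-5323 comes down to a Java detail: AssertionError extends Error, so a `catch (Exception e)` block does not see it. The sketch below shows one way a handler could turn such an error into an abort; the `Abortable` interface and `runHandler` method are hypothetical and are not taken from the attached patches.

```java
public class HandlerAbortSketch {
    // Hypothetical abort callback; in the issue this role would be played by the master.
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    // Runs the log-splitting step and aborts on any Throwable, including Errors.
    static void runHandler(Runnable splitLog, Abortable master) {
        try {
            splitLog.run();
        } catch (Throwable t) {
            // catch (Exception e) would miss AssertionError, which extends Error.
            master.abort("Log splitting failed", t);
        }
    }

    public static void main(String[] args) {
        runHandler(
            () -> { throw new AssertionError("realLength < length"); },
            (why, cause) -> System.out.println(why + ": " + cause));
    }
}
```

Whether aborting is the right reaction is the open question in the issue; the sketch only shows that the error has to be caught as a Throwable (or an Error) for any reaction to happen.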
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Attachment: HBASE-5584.patch Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584.patch Following points can be changed w.r.t to coprocessors - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers - Currently it is called in the HMaster thus making the postApis async w.r.t the handlers - Similar is the case with the balancer. with current behaviour once we are in the postEnable(for eg) we any way need to wait for the main enable handler to be completed. We should ensure that we dont wait in the main thread so again we need to spawn a thread and wait on that. On the other hand if the pre and post api is called on the handlers then only that handler thread will be used in the pre/post apis If the above said plan is ok i can prepare a patch for all such related changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Status: Patch Available (was: Open) Patch for review. Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584.patch Following points can be changed w.r.t to coprocessors - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers - Currently it is called in the HMaster thus making the postApis async w.r.t the handlers - Similar is the case with the balancer. with current behaviour once we are in the postEnable(for eg) we any way need to wait for the main enable handler to be completed. We should ensure that we dont wait in the main thread so again we need to spawn a thread and wait on that. On the other hand if the pre and post api is called on the handlers then only that handler thread will be used in the pre/post apis If the above said plan is ok i can prepare a patch for all such related changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5520: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Support reseek() at RegionScanner - Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, HBASE-5520_3.patch, HBASE-5520_4.patch reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Attachment: HBASE-5584-1.patch Updated patch. In my local it was passing. Now added some sleep after create so that we can ensure that the postCreateHandler is called. Pls review. Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584-1.patch, HBASE-5584.patch Following points can be changed w.r.t to coprocessors - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers - Currently it is called in the HMaster thus making the postApis async w.r.t the handlers - Similar is the case with the balancer. with current behaviour once we are in the postEnable(for eg) we any way need to wait for the main enable handler to be completed. We should ensure that we dont wait in the main thread so again we need to spawn a thread and wait on that. On the other hand if the pre and post api is called on the handlers then only that handler thread will be used in the pre/post apis If the above said plan is ok i can prepare a patch for all such related changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Status: Open (was: Patch Available) Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584-1.patch, HBASE-5584.patch Following points can be changed w.r.t to coprocessors - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers - Currently it is called in the HMaster thus making the postApis async w.r.t the handlers - Similar is the case with the balancer. with current behaviour once we are in the postEnable(for eg) we any way need to wait for the main enable handler to be completed. We should ensure that we dont wait in the main thread so again we need to spawn a thread and wait on that. On the other hand if the pre and post api is called on the handlers then only that handler thread will be used in the pre/post apis If the above said plan is ok i can prepare a patch for all such related changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5584) Coprocessor hooks can be called in the respective handlers
[ https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5584: -- Status: Patch Available (was: Open) Coprocessor hooks can be called in the respective handlers -- Key: HBASE-5584 URL: https://issues.apache.org/jira/browse/HBASE-5584 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5584-1.patch, HBASE-5584.patch Following points can be changed w.r.t to coprocessors - Call preCreate, postCreate, preEnable, postEnable, etc. in their respective handlers - Currently it is called in the HMaster thus making the postApis async w.r.t the handlers - Similar is the case with the balancer. with current behaviour once we are in the postEnable(for eg) we any way need to wait for the main enable handler to be completed. We should ensure that we dont wait in the main thread so again we need to spawn a thread and wait on that. On the other hand if the pre and post api is called on the handlers then only that handler thread will be used in the pre/post apis If the above said plan is ok i can prepare a patch for all such related changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5206: -- Resolution: Fixed Status: Resolved (was: Patch Available) Resolving as committed. Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5153: -- Resolution: Fixed Fix Version/s: (was: 0.90.7) 0.90.6 Status: Resolved (was: Patch Available) This got committed to 0.90.6. Sorry for not resolving it at that time. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. From that issue we know that if multiple threads share the same connection and the connection is aborted in one thread, the other threads get a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. That fix solved the problem of a stale connection that could not be removed, but the original HTable instance can no longer be used; the connection in HTable should be recreated. There are two approaches to solve this: 1. In user code, on catching an IOE, close the connection and re-create the HTable instance. This can be used as a workaround. 2. On the HBase client side, catch this exception and re-create the connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
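The retry idea named in the HBASE-5153 title can be sketched generically as below. This is not the committed patch and uses no HBase classes; RetrySketch, callWithRetries and its parameters are hypothetical, and the operation passed in stands for whatever re-creates the connection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RetrySketch {
    // Runs op up to maxAttempts times, pausing between attempts,
    // and rethrows the last IOException if every attempt fails.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice with a "closed" error, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("connection closed");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Only IOException is retried here; other exceptions propagate immediately, which keeps programming errors from being masked by the retry loop.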
[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5568: -- Fix Version/s: 0.90.7 Updated 0.90.7 also in fix versions. Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush when we close this region. 
Logs in RS for region e9d827913a056e696c39bc569ea3
2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
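The accounting problem described in HBASE-5568 can be reproduced with a small single-threaded model. This is a simplified sketch, not HRegion code: FlushAccountingSketch and its methods are hypothetical, the sizes are illustrative round numbers, and the interleaving of two flushes is written out by hand instead of using real threads.

```java
import java.util.concurrent.atomic.AtomicLong;

class FlushAccountingSketch {
    private final AtomicLong memstoreSize = new AtomicLong();

    void write(long size) {
        memstoreSize.addAndGet(size);
    }

    // Called at the end of a flush; returns the counter after subtraction.
    long finishFlush(long amountToSubtract) {
        return memstoreSize.addAndGet(-amountToSubtract);
    }

    // Interleaves two flushes A and B and returns the final counter value.
    // subtractActual=false models subtracting the size seen at flush start;
    // subtractActual=true models subtracting what each flush really wrote.
    static long simulate(boolean subtractActual) {
        FlushAccountingSketch region = new FlushAccountingSketch();
        region.write(128);
        long observedA = region.memstoreSize.get(); // flush A starts and sees 128
        region.write(7);
        long observedB = region.memstoreSize.get(); // flush B starts and sees 135
        long flushedA = 128;                        // A wrote the first snapshot
        long flushedB = 7;                          // B only had the remainder left
        region.finishFlush(subtractActual ? flushedA : observedA);
        return region.finishFlush(subtractActual ? flushedB : observedB);
    }

    public static void main(String[] args) {
        System.out.println("subtracting observed size: " + simulate(false));
        System.out.println("subtracting actual size:   " + simulate(true));
    }
}
```

In this model, subtracting the observed size drives the counter negative, while subtracting the flushed amount returns it to zero. The attached patches may address the issue differently, for example by preventing concurrent flushes of one region.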
[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5520: -- Attachment: HBASE-5520_3.patch Support reseek() at RegionScanner - Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, HBASE-5520_3.patch reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5520: -- Attachment: HBASE-5520_2.patch Updated the patch with requestSeek. Support reseek() at RegionScanner - Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5520: -- Fix Version/s: 0.96.0 Status: Patch Available (was: Open) Support reseek() at RegionScanner - Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Assignee: ramkrishna.s.vasudevan Fix For: 0.96.0 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5516) GZip leading to memory leak in 0.90. Fix similar to HBASE-5387 needed for 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5516: -- Attachment: HBASE-5516_3_0.90.patch Updated patch addressing comments. GZip leading to memory leak in 0.90. Fix similar to HBASE-5387 needed for 0.90. Key: HBASE-5516 URL: https://issues.apache.org/jira/browse/HBASE-5516 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7 Attachments: HBASE-5516_2_0.90.patch, HBASE-5516_3_0.90.patch Usage of GZip is leading to resident memory leak in 0.90. We need to have something similar to HBASE-5387 in 0.90. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5545: -- Fix Version/s: 0.92.2 region can't be opened for a long time. Because the creating File failed. - Key: HBASE-5545 URL: https://issues.apache.org/jira/browse/HBASE-5545 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.6 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.7, 0.92.2 Scenario: 1. File is created 2. But while writing data, all datanodes might have crashed. So writing data will fail. 3. Now even if close is called in finally block, close also will fail and throw the Exception because writing data failed. 4. After this if RS try to create the same file again, then AlreadyBeingCreatedException will come. Suggestion to handle this scenario. --- 1. Check for the existence of the file, if exists delete the file and create new file. Here delete call for the file will not check whether the file is open or closed. Overwrite Option: 1. Overwrite option will be applicable if you are trying to overwrite a closed file. 2. If the file is not closed, then even with overwrite option Same AlreadyBeingCreatedException will be thrown. This is the expected behaviour to avoid the Multiple clients writing to same file. Region server logs: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file. 
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382) at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131) at org.apache.hadoop.ipc.Client.call(Client.java:961) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245) at $Proxy6.create(Unknown Source) at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at $Proxy6.create(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518) at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672) at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) [2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the method call: public abstract void
[jira] [Updated] (HBASE-5520) Support seek() reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5520: -- Attachment: HBASE-5520_1.patch Patch for trunk. Support seek() reseek() at RegionScanner Key: HBASE-5520 URL: https://issues.apache.org/jira/browse/HBASE-5520 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.92.0 Reporter: Anoop Sam John Attachments: HBASE-5520_1.patch seek() reseek() is not supported currently at the RegionScanner level. We can support the same. This is created following the discussion under HBASE-2038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
[ https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5531: -- Resolution: Fixed Fix Version/s: 0.94.0 Status: Resolved (was: Patch Available) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot - Key: HBASE-5531 URL: https://issues.apache.org/jira/browse/HBASE-5531 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.2 Reporter: Laxman Labels: build Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch Current profile is still pointing to 0.23.1-SNAPSHOT. This is failing to build as 23.1 is already released and snapshot is not available anymore. We can update this to 0.23.2-SNAPSHOT. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.
[ https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5482: -- Attachment: HBASE-5482_2.patch Patch addressing Ted's comments. @Ted If the patch is fine, can we commit it for 0.90.7? In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS. - Key: HBASE-5482 URL: https://issues.apache.org/jira/browse/HBASE-5482 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7 Attachments: 5482-v2.txt, HBASE-5482_1.patch, HBASE-5482_2.patch There are two possible problems:
- When we populate regionsToMove while iterating the server info in descending order, there is a chance that the same region is added twice. In the first loop we randomize the regions, whereas when neededRegions != 0 we just take the region at the index and add it again. This may put the same region in the regionsToMove list twice.
- When the first problem happens, there is a chance that a regionToMove has the same src and destination, and the same region can be picked every 5 mins.
{code}
for (Map.Entry<HServerInfo, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
  BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
  int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
  if (idx >= server.getValue().size()) break;
  HRegionInfo region = server.getValue().get(idx);
  if (region.isMetaRegion()) continue; // Don't move meta regions.
  regionsToMove.add(new RegionPlan(region, server.getKey(), null));
  if (--neededRegions == 0) {
    // No more regions needed, done shedding
    break;
  }
}
{code}
If I have meta and root on the two most loaded region servers (3 RS in total), we skip the regions on those region servers and take the region from the least loaded RS. Then in the next loop we iterate from the least loaded server and set the destination to that same server. This leads to a condition where balancing happens every 5 minutes with the same server as both src and dest. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
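One way to guard against both symptoms described in HBASE-5482 is to filter the plan list before acting on it. The sketch below is not the attached patch and does not use HBase types; PlanFilterSketch and Plan are hypothetical simplifications in which regions and servers are plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class PlanFilterSketch {
    // Hypothetical, simplified stand-in for a region move plan.
    static final class Plan {
        final String region;
        final String source;
        final String destination;

        Plan(String region, String source, String destination) {
            this.region = region;
            this.source = source;
            this.destination = destination;
        }

        @Override
        public String toString() {
            return region + ": " + source + " -> " + destination;
        }
    }

    // Drops plans whose source equals the destination and keeps only the
    // first plan seen for each region, preserving the original order.
    static List<Plan> filter(List<Plan> plans) {
        Map<String, Plan> byRegion = new LinkedHashMap<>();
        for (Plan p : plans) {
            if (Objects.equals(p.source, p.destination)) {
                continue; // moving a region onto its current server is a no-op
            }
            byRegion.putIfAbsent(p.region, p);
        }
        return new ArrayList<>(byRegion.values());
    }

    public static void main(String[] args) {
        List<Plan> plans = new ArrayList<>();
        plans.add(new Plan("r1", "rs1", "rs3"));
        plans.add(new Plan("r1", "rs1", "rs2")); // same region picked twice
        plans.add(new Plan("r2", "rs3", "rs3")); // same source and destination
        for (Plan p : filter(plans)) {
            System.out.println(p);
        }
    }
}
```

Filtering after the fact treats the symptom; avoiding the duplicate pick inside the balancer loop, as the issue discusses, would address the cause.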
[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler
[ https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5490: -- Attachment: HBASE-5490.patch Patch for 0.90 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler Key: HBASE-5490 URL: https://issues.apache.org/jira/browse/HBASE-5490 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5490.patch The new state that was added RS_ZK_REGION_FAILED_OPEN was failing the rolling restart. So move the new enum to the end of the list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
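Why the position of a new enum constant can matter during a rolling restart: if processes running old and new code exchange the enum by ordinal, inserting a constant in the middle shifts every later ordinal. The sketch below assumes ordinal-based exchange, which the message above does not state explicitly; the enum names and constants are illustrative and not the actual EventHandler list.

```java
class EnumOrdinalSketch {
    // Illustrative enum as known to a not-yet-upgraded process.
    enum Old { CLOSING, CLOSED, OPENING, OPENED, OFFLINE }

    // New constant inserted in the middle: OFFLINE moves from ordinal 4 to 5.
    enum NewInMiddle { CLOSING, CLOSED, OPENING, OPENED, FAILED_OPEN, OFFLINE }

    // New constant appended at the end: existing ordinals are unchanged.
    enum NewAtEnd { CLOSING, CLOSED, OPENING, OPENED, OFFLINE, FAILED_OPEN }

    // Decodes an ordinal received from another process into a constant name.
    static <E extends Enum<E>> String decode(Class<E> type, int ordinal) {
        return type.getEnumConstants()[ordinal].name();
    }

    public static void main(String[] args) {
        int wire = Old.OFFLINE.ordinal(); // value sent by the old process
        System.out.println("inserted in middle: " + decode(NewInMiddle.class, wire));
        System.out.println("appended at end:    " + decode(NewAtEnd.class, wire));
    }
}
```

With the constant appended at the end, an upgraded process still decodes every value an old process can send, which is consistent with the fix described above.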
[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler
[ https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5490: -- Fix Version/s: 0.90.6 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler Key: HBASE-5490 URL: https://issues.apache.org/jira/browse/HBASE-5490 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5490.patch The new state that was added RS_ZK_REGION_FAILED_OPEN was failing the rolling restart. So move the new enum to the end of the list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.
[ https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5482: -- Attachment: HBASE-5482_1.patch Patch for 0.90. In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS. - Key: HBASE-5482 URL: https://issues.apache.org/jira/browse/HBASE-5482 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7 Attachments: HBASE-5482_1.patch
There are two possible problems:
- When we populate regionsToMove while iterating over the server info in descending order of load, the same region can be added twice. In the first loop the regions are randomized, whereas when neededRegions != 0 we simply take the region at the index and add it again. This can put the same region into the regionsToMove list twice.
- When the first problem occurs, a plan in regionsToMove can end up with the same source and destination, and the same region can be picked every 5 minutes.
{code}
for(Map.Entry<HServerInfo, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
  BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
  int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
  if (idx >= server.getValue().size()) break;
  HRegionInfo region = server.getValue().get(idx);
  if (region.isMetaRegion()) continue; // Don't move meta regions.
  regionsToMove.add(new RegionPlan(region, server.getKey(), null));
  if(--neededRegions == 0) {
    // No more regions needed, done shedding
    break;
  }
}
{code}
If META and ROOT are on the two most heavily loaded region servers (3 RS in total), we skip the regions on those servers and take the region from the least loaded RS. Then, in the next loop, we iterate from the least loaded server and pick that same server as the destination. The result is that balancing runs every 5 minutes with the same server as both source and destination.
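One way to guard against both problems can be sketched as a filtering pass over the generated plans. This is only an illustration under stated assumptions, not the attached patch: `Plan` is a hypothetical stand-in for RegionPlan that uses plain strings for the region and server names, and `sanitize` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BalancerPlanSketch {
    // Hypothetical stand-in for RegionPlan: region name plus source and destination server names.
    static final class Plan {
        final String region;
        final String source;
        final String destination;

        Plan(String region, String source, String destination) {
            this.region = region;
            this.source = source;
            this.destination = destination;
        }
    }

    // Keeps the first usable plan per region and drops plans whose destination equals the source.
    static List<Plan> sanitize(List<Plan> plans) {
        Set<String> seenRegions = new HashSet<>();
        List<Plan> result = new ArrayList<>();
        for (Plan p : plans) {
            if (p.source.equals(p.destination)) {
                continue; // moving a region onto the server it already lives on is a no-op
            }
            if (!seenRegions.add(p.region)) {
                continue; // this region is already scheduled to move
            }
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Plan> plans = new ArrayList<>();
        plans.add(new Plan("r1", "A", "B"));
        plans.add(new Plan("r1", "A", "C")); // duplicate region
        plans.add(new Plan("r2", "C", "C")); // same source and destination
        System.out.println(sanitize(plans).size() + " plan(s) kept");
    }
}
```

A post-filter like this leaves the selection loop untouched. Fixing the index handling in the loop itself would avoid generating the bad plans in the first place.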
[jira] [Updated] (HBASE-5482) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS.
[ https://issues.apache.org/jira/browse/HBASE-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5482: -- Summary: In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS. (was: Balancer in 0.90 algo leading to same region balanced twice and picking same region with Src and Destination as same RS.) In 0.90, balancer algo leading to same region balanced twice and picking same region with Src and Destination as same RS. - Key: HBASE-5482 URL: https://issues.apache.org/jira/browse/HBASE-5482 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7
There are two possible problems:
- When we populate regionsToMove while iterating over the server info in descending order of load, the same region can be added twice. In the first loop the regions are randomized, whereas when neededRegions != 0 we simply take the region at the index and add it again. This can put the same region into the regionsToMove list twice.
- When the first problem occurs, a plan in regionsToMove can end up with the same source and destination, and the same region can be picked every 5 minutes.
{code}
for(Map.Entry<HServerInfo, List<HRegionInfo>> server : serversByLoad.descendingMap().entrySet()) {
  BalanceInfo balanceInfo = serverBalanceInfo.get(server.getKey());
  int idx = balanceInfo == null ? 0 : balanceInfo.getNextRegionForUnload();
  if (idx >= server.getValue().size()) break;
  HRegionInfo region = server.getValue().get(idx);
  if (region.isMetaRegion()) continue; // Don't move meta regions.
  regionsToMove.add(new RegionPlan(region, server.getKey(), null));
  if(--neededRegions == 0) {
    // No more regions needed, done shedding
    break;
  }
}
{code}
If META and ROOT are on the two most heavily loaded region servers (3 RS in total), we skip the regions on those servers and take the region from the least loaded RS. Then, in the next loop, we iterate from the least loaded server and pick that same server as the destination. The result is that balancing runs every 5 minutes with the same server as both source and destination.
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5200: -- Attachment: HBASE-5200_trunk_latest_with_test_2.patch AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch This is the scenario: consider a case where the balancer is running and trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and only after that is the node data added to RIT. If, in this window (before the node is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() skips the handling because the RIT state is null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In the branch, the CLOSING node is created by the RS, which leads to further inconsistency.
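The ordering problem described in this issue can be replayed deterministically in a small sketch. Everything here is a simplified assumption for illustration: `RitRaceSketch`, its string-valued state map and the two replay methods are hypothetical and do not reflect the real AssignmentManager code or the attached patches. The sketch only shows that a CLOSED event handled before the region is recorded in RIT finds a null state and is dropped.

```java
import java.util.HashMap;
import java.util.Map;

class RitRaceSketch {
    // Hypothetical stand-in for the regions-in-transition map: region name -> state name.
    private final Map<String, String> regionsInTransition = new HashMap<>();

    void addToRit(String region, String state) {
        regionsInTransition.put(region, state);
    }

    // Simplified CLOSED handling: returns false when the event has to be ignored.
    boolean handleClosed(String region) {
        String state = regionsInTransition.get(region);
        if (state == null || !(state.equals("PENDING_CLOSE") || state.equals("CLOSING"))) {
            return false; // corresponds to the WARN lines in the log above
        }
        regionsInTransition.put(region, "CLOSED");
        return true;
    }

    // Replays the bad interleaving: the watch fires before the node is added to RIT.
    static boolean eventArrivesBeforeRitUpdate() {
        RitRaceSketch am = new RitRaceSketch();
        boolean handled = am.handleClosed("region-1"); // state is still null here
        am.addToRit("region-1", "CLOSING");            // too late, the event is already lost
        return handled;
    }

    // Replays the safe interleaving: RIT is updated before any event can be handled.
    static boolean ritUpdatedBeforeEvent() {
        RitRaceSketch am = new RitRaceSketch();
        am.addToRit("region-1", "CLOSING");
        return am.handleClosed("region-1");
    }

    public static void main(String[] args) {
        System.out.println("event before RIT update handled: " + eventArrivesBeforeRitUpdate());
        System.out.println("RIT update before event handled: " + ritUpdatedBeforeEvent());
    }
}
```

In a real multi-threaded master the two steps would also need to be made atomic with respect to the event handler, for example by holding a common lock; the sketch leaves that out.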
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5200: -- Status: Open (was: Patch Available) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch This is the scenario: consider a case where the balancer is running and trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and only after that is the node data added to RIT. If, in this window (before the node is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() skips the handling because the RIT state is null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In the branch, the CLOSING node is created by the RS, which leads to further inconsistency.
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5200: -- Status: Patch Available (was: Open) Will submit the 0.90 patch tomorrow. AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the region assignment inconsistent - Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, hbase-5200_90_latest.patch This is the scenario: consider a case where the balancer is running and trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and only after that is the node data added to RIT. If, in this window (before the node is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() skips the handling because the RIT state is null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In the branch, the CLOSING node is created by the RS, which leads to further inconsistency.
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Attachment: HBASE-5323.patch Patch for 0.90. Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7 Attachments: HBASE-5323.patch, HBASE-5323.patch While parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream: {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, the error propagates to the run method of EventHandler. This kills the SSH thread, so no further assignment happens. If ROOT and META need to be assigned, they cannot be. I think in this condition we should abort the master by catching such errors. Please suggest.
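The proposal to abort the master instead of letting the handler thread die silently can be sketched as follows. This is a minimal illustration, not the attached patch: the `Abortable` interface here is a local, hypothetical stand-in for whatever abort hook the master exposes, and `runGuarded` is a made-up helper name. Note that an `assert` failure surfaces as `AssertionError`, which is an `Error` and not an `Exception`, so a `catch (Exception e)` block would not see it.

```java
class AbortOnErrorSketch {
    // Hypothetical stand-in for the master's abort hook.
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    // Runs the task; any Throwable, including AssertionError, triggers an abort
    // instead of silently terminating the handler thread.
    static void runGuarded(Runnable task, Abortable master) {
        try {
            task.run();
        } catch (Throwable t) {
            master.abort("Log splitting failed", t);
        }
    }

    public static void main(String[] args) {
        runGuarded(
            // Simulates the failed length check; thrown explicitly because
            // the assert statement is only active when the JVM runs with -ea.
            () -> { throw new AssertionError("realLength < length"); },
            (why, cause) -> System.out.println("ABORT: " + why + " caused by " + cause));
    }
}
```

Catching `Throwable` is deliberately broad here; whether the master should abort on every error type or only on specific ones is a design decision left open by the issue.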
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Attachment: HBASE-5323.patch Patch for 0.90. Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7 Attachments: HBASE-5323.patch While parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream: {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, the error propagates to the run method of EventHandler. This kills the SSH thread, so no further assignment happens. If ROOT and META need to be assigned, they cannot be. I think in this condition we should abort the master by catching such errors. Please suggest.
[jira] [Updated] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5321: -- Affects Version/s: 0.90.5 Fix Version/s: 0.90.6 this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90. Key: HBASE-5321 URL: https://issues.apache.org/jira/browse/HBASE-5321 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6 With HBASE-5160 we do not wait for the TM to assign the regions after the first RS comes online. After that, the variable this.allRegionServersOffline needs to be reset, which is not done in 0.90.
[jira] [Updated] (HBASE-5321) this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90.
[ https://issues.apache.org/jira/browse/HBASE-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5321: -- Attachment: HBASE-5321.patch Please review the patch. this.allRegionServersOffline not set to false after one RS comes online and assignment is done in 0.90. Key: HBASE-5321 URL: https://issues.apache.org/jira/browse/HBASE-5321 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6 Attachments: HBASE-5321.patch With HBASE-5160 we do not wait for the TM to assign the regions after the first RS comes online. After that, the variable this.allRegionServersOffline needs to be reset, which is not done in 0.90.
[jira] [Updated] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5323: -- Summary: Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master (was: Need to handle assertion error if split log through ServerShutDownHandler by shutting down the master) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master Key: HBASE-5323 URL: https://issues.apache.org/jira/browse/HBASE-5323 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Fix For: 0.94.0, 0.90.7 While parsing the HLog we expect the proper length from HDFS. In WALReaderFSDataInputStream: {code} assert(realLength >= this.length); {code} We bail out if the above condition is not satisfied. But if SSH.splitLog() hits this problem, the error propagates to the run method of EventHandler. This kills the SSH thread, so no further assignment happens. If ROOT and META need to be assigned, they cannot be. I think in this condition we should abort the master by catching such errors. Please suggest.
[jira] [Updated] (HBASE-5299) CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite wait in region assignment flow.
[ https://issues.apache.org/jira/browse/HBASE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5299: -- Summary: CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite wait in region assignment flow. (was: CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite loop in region assignment flow.) CatalogTracker.getMetaServerConnection() checks for root server connection and makes waitForMeta to go into infinite wait in region assignment flow. Key: HBASE-5299 URL: https://issues.apache.org/jira/browse/HBASE-5299 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor RS A, RS B and RS C are 3 region servers: RS A hosts META, RS B hosts ROOT, and RS C hosts neither META nor ROOT. Kill RS B and wait for the server shutdown handler to start. Start RS B again before ROOT is assigned to RS C. The cluster will now try to assign new regions to RS B. But as ROOT is not yet assigned, OpenRegionHandler.updateMeta fails to update the regions because ROOT is not online.
{code} a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:25,126 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:25,159 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:35,385 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:23:35,449 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:16,666 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Attempting to transition node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:16,701 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x1352e27539c0009 Successfully transitioned node a87109263ed53e67158377a149c5a7be from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING 2012-01-30 16:24:20,788 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Interrupting thread Thread[PostOpenDeployTasks:a87109263ed53e67158377a149c5a7be,5,main] 2012-01-30 16:24:30,699 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Exception running postOpenDeployTasks; region=a87109263ed53e67158377a149c5a7be org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Interrupted at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:439) at 
org.apache.hadoop.hbase.catalog.MetaEditor.updateRegionLocation(MetaEditor.java:142) at org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:1382) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:221) {code} So we need to wait for the TM to assign the regions again.
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5200: -- Status: Patch Available (was: Open) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent. --- Key: HBASE-5200 URL: https://issues.apache.org/jira/browse/HBASE-5200 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.92.1 Attachments: HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml This is the scenario: consider a case where the balancer is running and trying to close regions on an RS. Before the close completes, a master switch happens. On master switch, the set of nodes that are in RIT is collected; we first get the data and start watching the node, and only after that is the node data added to RIT. If, in this window (before the node is added to RIT), the RS on which close was called makes a transition, AM.handleRegion() skips the handling because the RIT state is null.
{code} 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region a66d281d231dfcaea97c270698b26b6f from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region c12e53bfd48ddc5eec507d66821c4d23 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,358 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 59ae13de8c1eb325a0dd51f4902d2052 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region f45bc9614d7575f35244849af85aa078 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region cc3ecd7054fe6cd4a1159ed92fd62641 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 3af40478a17fee96b4a192b22c90d5a2 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region e6096a8466e730463e10d3d61f809b92 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN 
org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 4806781a1a23066f7baed22b4d237e24 from server HOST-192-168-47-204,20020,1326342744518 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states 2012-01-13 10:50:46,359 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region d69e104131accaefe21dcc01fddc7629 from server HOST-192-168-47-205,20020,1326363111288 but region was in the state null and not in expected PENDING_CLOSE or CLOSING states {code} In branch the CLOSING node is created by RS thus leading to more inconsistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
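The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class RitRaceSketch, its State enum and both method names are hypothetical, it is not HBase code, and it is not the change made in the attached patches (their contents are not shown in this message). It only shows why an event that arrives before the RIT entry exists is dropped, and why recording the entry first avoids that.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the master-side bookkeeping; not HBase code. */
class RitRaceSketch {
    enum State { PENDING_CLOSE, CLOSING, CLOSED }

    // Stand-in for the master's regions-in-transition (RIT) map.
    private final Map<String, State> regionsInTransition = new HashMap<>();

    /** Models the CLOSED handling: returns false when RIT has no usable entry. */
    boolean handleClosed(String region) {
        State current = regionsInTransition.get(region);
        if (current != State.PENDING_CLOSE && current != State.CLOSING) {
            return false; // "region was in the state null": the event is dropped
        }
        regionsInTransition.put(region, State.CLOSED);
        return true;
    }

    /** Ordering from the report: the RS transition is seen before the RIT entry exists. */
    static boolean watchThenAdd(String region) {
        RitRaceSketch am = new RitRaceSketch();
        boolean handled = am.handleClosed(region);          // event arrives first
        am.regionsInTransition.put(region, State.CLOSING);  // entry added too late
        return handled;
    }

    /** Alternative ordering (an assumption, for comparison): record the entry, then accept events. */
    static boolean addThenWatch(String region) {
        RitRaceSketch am = new RitRaceSketch();
        am.regionsInTransition.put(region, State.CLOSING);
        return am.handleClosed(region);
    }

    public static void main(String[] args) {
        System.out.println(watchThenAdd("a66d281d231dfcaea97c270698b26b6f"));
        System.out.println(addThenWatch("a66d281d231dfcaea97c270698b26b6f"));
    }
}
```

In the first ordering the CLOSED event finds no entry and is lost, which matches the WARN lines quoted above; in the second it is handled.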
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5200:
--
Status: Open (was: Patch Available)

AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
---
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7, 0.92.1
Attachments: HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5200:
--
Attachment: HBASE-5200_1.patch

Updated patch. Test case passes with this.

AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
---
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7, 0.92.1
Attachments: HBASE-5200.patch, HBASE-5200_1.patch, TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5200:
--
Status: Patch Available (was: Open)

AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
---
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7, 0.92.1
Attachments: HBASE-5200.patch
[jira] [Updated] (HBASE-5200) AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5200:
--
Attachment: HBASE-5200.patch

Patch for trunk.

AM.ProcessRegionInTransition() and AM.handleRegion() races thus leaving the region assignment inconsistent.
---
Key: HBASE-5200
URL: https://issues.apache.org/jira/browse/HBASE-5200
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7, 0.92.1
Attachments: HBASE-5200.patch
[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5153:
--
Attachment: HBASE-5153_addendum_0.90_1.patch

@Lars
Addressing your comments. Based on your feedback I will commit the patch if it is ok.

Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
---
Key: HBASE-5153
URL: https://issues.apache.org/jira/browse/HBASE-5153
Project: HBase
Issue Type: Bug
Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Fix For: 0.90.6
Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out

HBASE-4893 is related to this issue. From that issue we know that if multiple threads share the same connection and the connection is aborted in one thread, the other threads get a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
That fix solves the problem that a stale connection could not be removed, but the original HTable instance can no longer be used. The connection in HTable should be recreated.
There are two approaches to solve this:
1. In user code, once an IOE is caught, close the connection and re-create the HTable instance. We can use this as a workaround.
2. On the HBase client side, catch this exception and re-create the connection.
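Approach 1 above (close and re-create in user code) could look roughly like the following. This is a sketch only, and every name in it is hypothetical: HandleOp and ReconnectingHandle are not part of the HBase client API, and the factory is assumed to return a fresh handle (for example a new HTable on a new connection). It retries the failed operation once on the new handle; whether one retry is appropriate for a given operation is an assumption.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical operation run against a handle; stands in for a call on an HTable. */
interface HandleOp<H, R> {
    R apply(H handle) throws IOException;
}

/** Hypothetical user-side wrapper; not part of the HBase client API. */
class ReconnectingHandle<H extends Closeable> {
    private final Supplier<H> factory; // assumed to build a fresh handle each time
    private H current;

    ReconnectingHandle(Supplier<H> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    /** Runs op; on an IOException, closes the old handle, re-creates it and retries once. */
    synchronized <R> R call(HandleOp<H, R> op) throws IOException {
        try {
            return op.apply(current);
        } catch (IOException first) {
            try {
                current.close();
            } catch (IOException ignored) {
                // The old handle is already unusable; nothing more to do with it.
            }
            current = factory.get();
            return op.apply(current); // a second failure propagates to the caller
        }
    }
}
```

The method is synchronized so that threads sharing the wrapper do not each re-create the handle at the same time.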
[jira] [Updated] (HBASE-4654) [replication] Add a check to make sure we don't replicate to ourselves
[ https://issues.apache.org/jira/browse/HBASE-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4654:
--
Fix Version/s: (was: 0.90.6) 0.92.1 0.90.7

Moving to 0.90.7 and 0.92.1. Please pull back if you think differently.

[replication] Add a check to make sure we don't replicate to ourselves
--
Key: HBASE-4654
URL: https://issues.apache.org/jira/browse/HBASE-4654
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Fix For: 0.90.7, 0.92.1
Attachments: 4654-trunk.txt

It's currently possible to add a peer for replication and point it to the local cluster, which I believe could very well happen for those like us that use only one ZK ensemble per DC, so that only the root znode changes when you want to set up replication intra-DC.
I don't think comparing just the cluster ID would be enough, because you would normally use a different one for another cluster and nothing will block you from pointing elsewhere.
Comparing the ZK ensemble address doesn't work either when you have multiple DNS entries that point at the same place.
I think this could be resolved by looking up the master address in the relevant znode, as it should be exactly the same thing in the case where you have the same cluster.
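The master-address comparison suggested above might be sketched as follows. The class and method names are hypothetical, and the two inputs are assumed to be "host:port" strings already read from the master znode of the local cluster and of the peer; how they are read from ZooKeeper is left out, and this is not the content of 4654-trunk.txt.

```java
/** Hypothetical helper; the inputs are assumed to come from each cluster's master znode. */
class SelfReplicationCheckSketch {
    /**
     * Returns true when both addresses name the same master, in which case
     * the peer should be rejected because it is the local cluster.
     */
    static boolean isSameCluster(String localMasterAddress, String peerMasterAddress) {
        if (localMasterAddress == null || peerMasterAddress == null) {
            return false; // no master known on one side: cannot claim they match
        }
        // Host names are case-insensitive; surrounding whitespace is ignored.
        return localMasterAddress.trim().equalsIgnoreCase(peerMasterAddress.trim());
    }
}
```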
[jira] [Updated] (HBASE-4762) ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation
[ https://issues.apache.org/jira/browse/HBASE-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4762:
--
Fix Version/s: (was: 0.90.6) 0.90.7
Hadoop Flags: Reviewed

Moving to 0.90.7

ROOT and META region never be assigned if IOE throws in verifyRootRegionLocation

Key: HBASE-4762
URL: https://issues.apache.org/jira/browse/HBASE-4762
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: mingjian
Assignee: mingjian
Fix For: 0.90.7

The patch in HBASE-3914 fixed the root region being assigned on two regionservers. But it seems the root region will never be assigned if verifyRootRegionLocation throws an IOE, as in the following master logs:
{noformat}
2011-10-19 19:13:34,873 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1090)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:256)
    at $Proxy7.getRegionInfo(Unknown Source)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
After this, the -ROOT- region won't be assigned, like this:
{noformat}
2011-10-19 19:18:40,000 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parent Table=-ROOT-, metaLocation=address: dw79.kgb.sqa.cm4:60020, regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after sleep of 1000 because: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2771)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1802)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:569)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1091)
{noformat}
So we should rewrite the verifyRootRegionLocation method.
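One possible shape for such a rewrite is to stop the IOE from escaping the shutdown handler and to treat a failed check as "location not verified", so that assignment can still proceed. The sketch below is an assumption, not the committed fix: LocationVerifier and verifyOrTreatAsUnassigned are hypothetical names, and whether every IOException should be handled this way is a judgment call the issue does not settle.

```java
import java.io.IOException;

/** Hypothetical sketch; not the CatalogTracker API. */
class VerifyRootSketch {
    /** Stand-in for a location check that may fail with an IOException. */
    interface LocationVerifier {
        boolean verify() throws IOException;
    }

    /**
     * Returns the verifier's answer, or false when the check itself fails,
     * so that the caller goes on to assign the region instead of aborting.
     */
    static boolean verifyOrTreatAsUnassigned(LocationVerifier verifier) {
        try {
            return verifier.verify();
        } catch (IOException e) {
            // For example "Server is not running yet": the location cannot be trusted.
            return false;
        }
    }
}
```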
[jira] [Updated] (HBASE-5004) Better manage standalone setups on Ubuntu, the 127.0.1.1 issue
[ https://issues.apache.org/jira/browse/HBASE-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5004:
--
Fix Version/s: (was: 0.90.6) 0.90.7

Moving to 0.90.7

Better manage standalone setups on Ubuntu, the 127.0.1.1 issue
--
Key: HBASE-5004
URL: https://issues.apache.org/jira/browse/HBASE-5004
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Fix For: 0.94.0, 0.90.7

Numerous times users have come with issues setting up HBase on Ubuntu because it has the 127.0.1.1 line messing everything up. Here's an example:
{quote}
2011-12-10 00:18:24,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as localhost,33371,1323476299775, RPC listening on /127.0.1.1:33371, sessionid=0x1342555adc90002
...
2011-12-10 00:18:27,135 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to localhost,33371,1323476299775
2011-12-10 00:18:27,135 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to localhost,33371,1323476299775
2011-12-10 00:18:27,155 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /127.0.0.1:33371 could not be reached after 1 tries, giving up.
2011-12-10 00:18:27,156 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to serverName=localhost,33371,1323476299775, load=(requests=0, regions=0, usedHeap=23, maxHeap=983), trying to assign elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /127.0.0.1:33371 after attempts=1
{quote}
We should have a special check in standalone mode to make sure we won't fall into that trap and then print a useful error message that would hopefully appear on the command line.
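The special check proposed above could be as small as the following sketch. The class and method names are hypothetical, and the rule it applies (an IPv4 loopback address other than 127.0.0.1 is suspicious) is an assumption about what the check should flag; the issue itself does not specify it.

```java
import java.net.Inet4Address;
import java.net.InetAddress;

/** Hypothetical startup check for standalone mode; not existing HBase code. */
class LoopbackCheckSketch {
    /**
     * Returns true for an IPv4 loopback address other than 127.0.0.1,
     * such as the 127.0.1.1 address seen in the log above.
     */
    static boolean isSuspiciousLoopback(InetAddress addr) {
        return addr instanceof Inet4Address
                && addr.isLoopbackAddress()
                && !"127.0.0.1".equals(addr.getHostAddress());
    }
}
```

At startup, the result of InetAddress.getLocalHost() could be passed to this method and a warning printed to the command line when it returns true.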
[jira] [Updated] (HBASE-4462) Properly treating SocketTimeoutException
[ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-4462:
--
Fix Version/s: (was: 0.90.6) 0.90.7

Properly treating SocketTimeoutException

Key: HBASE-4462
URL: https://issues.apache.org/jira/browse/HBASE-4462
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Fix For: 0.90.7
Attachments: HBASE-4462_0.90.x.patch

SocketTimeoutException is currently treated like any IOE inside of HCM.getRegionServerWithRetries and I think this is a problem. This method should only do retries in cases where we are pretty sure the operation will complete, but with STE we already waited for (by default) 60 seconds and nothing happened.
I found this while debugging Douglas Campbell's problem on the mailing list where it seemed like he was using the same scanner from multiple threads, but actually it was just the same client doing retries while the first run didn't even finish yet (that's another problem). You could see the first scanner, then up to two other handlers waiting for it to finish in order to run (because of the synchronization on RegionScanner).
So what should we do? We could treat STE as a DoNotRetryException and let the client deal with it, or we could retry only once. There's also the option of having a different behavior for get/put/icv/scan; the issue with operations that modify a cell is that you don't know if the operation completed or not (same when a RS dies hard after completing, let's say, a Put but just before returning to the client).
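The first option above (treat STE as non-retriable) can be sketched with a generic retry loop. This is an illustration under assumptions, not HCM.getRegionServerWithRetries and not the attached patch: the class and method names are hypothetical, and the loop has no back-off between attempts.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not the HBase client's retry code. */
class TimeoutAwareRetrySketch {
    /** Retries op on IOException, but gives up at once on SocketTimeoutException. */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException ste) {
                // The server may still be working on the first request, so a
                // retry could pile up behind it; let the caller decide.
                throw ste;
            } catch (IOException ioe) {
                last = ioe; // other IOEs keep the existing retry behavior
            }
        }
        throw last;
    }
}
```

SocketTimeoutException is a subclass of IOException, so its catch clause has to come first.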
[jira] [Updated] (HBASE-4064) Two concurrent unassigning of the same region caused the endless loop of Region has been PENDING_CLOSE for too long...
[ https://issues.apache.org/jira/browse/HBASE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4064: -- Fix Version/s: (was: 0.90.6) 0.90.7 Two concurrent unassigning of the same region caused the endless loop of Region has been PENDING_CLOSE for too long... Key: HBASE-4064 URL: https://issues.apache.org/jira/browse/HBASE-4064 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: Jieshan Bean Fix For: 0.90.7 Attachments: HBASE-4064-v1.patch, HBASE-4064_branch90V2.patch, disableflow.png 1. If there is a rubbish RegionState object with PENDING_CLOSE in regionsInTransition(The RegionState was remained by some exception which should be removed, that's why I called it as rubbish object), but the region is not currently assigned anywhere, TimeoutMonitor will fall into an endless loop: 2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301 2011-06-27 10:32:21,326 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 2011-06-27 10:32:21,438 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining) 2011-06-27 10:32:21,441 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere 2011-06-27 10:32:31,207 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 
state=PENDING_CLOSE, ts=1309141555301 2011-06-27 10:32:31,207 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 2011-06-27 10:32:31,215 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining) 2011-06-27 10:32:31,215 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere 2011-06-27 10:32:41,164 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. state=PENDING_CLOSE, ts=1309141555301 2011-06-27 10:32:41,164 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. 2011-06-27 10:32:41,172 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. (offlining) 2011-06-27 10:32:41,172 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region test2,070712,1308971310309.9a6e26d40293663a79523c58315b930f. but it is not currently assigned anywhere . 2 In the following scenario, two concurrent unassigning call of the same region may lead to the above problem: the first unassign call send rpc call success, the master watched the event of RS_ZK_REGION_CLOSED, process this event, will create a ClosedRegionHandler to remove the state of the region in master.eg. while ClosedRegionHandler is running in hbase.master.executor.closeregion.threads thread (A), another unassign call of same region run in another thread(B). 
When thread B evaluates if (!regions.containsKey(region)), this.regions still contains the region info. The CPU then switches to thread A, which removes the region from both this.regions and regionsInTransition, and then switches back to thread B. Thread B continues and throws an exception with the message "Server null returned java.lang.NullPointerException: Passed server is null for 9a6e26d40293663a79523c58315b930f", but it does not remove the newly added RegionState from regionsInTransition, so that entry can never be removed.
public void unassign(HRegionInfo region, boolean force) { LOG.debug("Starting unassignment of region " + region.getRegionNameAsString() + " (offlining)");
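The interleaving described in this report is a check-then-act race. As a minimal sketch only, and not the attached patch, the following shows one way to close such a window: the membership check and the regionsInTransition insert share one lock, and the entry is rolled back if the close request fails. The class, the String-based maps and sendClose are hypothetical stand-ins for the master's real structures.

```java
import java.util.HashMap;
import java.util.Map;

public class UnassignRaceSketch {
    // Hypothetical stand-ins for the master's this.regions and regionsInTransition.
    private final Map<String, String> regions = new HashMap<>();
    private final Map<String, String> regionsInTransition = new HashMap<>();

    public synchronized void assign(String region, String server) {
        regions.put(region, server);
    }

    // Thread A (the closed-region handler): drop the region from both maps.
    public synchronized void regionClosed(String region) {
        regions.remove(region);
        regionsInTransition.remove(region);
    }

    public synchronized boolean isInTransition(String region) {
        return regionsInTransition.containsKey(region);
    }

    // Thread B: the membership check and the RIT insert share one lock,
    // so regionClosed() cannot run between them.
    public boolean unassign(String region) {
        String server;
        synchronized (this) {
            server = regions.get(region);
            if (server == null) {
                return false; // not assigned anywhere, so no RIT entry is created
            }
            regionsInTransition.put(region, "PENDING_CLOSE");
        }
        try {
            sendClose(server, region);
            return true;
        } catch (RuntimeException e) {
            // Roll back the entry added above so it cannot linger forever.
            synchronized (this) {
                regionsInTransition.remove(region);
            }
            throw e;
        }
    }

    // Placeholder for the close RPC; a real master would contact the region server.
    protected void sendClose(String server, String region) {
    }

    public static void main(String[] args) {
        UnassignRaceSketch am = new UnassignRaceSketch();
        am.assign("r1", "rs1");
        am.regionClosed("r1");                        // the handler finishes first
        System.out.println(am.unassign("r1"));        // false
        System.out.println(am.isInTransition("r1"));  // false
    }
}
```

The design choice here is to treat "region is assigned" and "region is in transition" as one piece of state guarded by one lock, so no stale PENDING_CLOSE entry can be left for the timeout monitor to retry forever.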
[jira] [Updated] (HBASE-4094) improve hbck tool to fix more hbase problem
[ https://issues.apache.org/jira/browse/HBASE-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4094: -- Fix Version/s: (was: 0.90.6) 0.90.7 Moving to 0.90.7. HBASE-5128 also is related to improving hbck tool. improve hbck tool to fix more hbase problem --- Key: HBASE-4094 URL: https://issues.apache.org/jira/browse/HBASE-4094 Project: HBase Issue Type: New Feature Components: master Affects Versions: 0.90.3 Reporter: feng xu Fix For: 0.90.7 Attachments: HbaseFsck_TableChain.patch Original Estimate: 12h Remaining Estimate: 12h
[jira] [Updated] (HBASE-4083) If Enable table is not completed and is partial, then scanning of the table is not working
[ https://issues.apache.org/jira/browse/HBASE-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4083: -- Fix Version/s: (was: 0.90.6) 0.92.0 0.90.7 Not fixed in 0.90, hence not resolving the issue, but it is committed to trunk and 0.92. If Enable table is not completed and is partial, then scanning of the table is not working --- Key: HBASE-4083 URL: https://issues.apache.org/jira/browse/HBASE-4083 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.7, 0.92.0 Attachments: HBASE-4083-1.patch, HBASE-4083_0.90.patch, HBASE-4083_0.90_1.patch, HBASE-4083_trunk.patch, HBASE-4083_trunk_1.patch
Consider the following scenario:
1. Start the Master, Backup master and RegionServer.
2. Create a table, which in turn creates a region.
3. Disable the table.
4. Enable the table again.
5. Kill the active master exactly at the point before the actual region assignment is started.
6. Restart or switch master.
7. Scan the table.
NotServingRegionException is thrown.
[jira] [Updated] (HBASE-5197) [replication] Handle socket timeouts in ReplicationSource to prevent DDOS
[ https://issues.apache.org/jira/browse/HBASE-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5197: -- Fix Version/s: (was: 0.90.6) 0.90.7 Updating affect versions to 0.90.7 [replication] Handle socket timeouts in ReplicationSource to prevent DDOS - Key: HBASE-5197 URL: https://issues.apache.org/jira/browse/HBASE-5197 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.90.7, 0.92.1 Kind of like HBASE-4462 but for replication. If while replicating you get a socket timeout, the last thing you want to do is to retry it right away. Since we can't fail the replication thread, the best I can think of is to sleep a really long amount of time. Planning to bring this to all branches. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
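The idea in this report (do not retry immediately after a socket timeout, sleep much longer instead) could be sketched as below. This is only an illustration under stated assumptions: the class name, the method sleepMillisFor and the multiplier parameter are hypothetical and do not come from ReplicationSource or from any attached patch.

```java
import java.net.SocketTimeoutException;

public class TimeoutBackoffSketch {
    // Hypothetical policy: ordinary failures wait baseSleepMs; a socket timeout
    // waits baseSleepMs * timeoutMultiplier so the peer is not hit again right away.
    public static long sleepMillisFor(Exception failure, long baseSleepMs, int timeoutMultiplier) {
        if (failure instanceof SocketTimeoutException) {
            return baseSleepMs * timeoutMultiplier;
        }
        return baseSleepMs;
    }

    public static void main(String[] args) {
        // A timeout is backed off ten times longer than any other failure.
        System.out.println(sleepMillisFor(new SocketTimeoutException("read timed out"), 1000L, 10)); // 10000
        System.out.println(sleepMillisFor(new java.io.IOException("connection refused"), 1000L, 10)); // 1000
    }
}
```

The replication thread itself keeps running in this scheme; only the pause before the next attempt changes, which matches the constraint in the report that the thread cannot be failed.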
[jira] [Updated] (HBASE-3917) Separate the Avro schema definition file from the code
[ https://issues.apache.org/jira/browse/HBASE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-3917: -- Fix Version/s: (was: 0.90.6) 0.90.7 Separate the Avro schema definition file from the code -- Key: HBASE-3917 URL: https://issues.apache.org/jira/browse/HBASE-3917 Project: HBase Issue Type: Improvement Components: avro Affects Versions: 0.90.3 Reporter: Lars George Assignee: Alex Newman Priority: Trivial Labels: noob Fix For: 0.90.7 Attachments: 0001-HBASE-3917.-Separate-the-Avro-schema-definition-file.patch The Avro schema files are in the src/main/java path, but should be in /src/main/resources just like the Hbase.thrift is. Makes the separation the same and cleaner.
[jira] [Updated] (HBASE-5157) Backport HBASE-4880- Region is on service before openRegionHandler completes, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5157: -- Fix Version/s: (was: 0.90.6) 0.90.7 Moving to 0.90.7. Needs some more code rewrite to make this fit in 0.90. Backport HBASE-4880- Region is on service before openRegionHandler completes, may cause data loss - Key: HBASE-5157 URL: https://issues.apache.org/jira/browse/HBASE-5157 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.90.7 Attachments: HBASE-4880_branch90_1.patch Backporting to 0.90.6 considering the importance of the issue.
[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5179: -- Fix Version/s: (was: 0.90.6) 0.90.7 Updating the fix version to 0.90.7. This issue will go into 0.90.7 after sufficient testing. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch
If the master's failover processing and ServerShutdownHandler's processing happen concurrently, the following case may occur:
1. The master completes splitLogAfterStartup().
2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
3. The master starts rebuildUserRegions, and RegionserverA is considered a dead server.
4. The master starts to assign the regions of RegionserverA because step 3 marked it as a dead server.
However, while step 4 is assigning regions, ServerShutdownHandler may still be splitting the log, so data loss may result.
[jira] [Updated] (HBASE-5271) Result.getValue and Result.getColumnLatest return the wrong column.
[ https://issues.apache.org/jira/browse/HBASE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5271: -- Fix Version/s: (was: 0.90.6) 0.90.7 Updated fix version to 0.90.7. Result.getValue and Result.getColumnLatest return the wrong column. --- Key: HBASE-5271 URL: https://issues.apache.org/jira/browse/HBASE-5271 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5 Reporter: Ghais Issa Assignee: Ghais Issa Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5271-90.txt, 5271-v2.txt, fixKeyValueMatchingColumn.diff, testGetValue.diff
In the following example, result.getValue returns the wrong column:
KeyValue kv = new KeyValue(Bytes.toBytes("r"), Bytes.toBytes("24"), Bytes.toBytes("2"), Bytes.toBytes(7L));
Result result = new Result(new KeyValue[] { kv });
System.out.println(Bytes.toLong(result.getValue(Bytes.toBytes("2"), Bytes.toBytes("2")))); // prints 7.
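The report does not state the root cause. One plausible mechanism, assumed here purely for illustration, is a family comparison that only looks at as many bytes as the requested family has, so the request "2" matches the stored family "24". The sketch below contrasts such a prefix comparison with a length-aware one; the class and method names are hypothetical and this is not the KeyValue implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ColumnMatchSketch {
    public static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Assumed buggy variant: compares only the first family.length bytes of the
    // stored family, so a requested family that is a prefix of it also matches.
    public static boolean prefixMatch(byte[] storedFamily, byte[] storedQualifier,
                                      byte[] family, byte[] qualifier) {
        if (family.length > storedFamily.length) {
            return false;
        }
        for (int i = 0; i < family.length; i++) {
            if (storedFamily[i] != family[i]) {
                return false;
            }
        }
        return Arrays.equals(storedQualifier, qualifier);
    }

    // Length-aware variant: family and qualifier must each be equal as whole arrays.
    public static boolean exactMatch(byte[] storedFamily, byte[] storedQualifier,
                                     byte[] family, byte[] qualifier) {
        return Arrays.equals(storedFamily, family) && Arrays.equals(storedQualifier, qualifier);
    }

    public static void main(String[] args) {
        // Stored cell: family "24", qualifier "2". Requested column: family "2", qualifier "2".
        System.out.println(prefixMatch(b("24"), b("2"), b("2"), b("2"))); // true
        System.out.println(exactMatch(b("24"), b("2"), b("2"), b("2")));  // false
    }
}
```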
[jira] [Updated] (HBASE-5243) LogSyncerThread not getting shutdown waiting for the interrupted flag
[ https://issues.apache.org/jira/browse/HBASE-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5243: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.92, trunk and 0.90 LogSyncerThread not getting shutdown waiting for the interrupted flag - Key: HBASE-5243 URL: https://issues.apache.org/jira/browse/HBASE-5243 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6, 0.92.1 Attachments: 5243-92.addendum, HBASE-5243_0.90.patch, HBASE-5243_0.90_1.patch, HBASE-5243_trunk.patch
In LogSyncer.run() we keep looping until the this.isInterrupted() flag is set. In some cases, however, the DFSClient consumes the InterruptedException, so we run into an infinite loop in some shutdown cases. Since we are the ones who try to shut down the LogSyncerThread, I would suggest introducing a variable such as close or shutdown, and stopping the thread based on the state of this flag along with isInterrupted().
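The suggestion in this report can be sketched as a loop that checks an explicit flag in addition to the interrupt status. This is a minimal illustration, not the committed patch: the class name, the closeRequested field and the empty catch block (which deliberately mimics a callee swallowing the interrupt) are assumptions.

```java
public class SyncerShutdownSketch extends Thread {
    // Explicit shutdown flag; volatile so the syncer thread sees the write.
    private volatile boolean closeRequested = false;
    private final long intervalMs;

    public SyncerShutdownSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    @Override
    public void run() {
        // Exit when either the flag is set or the interrupt status is still visible.
        while (!closeRequested && !isInterrupted()) {
            try {
                Thread.sleep(intervalMs);
                // A real syncer would sync the log here.
            } catch (InterruptedException e) {
                // Deliberately swallowed to mimic a callee that consumes the
                // interrupt; the loop still ends because of closeRequested.
            }
        }
    }

    // Set the flag first, then interrupt to wake the thread from its sleep.
    public void close() {
        closeRequested = true;
        interrupt();
    }

    public static void main(String[] args) throws InterruptedException {
        SyncerShutdownSketch syncer = new SyncerShutdownSketch(50);
        syncer.start();
        syncer.close();
        syncer.join(2000);
        System.out.println(syncer.isAlive()); // false
    }
}
```

With only the isInterrupted() check, the swallowed exception would clear the interrupt status and the loop would keep running; the flag makes the shutdown request survive that.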
[jira] [Updated] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397
[ https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5237: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.90, trunk and 0.92 Addendum for HBASE-5160 and HBASE-4397 -- Key: HBASE-5237 URL: https://issues.apache.org/jira/browse/HBASE-5237 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.90.6, 0.92.1 Attachments: HBASE-5237_0.90.patch, HBASE-5237_trunk.patch
As part of HBASE-4397 there is one more scenario where the patch has to be applied:
{code}
RegionPlan plan = getRegionPlan(state, forceNewPlan);
if (plan == null) {
  debugLog(state.getRegion(), "Unable to determine a plan to assign " + state);
  return; // Should get reassigned later when RIT times out.
}
{code}
I think that in this scenario
{code}
this.timeoutMonitor.setAllRegionServersOffline(true);
{code}
should also be called.
[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4951: -- Fix Version/s: (was: 0.90.6) 0.90.7 Updated the fix version to 0.90.7. master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.7 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch
It is easy to reproduce with the following steps:
Step 1: Start the master process (do not start any regionserver process in the cluster). The master will wait for the regionserver to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin
Step 2: Stop the master with the shell command bin/hbase master stop
Result: The master process never dies, because the catalogTracker.waitForRoot() method blocks until the root region is assigned.
[jira] [Updated] (HBASE-3855) Performance degradation of memstore because reseek is linear
[ https://issues.apache.org/jira/browse/HBASE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-3855: -- Fix Version/s: (was: 0.90.6) 0.90.7 Not fixed in 0.90.6, hence moving it to 0.90.7. Performance degradation of memstore because reseek is linear Key: HBASE-3855 URL: https://issues.apache.org/jira/browse/HBASE-3855 Project: HBase Issue Type: Improvement Reporter: dhruba borthakur Priority: Blocker Fix For: 0.90.7 Attachments: memstoreReseek.txt, memstoreReseek2.txt
The scanner uses reseek to find the next row (or next column) as part of a scan. The reseek code iterates over a Set to position itself at the right place. If many thousands of kvs need to be skipped over, the time cost is very high. In this case, a seek would be far cheaper than a reseek.
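The cost difference described in this report can be illustrated with a sorted set from the JDK. This is only a sketch: Integer keys stand in for KeyValues, and the methods linearReseek and seek are hypothetical names, not the memstore scanner's API. The linear variant walks an iterator past every skipped key, while the seek variant asks the skip list for the first key at or after the target.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class ReseekSketch {
    // Linear reseek: advance the existing iterator until the target is reached.
    // Cost grows with the number of keys that have to be skipped.
    public static Integer linearReseek(Iterator<Integer> it, int target) {
        while (it.hasNext()) {
            Integer key = it.next();
            if (key >= target) {
                return key;
            }
        }
        return null;
    }

    // Seek: let the skip list locate the first key >= target directly.
    public static Integer seek(ConcurrentSkipListSet<Integer> set, int target) {
        return set.ceiling(target);
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Integer> keys = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 10000; i += 2) {
            keys.add(i); // even keys only
        }
        System.out.println(linearReseek(keys.iterator(), 5001)); // 5002
        System.out.println(seek(keys, 5001));                    // 5002
    }
}
```

Both calls return the same key; the difference is that the first touches about 2,500 entries on the way while the second descends the skip list's index levels.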
[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed
[ https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5003: -- Fix Version/s: (was: 0.90.6) 0.90.7 Updated affect version to 0.90.7 If the master is started with a wrong root dir, it gets stuck and can't be killed - Key: HBASE-5003 URL: https://issues.apache.org/jira/browse/HBASE-5003 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Critical Labels: noob Fix For: 0.94.0, 0.90.7, 0.92.1 Reported by a new user on IRC who tried to set hbase.rootdir to file:///~/hbase, the master gets stuck and cannot be killed. I tried something similar on my machine and it spins while logging: {quote} 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to create version file at file:/bin/hbase, retrying: Mkdirs failed to create file:/bin/hbase {quote} The reason it cannot be stopped is that the master's main thread is stuck in there and will never be notified: {quote} Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 nid=0x1137ba000 waiting on condition [1137b9000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297) at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268) at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339) at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128) at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113) 
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218) at java.lang.Thread.run(Thread.java:680) {quote} It seems we should do a better handling of the exceptions we get in there, and die if we need to. It would make a better user experience. Maybe also do a check on hbase.rootdir before even starting the master. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
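The "die if we need to" idea from this report could look roughly like the bounded retry below. It is a sketch under assumptions: IoAction, runWithRetries and the attempt limit are hypothetical and are not part of FSUtils. Instead of retrying forever, it gives up after a fixed number of attempts and rethrows the last failure so the caller can abort, and it also stops if the waiting thread is interrupted.

```java
import java.io.IOException;

public class BoundedRetrySketch {
    // Hypothetical callback for the operation being retried.
    public interface IoAction {
        void run() throws IOException;
    }

    // Try the action up to maxAttempts times, waiting waitMs between attempts.
    // The last IOException is rethrown so the caller can decide to abort.
    public static void runWithRetries(IoAction action, int maxAttempts, long waitMs)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return;
            } catch (IOException e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMs);
                } catch (InterruptedException ie) {
                    // Restore the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while retrying", ie);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        try {
            runWithRetries(() -> {
                throw new IOException("Mkdirs failed");
            }, 3, 10L);
        } catch (IOException e) {
            System.out.println("Giving up: " + e.getMessage()); // Giving up: Mkdirs failed
        }
    }
}
```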
[jira] [Updated] (HBASE-3834) Store ignores checksum errors when opening files
[ https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-3834: -- Moving to 0.90.7 Store ignores checksum errors when opening files Key: HBASE-3834 URL: https://issues.apache.org/jira/browse/HBASE-3834 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.2 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.6 If you corrupt one of the storefiles in a region (eg using vim to muck up some bytes), the region will still open, but that storefile will just be ignored with a log message. We should probably not do this in general - better to keep that region unassigned and force an admin to make a decision to remove the bad storefile. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master
[ https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4470: -- Fix Version/s: (was: 0.90.6) 0.90.7 Moving into 0.90.7 ServerNotRunningException coming out of assignRootAndMeta kills the Master -- Key: HBASE-4470 URL: https://issues.apache.org/jira/browse/HBASE-4470 Project: HBase Issue Type: Bug Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.90.7 I'm surprised we still have issues like that and I didn't get a hit while googling so forgive me if there's already a jira about it. When the master starts it verifies the locations of root and meta before assigning them, if the server is started but not running you'll get this: {quote} 2011-09-23 04:47:44,859 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: RemoteException connecting to RS org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771) at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257) at $Proxy6.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393) at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969) at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388) at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287) at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282) {quote} I hit that 3-4 times this week while debugging something else. The worst is that when you restart the master it sees that as a failover, but none of the regions are assigned so it takes an eternity to get back fully online. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK
[ https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4298: -- Fix Version/s: (was: 0.90.6) 0.90.7 Moving into 0.90.7 Support to drain RS nodes through ZK Key: HBASE-4298 URL: https://issues.apache.org/jira/browse/HBASE-4298 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Environment: all Reporter: Aravind Gottipati Priority: Critical Labels: patch Fix For: 0.90.7 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, trunk_with_test.txt HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks. HDFS goes one step further and even drains these nodes for you. This enhancement is a step in that direction. The idea is that we mark nodes in zookeeper as draining nodes. This means that they don't get any more new regions. These draining nodes look exactly the same as the corresponding nodes in /rs, except they live under /draining. Eventually, support for draining them can be added. I am submitting two patches for review - one for the 0.90 branch and one for trunk (in git). Here are the two patches 0.90 - https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2 trunk - https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5 I have tested both these patches and they work as advertised. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort
[ https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4288: -- Fix Version/s: (was: 0.90.6) 0.90.7 Moving into 0.90.7 Server not running exception during meta verification causes RS abort --- Key: HBASE-4288 URL: https://issues.apache.org/jira/browse/HBASE-4288 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.7 Attachments: 4288-v2.txt, 4288.txt The master tried to verify the META location just as that server was shutting down due to an abort. This caused the Server not running exception to get thrown, which wasn't handled properly in the master, causing it to abort. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK
[ https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4298: -- Fix Version/s: 0.92.0 @Stack This issue has gone into 0.92 and trunk. As it is a new feature do you want to go into future 0.90 releases? if not can remove the fix versions as 0.90? Support to drain RS nodes through ZK Key: HBASE-4298 URL: https://issues.apache.org/jira/browse/HBASE-4298 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.90.4 Environment: all Reporter: Aravind Gottipati Priority: Critical Labels: patch Fix For: 0.90.7, 0.92.0 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, trunk_with_test.txt HDFS currently has a way to exclude certain datanodes and prevent them from getting new blocks. HDFS goes one step further and even drains these nodes for you. This enhancement is a step in that direction. The idea is that we mark nodes in zookeeper as draining nodes. This means that they don't get any more new regions. These draining nodes look exactly the same as the corresponding nodes in /rs, except they live under /draining. Eventually, support for draining them can be added. I am submitting two patches for review - one for the 0.90 branch and one for trunk (in git). Here are the two patches 0.90 - https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2 trunk - https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5 I have tested both these patches and they work as advertised. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort
[ https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4288: -- Fix Version/s: 0.92.0 Already committed to 0.92 and trunk, but not to 0.90. Hence not resolving the issue, just updating the fix version. Server not running exception during meta verification causes RS abort --- Key: HBASE-4288 URL: https://issues.apache.org/jira/browse/HBASE-4288 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.7, 0.92.0 Attachments: 4288-v2.txt, 4288.txt The master tried to verify the META location just as that server was shutting down due to an abort. This caused the Server not running exception to get thrown, which wasn't handled properly in the master, causing it to abort.
[jira] [Updated] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang
[ https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4550: -- Fix Version/s: (was: 0.90.6) 0.90.7 @Wanbin If you provide a new patch, it can be integrated into 0.90.7. Moving to 0.90.7. When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang --- Key: HBASE-4550 URL: https://issues.apache.org/jira/browse/HBASE-4550 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: wanbin Assignee: wanbin Fix For: 0.90.7 Attachments: hbase-0.90.3.patch, patch, patch.txt Original Estimate: 2h Remaining Estimate: 2h
When the master passes the regionserver a different address, the regionserver does not create a new ZooKeeper znode, while the master stores the new address in ServerManager. When stop-hbase.sh is called, the path received by RegionServerTracker.nodeDeleted is the old address, so serverManager.expireServer is not called and stop-hbase.sh hangs.
[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null
[ https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4848: -- Fix Version/s: (was: 0.90.6) 0.90.5 0.92.0 Already committed and resolved issue. Hence closing the issue as resolved. TestScanner failing because hostname can't be null -- Key: HBASE-4848 URL: https://issues.apache.org/jira/browse/HBASE-4848 Project: HBase Issue Type: Bug Affects Versions: 0.90.5 Reporter: stack Assignee: stack Fix For: 0.90.5, 0.92.0 Attachments: 4848-092.txt, 4848.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.
[ https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5235: -- Attachment: HBASE-5235_0.90_2.patch Updated patch for 0.90 addressing Ted's comments. The trunk patch already incorporates Ted's comments. HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions. -- Key: HBASE-5235 URL: https://issues.apache.org/jira/browse/HBASE-5235 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.1, 0.90.6 Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch
Please find the analysis below. Correct me if I am wrong.
{code}
2012-01-15 05:14:02,374 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got while writing log entry to log
java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
{code}
Here we have an exception in one of the writer threads. When any exception occurs, we hold it in an atomic variable:
{code}
private void writerThreadError(Throwable t) {
  // Only the first reported error is kept; later ones are ignored.
  thrown.compareAndSet(null, t);
}
{code}
In the finally block of splitLog we try to close the streams.
{code}
for (WriterThread t : writerThreads) {
  try {
    t.join();
  } catch (InterruptedException ie) {
    throw new IOException(ie);
  }
  // Throws if any writer failed, which skips closeStreams() below.
  checkForErrors();
}
LOG.info("Split writers finished");
return closeStreams();
{code}
Inside checkForErrors:
{code}
private void checkForErrors() throws IOException {
  Throwable thrown = this.thrown.get();
  if (thrown == null) return;
  if (thrown instanceof IOException) {
    throw (IOException) thrown;
  } else {
    throw new RuntimeException(thrown);
  }
}
{code}
So once checkForErrors throws the exception, closeStreams() is never reached and the DFSStreamer threads are not getting closed.
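The leak described in this analysis comes from checkForErrors() throwing before closeStreams() runs. One way to avoid that is to move the close into a finally block so it executes on both the success and the failure path. The following is only a minimal sketch of that idea, not the HBase class and not the attached patch: SplitWriterSketch, addStream, and finishWriting are hypothetical names introduced for illustration, and the streams are modelled as plain java.io.Closeable objects.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the splitter; names are assumptions, not HBase API.
class SplitWriterSketch {
  // Holds the first error reported by any writer thread (first one wins).
  private final AtomicReference<Throwable> thrown = new AtomicReference<>();
  private final List<Closeable> streams = new ArrayList<>();

  void addStream(Closeable stream) {
    streams.add(stream);
  }

  void writerThreadError(Throwable t) {
    thrown.compareAndSet(null, t);
  }

  private void checkForErrors() throws IOException {
    Throwable t = thrown.get();
    if (t == null) return;
    if (t instanceof IOException) throw (IOException) t;
    throw new RuntimeException(t);
  }

  // Attempts to close every stream; returns how many closed cleanly.
  private int closeStreams() {
    int closed = 0;
    for (Closeable stream : streams) {
      try {
        stream.close();
        closed++;
      } catch (IOException e) {
        // Keep going so one bad stream does not leave the others open.
      }
    }
    return closed;
  }

  // Joins the writers and surfaces any writer error, but always closes the streams.
  int finishWriting(List<Thread> writers) throws IOException {
    int closed = 0;
    try {
      for (Thread t : writers) {
        try {
          t.join();
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException(ie);
        }
        checkForErrors();
      }
    } finally {
      // Runs even when checkForErrors() throws above.
      closed = closeStreams();
    }
    return closed;
  }
}
```

With this shape, a caller that reports a failure through writerThreadError and then calls finishWriting still sees the original IOException, but every registered stream has had close() attempted before the exception propagates. Whether close failures should be swallowed, logged, or rethrown is a separate design choice that this sketch does not settle.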
[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.
[ https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5235: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.90 and trunk. Thanks for the review Ted. HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions. -- Key: HBASE-5235 URL: https://issues.apache.org/jira/browse/HBASE-5235 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.1, 0.90.6 Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch
[jira] [Updated] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.
[ https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-5235: -- Attachment: HBASE-5235_0.90.patch Patch for 0.90. If this patch is fine, I will prepare a similar patch for 0.92. HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions. -- Key: HBASE-5235 URL: https://issues.apache.org/jira/browse/HBASE-5235 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.90.5 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 0.92.1, 0.90.6 Attachments: HBASE-5235_0.90.patch