[jira] [Assigned] (HBASE-5809) Avoid move api to take the destination server same as the source server.
[ https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5809:
---------------------------------------------

    Assignee: rajeshbabu

> Avoid move api to take the destination server same as the source server.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-5809
>                 URL: https://issues.apache.org/jira/browse/HBASE-5809
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.1
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: rajeshbabu
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.96.0
>
>         Attachments: HBASE-5809.patch
>
>
> Currently move accepts any destination specified, and even when the
> destination is the same as the source we still unassign and assign the
> region. This can run into RegionAlreadyInTransitionException and leave the
> region hanging in RIT for a long time. We can avoid that by not allowing
> such a move to happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
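The guard this issue asks for is simple enough to sketch. The following is a minimal illustration, not the actual HBase master API: the `shouldMove` helper and the plain-string server names are hypothetical stand-ins for the real `HServerAddress`/`ServerName` types.

```java
public class MoveGuard {
    /**
     * Decide whether a requested move is worth performing. Skipping a
     * same-server move avoids the unassign/assign cycle that can end in
     * RegionAlreadyInTransitionException and a long-lived RIT entry.
     */
    public static boolean shouldMove(String sourceServer, String destServer) {
        if (destServer == null) {
            return true; // no destination given: let the balancer pick one
        }
        // Reject the move when destination equals the current location.
        return !destServer.equals(sourceServer);
    }
}
```

A caller would simply return early from move() when the guard says false, instead of proceeding to unassign the region.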
[jira] [Assigned] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status
[ https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5840:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing
> the old status
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5840
>                 URL: https://issues.apache.org/jira/browse/HBASE-5840
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Gopinathan A
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.96.0, 0.94.1
>
>
> The TaskMonitor status is not cleared when a region goes to FAILED_OPEN, so
> it keeps showing the old status. This misleads the user.
[jira] [Assigned] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region
[ https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5816:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Balancer and ServerShutdownHandler concurrently reassigning the same region
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5816
>                 URL: https://issues.apache.org/jira/browse/HBASE-5816
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6
>            Reporter: Maryann Xue
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5816.patch
>
>
> The first assign thread exits with success after updating the RegionState to
> PENDING_OPEN, while the second assign follows immediately into "assign" and
> fails the RegionState check in setOfflineInZooKeeper(). This causes the
> master to abort.
> In the case below, the two concurrent assigns occurred when the AM tried to
> assign a region to a dying/dead RS, and meanwhile the ServerShutdownHandler
> tried to assign this region (from the region plan) spontaneously.
>
> 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. (offlining)
> 2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
> 2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
> 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
> 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x236b912e9b3000e Creating (or updating) unassigned node for fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
> 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., src=hadoop05.sh.intel.com,60020,1334544902186, dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. state=PENDING_OPEN, ts=1334613179096, server=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
> java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 12 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 remote=/10.239.47.87:60020]
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
>     at $Proxy7.openRegion(Unknown Source)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
>     at
[jira] [Assigned] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.
[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5545:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan  (was: gaojinchao)

> region can't be opened for a long time. Because the creating File failed.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-5545
>                 URL: https://issues.apache.org/jira/browse/HBASE-5545
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.6
>            Reporter: gaojinchao
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.90.7, 0.92.2, 0.94.0
>
>         Attachments: HBASE-5545.patch, HBASE-5545.patch
>
>
> Scenario:
> ---------
> 1. A file is created.
> 2. While writing data, all datanodes may have crashed, so writing the data
> fails.
> 3. Even if close() is called in the finally block, close also fails and
> throws an exception because writing the data failed.
> 4. If the RS then tries to create the same file again,
> AlreadyBeingCreatedException is thrown.
>
> Suggestion to handle this scenario:
> -----------------------------------
> 1. Check for the existence of the file; if it exists, delete the file and
> create a new one. The delete call does not check whether the file is open
> or closed.
>
> Overwrite option:
> -----------------
> 1. The overwrite option is applicable only when overwriting a closed file.
> 2. If the file is not closed, then even with the overwrite option the same
> AlreadyBeingCreatedException is thrown. This is the expected behaviour, to
> prevent multiple clients writing to the same file.
>
> Region server logs:
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
>     at org.apache.hadoop.ipc.Client.call(Client.java:961)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
>     at $Proxy6.create(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at $Proxy6.create(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3643)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> [2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvo
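The suggested handling ("check for existence, delete, then create") can be sketched as follows. This is a deliberately simplified illustration using `java.io.File` in place of the HDFS `FileSystem` API; on HDFS the key property is that delete succeeds even when the crashed writer never closed the file, while create with overwrite still throws AlreadyBeingCreatedException for an open file.

```java
import java.io.File;
import java.io.IOException;

public class RegionInfoWriter {
    /**
     * Delete any stale file left behind by a failed earlier create, then
     * create the file fresh. Returns true when the new file was created.
     */
    public static boolean recreate(File f) {
        if (f.exists()) {
            // Delete does not care whether the file was ever closed; on
            // HDFS this is what sidesteps the lease the dead writer holds.
            f.delete();
        }
        try {
            return f.createNewFile();
        } catch (IOException e) {
            return false; // caller retries or fails the region open
        }
    }
}
```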
[jira] [Assigned] (HBASE-5782) Not all the regions are getting assigned after the log splitting.
[ https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5782:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Not all the regions are getting assigned after the log splitting.
> ------------------------------------------------------------------
>
>                 Key: HBASE-5782
>                 URL: https://issues.apache.org/jira/browse/HBASE-5782
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 0.94.0
>            Reporter: Gopinathan A
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.94.0
>
>
> Create a table with 1000 splits; after the region assignment, kill the
> regionserver which hosts the META table.
> A few regions are missing after the log splitting and region assignment,
> and the HBCK report shows that multiple region holes were created.
> The same scenario was verified multiple times in 0.92.1 with no issues.
[jira] [Assigned] (HBASE-5651) [findbugs] Address wait/notify synchronization/inconsistency in sync
[ https://issues.apache.org/jira/browse/HBASE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5651:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> [findbugs] Address wait/notify synchronization/inconsistency in sync
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5651
>                 URL: https://issues.apache.org/jira/browse/HBASE-5651
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scripts
>            Reporter: Jonathan Hsieh
>            Assignee: ramkrishna.s.vasudevan
>
>
> See https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html#Warnings_MT_CORRECTNESS
> Fix warning classes IS, LI, MWM, NN, SWL, UG, UW.
[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem
[ https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
---------------------------------------------

    Assignee: Uma Maheswara Rao G  (was: Uma Mahesh)

> Fix NoSuchMethodException in 0.92 when running on local filesystem
> -------------------------------------------------------------------
>
>                 Key: HBASE-5595
>                 URL: https://issues.apache.org/jira/browse/HBASE-5595
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Uma Maheswara Rao G
>            Priority: Critical
>             Fix For: 0.92.2
>
>         Attachments: HBASE-5595.patch
>
>
> Fix this ugly exception that shows when running 0.92.1 on the local
> filesystem:
> {code}
> 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
>     at java.lang.Class.getDeclaredMethod(Class.java:1937)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:408)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
>     at java.lang.Thread.run(Thread.java:680)
> {code}
[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem
[ https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
---------------------------------------------

    Assignee: Uma Mahesh

Thanks for the patch Uma. Added you to the contributor list.

> Fix NoSuchMethodException in 0.92 when running on local filesystem
> -------------------------------------------------------------------
>
>                 Key: HBASE-5595
>                 URL: https://issues.apache.org/jira/browse/HBASE-5595
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Uma Mahesh
>            Priority: Critical
>             Fix For: 0.92.2
>
>         Attachments: HBASE-5595.patch
>
>
> Fix this ugly exception that shows when running 0.92.1 on the local
> filesystem:
> {code}
> 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
> java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
>     at java.lang.Class.getDeclaredMethod(Class.java:1937)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:408)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
>     at java.lang.Thread.run(Thread.java:680)
> {code}
[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
[ https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-5531
>                 URL: https://issues.apache.org/jira/browse/HBASE-5531
>             Project: HBase
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.92.2
>            Reporter: Laxman
>            Assignee: ramkrishna.s.vasudevan
>              Labels: build
>             Fix For: 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch
>
>
> The current profile still points to 0.23.1-SNAPSHOT. The build fails
> because 0.23.1 has already been released and its snapshot is no longer
> available. We can update the profile to 0.23.2-SNAPSHOT.
[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
[ https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
---------------------------------------------

    Assignee: Laxman  (was: ramkrishna.s.vasudevan)

> Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-5531
>                 URL: https://issues.apache.org/jira/browse/HBASE-5531
>             Project: HBase
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.92.2
>            Reporter: Laxman
>            Assignee: Laxman
>              Labels: build
>             Fix For: 0.92.2, 0.94.0, 0.96.0
>
>         Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch
>
>
> The current profile still points to 0.23.1-SNAPSHOT. The build fails
> because 0.23.1 has already been released and its snapshot is no longer
> available. We can update the profile to 0.23.2-SNAPSHOT.
[jira] [Assigned] (HBASE-5510) Change in LB.randomAssignment(List servers) API
[ https://issues.apache.org/jira/browse/HBASE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5510:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Change in LB.randomAssignment(List servers) API
> ------------------------------------------------
>
>                 Key: HBASE-5510
>                 URL: https://issues.apache.org/jira/browse/HBASE-5510
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.92.0
>            Reporter: Anoop Sam John
>            Assignee: ramkrishna.s.vasudevan
>
>
> The LB has a randomAssignment(List) API which the AM uses to assign a
> region from a downed RS. (It is also used in other cases, such as a call to
> the assign() API from a client.)
> It would be better to pass the HRegionInfo into this method as well: when
> the LB makes the server choice for a region assignment after an RS goes
> down, it should know which region it is doing the selection for.
>
> Scenario:
> When one RS goes down, we want its regions moved to other RSs, but with a
> certain set of regions staying together. We have a custom load balancer,
> but with the current LB interface this is not possible. Alternatively, I
> could allow random assignment of the regions when the RS goes down and
> later rebalance the cluster as needed, but that can assign a region first
> to one RS and then move it again to another, and in the meantime my
> business use case cannot be satisfied.
> There is also a JIRA issue about making sure that the Root and META regions
> always sit on specific RSs; with the current LB API that will not be
> possible in the future either.
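The proposed API change can be sketched as an overload that also receives the region. This is an illustrative sketch only: a plain `String` region name stands in for `HRegionInfo`, and the class is not the actual HBase `LoadBalancer` contract.

```java
import java.util.List;
import java.util.Random;

public class RegionAwareBalancer {
    private static final Random RNG = new Random();

    // Current-style API: the balancer cannot tell which region it places.
    public static String randomAssignment(List<String> servers) {
        return servers.get(RNG.nextInt(servers.size()));
    }

    // Proposed-style API: knowing the region lets a custom balancer keep
    // related regions together, or pin ROOT/META to designated servers.
    public static String randomAssignment(String regionName, List<String> servers) {
        if (regionName.startsWith("-ROOT-") || regionName.startsWith(".META.")) {
            return servers.get(0); // e.g. always the first, designated server
        }
        return servers.get(RNG.nextInt(servers.size()));
    }
}
```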
[jira] [Assigned] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master
[ https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5323:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Need to handle assertion error while splitting log through
> ServerShutDownHandler by shutting down the master
> -----------------------------------------------------------
>
>                 Key: HBASE-5323
>                 URL: https://issues.apache.org/jira/browse/HBASE-5323
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.5
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.94.0, 0.90.7
>
>
> While parsing the HLog we expect the proper length from HDFS. In
> WALReaderFSDataInputStream:
> {code}
> assert(realLength >= this.length);
> {code}
> We try to bail out when the above condition is not satisfied, but if
> SSH.splitLog() hits this problem it lands in the run method of
> EventHandler. This kills the SSH thread, so further assignment does not
> happen; if ROOT and META need to be assigned, they cannot be.
> I think in this condition we should abort the master by catching such
> exceptions. Please do suggest.
[jira] [Assigned] (HBASE-5120) Timeout monitor races with table disable handler
[ https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5120:
---------------------------------------------

    Assignee: ramkrishna.s.vasudevan

> Timeout monitor races with table disable handler
> -------------------------------------------------
>
>                 Key: HBASE-5120
>                 URL: https://issues.apache.org/jira/browse/HBASE-5120
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Zhihong Yu
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.94.0, 0.92.1
>
>         Attachments: HBASE-5120.patch, HBASE-5120_1.patch, HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy
> and caused more troubles than it fixed"; on my first test I got a stuck
> region in transition instead of being able to recover. The timeout was set
> to 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r5s38,62023,1325635980913 for region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Deleting existing unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Successfully deleted unassigned node for region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db03587 Creating unassigned node for 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO org.apache.hadoop.hbase.master.AssignmentManager: Server null returned java.lang.NullPointerException: Passed server is null for 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused: it recreated the RIT znode but the region
> doesn't even exist anymore. It even tries to shut it down but is blocked by
> NPEs. Now this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on region to clear regions in transition; test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTable
[jira] [Assigned] (HBASE-4988) MetaServer crash cause all splitting regionserver abort
[ https://issues.apache.org/jira/browse/HBASE-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-4988:
---------------------------------------------

    Assignee: chunhui shen

> MetaServer crash cause all splitting regionserver abort
> --------------------------------------------------------
>
>                 Key: HBASE-4988
>                 URL: https://issues.apache.org/jira/browse/HBASE-4988
>             Project: HBase
>          Issue Type: Bug
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: hbase-4988v1.patch
>
>
> If the meta server crashes now, all the splitting regionservers will abort
> themselves, because of this code:
> {code}
> this.journal.add(JournalEntry.PONR);
> MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
>   this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
> {code}
> Once the JournalEntry is PONR, the split's rollback aborts the regionserver
> itself. This is terrible under heavy write load when the meta server
> crashes.
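The mechanism described above can be illustrated with a small sketch (not the real SplitTransaction code; the enum values and method names are simplified stand-ins): once the PONR (point-of-no-return) entry has been journaled, the parent has been marked offline in META, so rollback is no longer safe and the only remaining option is to abort.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SplitJournal {
    public enum JournalEntry { STARTED, CREATE_SPLIT_DIR, PONR }

    private final Deque<JournalEntry> journal = new ArrayDeque<>();

    public void add(JournalEntry e) {
        journal.push(e);
    }

    /**
     * Rollback is only possible before the point of no return; past it,
     * a failed META edit (e.g. because the meta server died) must abort
     * the regionserver rather than roll the split back.
     */
    public boolean canRollback() {
        return !journal.contains(JournalEntry.PONR);
    }
}
```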
[jira] [Assigned] (HBASE-5121) MajorCompaction may affect scan's correctness
[ https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan reassigned HBASE-5121:
---------------------------------------------

    Assignee: chunhui shen

> MajorCompaction may affect scan's correctness
> ----------------------------------------------
>
>                 Key: HBASE-5121
>                 URL: https://issues.apache.org/jira/browse/HBASE-5121
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: hbase-5121.patch
>
>
> In our test there are two families' KeyValues for one row, and we found an
> infrequent problem in scan's next() when a major compaction happens
> concurrently.
> In the client's two consecutive calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> Both results have the same row. If there are more families, I think the
> scenario will be even stranger...
> The reason is that storescanner.peek() changes after the major compaction
> if there are delete-type KeyValues. This change means the PriorityQueue of
> the RegionScanner's heap is no longer guaranteed to be sorted.
[jira] [Assigned] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE
[ https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-5097: - Assignee: ramkrishna.s.vasudevan > RegionObserver implementation whose preScannerOpen and postScannerOpen Impl > return null can stall the system initialization through NPE > --- > > Key: HBASE-5097 > URL: https://issues.apache.org/jira/browse/HBASE-5097 > Project: HBase > Issue Type: Bug > Components: coprocessors >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > In HRegionServer.java openScanner() > {code} > r.prepareScanner(scan); > RegionScanner s = null; > if (r.getCoprocessorHost() != null) { > s = r.getCoprocessorHost().preScannerOpen(scan); > } > if (s == null) { > s = r.getScanner(scan); > } > if (r.getCoprocessorHost() != null) { > s = r.getCoprocessorHost().postScannerOpen(scan, s); > } > {code} > If we dont have implemention for postScannerOpen the RegionScanner is null > and so throwing nullpointer > {code} > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326) > {code} > Making this defect as blocker.. Pls feel free to change the priority if am > wrong. Also correct me if my way of trying out coprocessors without > implementing postScannerOpen is wrong. 
I am just a learner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
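For readers hitting this NPE, here is a minimal, hypothetical sketch of the kind of null guard the openScanner() path needs. The `RegionScanner` and `CoprocessorHost` types below are stubs standing in for the real HBase classes, and this is only one way to make the code fail safe when postScannerOpen returns null; the project's actual fix may differ.

```java
// Stubs standing in for the real HBase types (assumption: simplified signatures).
interface RegionScanner {}

interface CoprocessorHost {
    RegionScanner preScannerOpen(String scan);
    RegionScanner postScannerOpen(String scan, RegionScanner s);
}

public class ScannerOpenSketch {

    // Stand-in for r.getScanner(scan) in the real code.
    static RegionScanner defaultScanner(String scan) {
        return new RegionScanner() {};
    }

    static RegionScanner openScanner(CoprocessorHost host, String scan) {
        RegionScanner s = null;
        if (host != null) {
            s = host.preScannerOpen(scan);
        }
        if (s == null) {
            s = defaultScanner(scan);
        }
        if (host != null) {
            RegionScanner wrapped = host.postScannerOpen(scan, s);
            // Keep the existing scanner if the hook returns null, instead of
            // registering a null scanner and hitting the NPE in addScanner().
            if (wrapped != null) {
                s = wrapped;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // A coprocessor whose hooks both return null: the scenario reported here.
        CoprocessorHost host = new CoprocessorHost() {
            public RegionScanner preScannerOpen(String scan) { return null; }
            public RegionScanner postScannerOpen(String scan, RegionScanner s) { return null; }
        };
        System.out.println(openScanner(host, "scan") != null);  // prints "true"
    }
}
```

With this guard a RegionObserver that returns null from both hooks still gets a usable scanner back, rather than stalling initialization.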
[jira] [Assigned] (HBASE-5086) Reopening a region on a RS can leave it in PENDING_OPEN
[ https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-5086: - Assignee: ramkrishna.s.vasudevan > Reopening a region on a RS can leave it in PENDING_OPEN > --- > > Key: HBASE-5086 > URL: https://issues.apache.org/jira/browse/HBASE-5086 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.1 > > > I got this twice during the same test. > If the region servers are slow enough and you run an online alter, it's > possible for the RS to change the znode status to CLOSED and have the master > send an OPEN before the region server is able to remove the region from its > list of RITs. > This is what the master sees: > {quote} > 2011-12-21 22:24:09,498 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of > region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. > (offlining) > 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:62003-0x134589d3db033f7 Creating unassigned node for > 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state > 2011-12-21 22:24:09,524 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > sv4r25s44,62023,1324494325099 for region > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. > 2011-12-21 22:24:15,656 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, > region=43123e2e3fc83ec25fe2a76b4f09077f > 2011-12-21 22:24:15,656 DEBUG > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED > event for 43123e2e3fc83ec25fe2a76b4f09077f > 2011-12-21 22:24:15,656 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099 > 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for > 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state > 2011-12-21 22:24:15,663 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination > server is + sv4r25s44,62023,1324494325099 > 2011-12-21 22:24:15,663 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for > region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; > plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., > src=, dest=sv4r25s44,62023,1324494325099 > 2011-12-21 22:24:15,663 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to > sv4r25s44,62023,1324494325099 > 2011-12-21 22:24:15,664 ERROR > org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: > sv4r25s44,62023,1324494325099 due to > org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: > Received:OPEN for the > region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which > we are already trying to CLOSE. > {quote} > After that the master abandons. > And the region server: > {quote} > 2011-12-21 22:24:09,523 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. > 2011-12-21 22:24:09,523 DEBUG > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing > close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: > disabling compactions & flushes > 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: > Running close preflush of > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. > 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Started memstore flush for > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current > region memstore size 40.5m > 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Finished snapshotting > test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing > wait for mvcc, flushsize=42482936 > 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Renaming flushed file at > hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
[jira] [Assigned] (HBASE-4951) master process cannot be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4951: - Assignee: ramkrishna.s.vasudevan > master process cannot be stopped when it is initializing > - > > Key: HBASE-4951 > URL: https://issues.apache.org/jira/browse/HBASE-4951 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.3 >Reporter: xufeng >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Fix For: 0.92.0, 0.90.5 > > > It is easy to reproduce with the following steps: > step1: start the master process (do not start any regionserver process in the cluster). > The master will wait for the regionserver(s) to check in: > org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to > checkin > step2: stop the master with the shell command bin/hbase master stop > result: the master process will never die because the catalogTracker.waitForRoot() > method will block until the root region is assigned.
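The hang described above comes from an unconditional blocking wait during initialization. A hypothetical sketch of the general remedy (not the actual HBASE-4951 patch) is to poll the condition with a timeout and check a stop flag on each iteration, so a shutdown request can interrupt the wait:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class StoppableWait {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    public void stop() { stopped.set(true); }

    // Poll the condition instead of blocking unconditionally; return false
    // if a stop was requested before the condition became true.
    public boolean waitFor(BooleanSupplier condition, long pollMillis)
            throws InterruptedException {
        while (!condition.getAsBoolean()) {
            if (stopped.get()) {
                return false;  // honor shutdown instead of waiting forever
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableWait w = new StoppableWait();
        w.stop();  // simulate "bin/hbase master stop" arriving during init
        // The condition never becomes true (no regionserver ever checks in),
        // yet the wait returns promptly instead of hanging the master.
        System.out.println(w.waitFor(() -> false, 10));  // prints "false"
    }
}
```

In the real code the condition would be "root region assigned" inside catalogTracker.waitForRoot(); the names here are illustrative only.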
[jira] [Assigned] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4729: - Assignee: stack (was: ramkrishna.s.vasudevan) Assigning to your name Stack. > Clash between region unassign and splitting kills the master > > > Key: HBASE-4729 > URL: https://issues.apache.org/jira/browse/HBASE-4729 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans >Assignee: stack >Priority: Critical > Fix For: 0.92.0, 0.94.0 > > Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, > 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt > > > I was running an online alter while regions were splitting, and suddenly the > master died and left my table half-altered (haven't restarted the master yet). > What killed the master: > {quote} > 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: > Unexpected ZK exception creating node CLOSING > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = > NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:110) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) > at > org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) > at > org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) > at > org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) > at > org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) > at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {quote} > A znode was created because the region server was splitting the region 4 > seconds before: > {quote} > 2011-11-02 17:06:40,704 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of > region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. > 2011-11-02 17:06:40,704 DEBUG > org.apache.hadoop.hbase.regionserver.SplitTransaction: > regionserver:62023-0x132f043bbde0710 Creating ephemeral node for > f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state > 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:62023-0x132f043bbde0710 Attempting to transition node > f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to > RS_ZK_REGION_SPLITTING > ... > 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:62023-0x132f043bbde0710 Successfully transitioned node > f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to > RS_ZK_REGION_SPLIT > 2011-11-02 17:06:44,061 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the > master to process the split for f7e1783e65ea8d621a4bc96ad310f101 > {quote} > Now that the master is dead the region server is spewing those last two lines > like mad.
[jira] [Assigned] (HBASE-4880) Region is in service before completing openRegionHandler, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4880: - Assignee: chunhui shen > Region is in service before completing openRegionHandler, may cause data loss > - > > Key: HBASE-4880 > URL: https://issues.apache.org/jira/browse/HBASE-4880 > Project: HBase > Issue Type: Bug >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: hbase-4880.patch > > > OpenRegionHandler in the regionserver proceeds through the following steps: > {code} > 1.openregion()(Through it, closed = false, closing = false) > 2.addToOnlineRegions(region) > 3.update .meta. table > 4.update ZK's node state to RS_ZK_REGION_OPENED > {code} > Note that the region is in service before step 4. > This means a client could put data to this region after step 3. > What happens if step 4 fails? > OpenRegionHandler#cleanupFailedOpen will execute, closing the > region, and the master will assign this region to another regionserver. > If closing the region fails, the data put between step 3 and step 4 > may be lost, because the region has been opened on another regionserver and has > received new puts. Therefore, the data may not be recovered through replayRecoveredEdit() > because the edit's LogSeqId is smaller than the current region SeqId.
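The four steps above can be sketched to show why the ordering matters. The following is a hypothetical reordering, not necessarily what hbase-4880.patch does: the region is only made visible to clients after every earlier step, including the ZK transition, has succeeded, so a failure at step 4 cannot strand edits that were accepted in between.

```java
import java.util.ArrayList;
import java.util.List;

public class OpenOrderSketch {
    // Stand-in for the regionserver's online-regions map.
    static final List<String> online = new ArrayList<>();

    // Open a region, but only expose it to clients once every earlier
    // step -- including the ZK transition -- has succeeded.
    static boolean openRegion(String region, boolean zkTransitionSucceeds) {
        // 1. open the region internals (closed = false, closing = false)
        // 2. update the .META. table (stubbed out here)
        // 3. transition the ZK node to the OPENED state
        if (!zkTransitionSucceeds) {
            // cleanupFailedOpen: the region never served a client,
            // so there are no stranded edits to lose.
            return false;
        }
        // 4. only now make the region visible to clients
        online.add(region);
        return true;
    }

    public static void main(String[] args) {
        openRegion("region-1", false);  // ZK update fails mid-open
        System.out.println(online.contains("region-1"));  // prints "false"
    }
}
```

The point of the sketch: with addToOnlineRegions moved after the ZK update, a failed step 4 leaves no window in which clients could have written to a region that is about to be reassigned.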
[jira] [Assigned] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.
[ https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4841: - Assignee: ramkrishna.s.vasudevan > If I call split fast enough, while inserting, rows disappear. > -- > > Key: HBASE-4841 > URL: https://issues.apache.org/jira/browse/HBASE-4841 > Project: HBase > Issue Type: Bug >Reporter: Alex Newman >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Attachments: 1, log, log2 > > > I'll attach a unit test for this. Basically if you call split while > inserting data you can get to the point where the cluster becomes > unstable, or rows will disappear. The unit test gives you some flexibility > over: > - How many rows > - How wide the rows are > - The frequency of the split. > The default settings crash unit tests or cause the unit tests to fail on my > laptop. On my macbook air, I could actually turn down the number of total > rows, and the frequency of the splits, which is surprising. I think this is > because the macbook air has much better IO than my backup acer.
[jira] [Assigned] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4308: - Assignee: ramkrishna.s.vasudevan > Race between RegionOpenedHandler and AssignmentManager > -- > > Key: HBASE-4308 > URL: https://issues.apache.org/jira/browse/HBASE-4308 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Todd Lipcon >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.0 > > > When the master is processing a ZK event for REGION_OPENED, it calls delete() > on the znode before it removes the node from RegionsInTransition. If the > notification of that delete comes back into AssignmentManager before the > region is removed from RIT, you see an error like: > 2011-08-30 17:43:29,537 WARN [main-EventThread] > master.AssignmentManager(861): Node deleted but still in RIT: > .META.,,1.1028785192 state=OPEN, ts=1314751409532, > server=todd-w510,55655,1314751396840 > Not certain if it causes issues, but it's a concerning log message.
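The race described above is an ordering problem: the znode delete fires a watcher before the in-memory RIT map is cleaned up. A hypothetical sketch of the obvious reordering (swap the two steps so the watcher can never observe the region still in RIT); the maps below are stand-ins, not the real master data structures:

```java
import java.util.concurrent.ConcurrentHashMap;

public class RitOrderSketch {
    // Stand-ins for RegionsInTransition and the /hbase/unassigned znodes.
    static final ConcurrentHashMap<String, String> regionsInTransition = new ConcurrentHashMap<>();
    static final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

    // Remove the region from RIT *before* deleting the znode, so the delete
    // notification can never find the region still in transition.
    static void handleRegionOpened(String region) {
        regionsInTransition.remove(region);
        znodes.remove(region);  // the watcher fires after this; RIT is already clean
    }

    public static void main(String[] args) {
        regionsInTransition.put("region-1", "OPEN");
        znodes.put("region-1", "RS_ZK_REGION_OPENED");
        handleRegionOpened("region-1");
        // The watcher triggered by the znode delete now sees an empty RIT map,
        // so the "Node deleted but still in RIT" warning cannot occur.
        System.out.println(regionsInTransition.isEmpty());  // prints "true"
    }
}
```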
[jira] [Assigned] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master
[ https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4796: - Assignee: ramkrishna.s.vasudevan > Race between SplitRegionHandlers for the same region kills the master > - > > Key: HBASE-4796 > URL: https://issues.apache.org/jira/browse/HBASE-4796 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.0, 0.94.0 > > > I just saw that multiple SplitRegionHandlers can be created for the same > region because of the RS tickling, but it becomes deadly when more than 1 are > trying to delete the znode at the same time: > {quote} > 2011-11-16 02:25:28,778 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, > region=f80b6a904048a99ce88d61420b8906d1 > 2011-11-16 02:25:28,780 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, > region=f80b6a904048a99ce88d61420b8906d1 > 2011-11-16 02:25:28,796 DEBUG > org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT > event for f80b6a904048a99ce88d61420b8906d1; deleting node > 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:62003-0x132f043bbde094b Deleting existing unassigned node for > f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT > 2011-11-16 02:25:28,804 DEBUG > org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT > event for f80b6a904048a99ce88d61420b8906d1; deleting node > 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:62003-0x132f043bbde094b Deleting existing unassigned node for > f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT > 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > 
master:62003-0x132f043bbde094b Successfully deleted unassigned node for > region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT > 2011-11-16 02:25:28,821 INFO > org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT > report); > parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. > daughter > a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter > b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839. > 2011-11-16 02:25:28,829 WARN > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node > /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this > is not a retry > 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error > deleting SPLIT node in ZK for transition ZK node > (f80b6a904048a99ce88d61420b8906d1) > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 > at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107) > at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884) > at > org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506) > at > org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453) > at > org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {quote} > Stack and I came up with the solution that we 
just need to manage that exception > because handleSplitReport is an in-memory thing.
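The solution sketched above — tolerate the NoNodeException because handleSplitReport is in-memory — amounts to making the delete idempotent. A minimal, hypothetical illustration with a map standing in for the unassigned znodes (not the real ZKAssign code):

```java
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentSplitDelete {
    // Stand-in for the znodes under /hbase/unassigned.
    static final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

    // Both racing SplitRegionHandlers call this; whichever loses the race
    // finds the node already gone and treats that as success instead of
    // aborting the master.
    static boolean deleteSplitNode(String region) {
        String prev = znodes.remove(region);
        if (prev == null) {
            // equivalent of swallowing KeeperException.NoNodeException
            System.out.println("node for " + region + " already deleted; ignoring");
        }
        return true;
    }

    public static void main(String[] args) {
        znodes.put("region-1", "RS_ZK_REGION_SPLIT");
        System.out.println(deleteSplitNode("region-1"));  // prints "true"
        System.out.println(deleteSplitNode("region-1"));  // still "true", no abort
    }
}
```

The design choice: since handleSplitReport only mutates master memory, replaying it for an already-deleted node is harmless, so the missing-node case can simply be logged and ignored.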
[jira] [Assigned] (HBASE-4729) Race between online altering and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4729: - Assignee: ramkrishna.s.vasudevan > Race between online altering and splitting kills the master > --- > > Key: HBASE-4729 > URL: https://issues.apache.org/jira/browse/HBASE-4729 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Jean-Daniel Cryans >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.0, 0.94.0 > > > I was running an online alter while regions were splitting, and suddenly the > master died and left my table half-altered (haven't restarted the master yet). > What killed the master: > {quote} > 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: > Unexpected ZK exception creating node CLOSING > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = > NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:110) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) > at > org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) > at > org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) > at > org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) > at > org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) > at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > 
at java.lang.Thread.run(Thread.java:662) > {quote} > A znode was created because the region server was splitting the region 4 > seconds before: > {quote} > 2011-11-02 17:06:40,704 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of > region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. > 2011-11-02 17:06:40,704 DEBUG > org.apache.hadoop.hbase.regionserver.SplitTransaction: > regionserver:62023-0x132f043bbde0710 Creating ephemeral node for > f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state > 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:62023-0x132f043bbde0710 Attempting to transition node > f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to > RS_ZK_REGION_SPLITTING > ... > 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:62023-0x132f043bbde0710 Successfully transitioned node > f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to > RS_ZK_REGION_SPLIT > 2011-11-02 17:06:44,061 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the > master to process the split for f7e1783e65ea8d621a4bc96ad310f101 > {quote} > Now that the master is dead the region server is spewing those last two lines > like mad.
[jira] [Assigned] (HBASE-4462) Properly treating SocketTimeoutException
[ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4462: - Assignee: ramkrishna.s.vasudevan > Properly treating SocketTimeoutException > > > Key: HBASE-4462 > URL: https://issues.apache.org/jira/browse/HBASE-4462 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.90.4 >Reporter: Jean-Daniel Cryans >Assignee: ramkrishna.s.vasudevan > Fix For: 0.90.5 > > > SocketTimeoutException is currently treated like any IOE inside of > HCM.getRegionServerWithRetries and I think this is a problem. This method > should only do retries in cases where we are pretty sure the operation will > complete, but with STE we already waited for (by default) 60 seconds and > nothing happened. > I found this while debugging Douglas Campbell's problem on the mailing list > where it seemed like he was using the same scanner from multiple threads, but > actually it was just the same client doing retries while the first run didn't > even finish yet (that's another problem). You could see the first scanner, > then up to two other handlers waiting for it to finish in order to run > (because of the synchronization on RegionScanner). > So what should we do? We could treat STE as a DoNotRetryException and let the > client deal with it, or we could retry only once. > There's also the option of having a different behavior for get/put/icv/scan, > the issue with operations that modify a cell is that you don't know if the > operation completed or not (same when a RS dies hard after completing let's > say a Put but just before returning to the client). 
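One of the options discussed above — treat SocketTimeoutException as do-not-retry while still retrying ordinary IOExceptions — can be sketched as a generic retry loop. This is a hypothetical illustration, not the HCM.getRegionServerWithRetries code itself:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryPolicySketch {

    // Retry plain IOExceptions up to maxAttempts, but surface a
    // SocketTimeoutException immediately: the client already waited the
    // full socket timeout (60s by default), so retrying just multiplies it.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException ste) {
                throw ste;               // do-not-retry: let the client decide
            } catch (IOException ioe) {
                if (attempt >= maxAttempts) {
                    throw ioe;           // retries exhausted
                }
                // else loop and retry
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            callWithRetries(() -> { throw new SocketTimeoutException("60000ms"); }, 3);
        } catch (SocketTimeoutException e) {
            System.out.println("gave up immediately: " + e.getMessage());
        }
    }
}
```

Note this sketch does not address the harder question the report raises for mutating operations (put/icv), where the caller cannot tell whether the timed-out operation actually completed on the server.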
[jira] [Assigned] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes
[ https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4459: - Assignee: ramkrishna.s.vasudevan > HbaseObjectWritable code is a byte, we will eventually run out of codes > --- > > Key: HBASE-4459 > URL: https://issues.apache.org/jira/browse/HBASE-4459 > Project: HBase > Issue Type: Bug > Components: io >Reporter: Jonathan Gray >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Fix For: 0.92.0 > > > There are about 90 classes/codes in HbaseObjectWritable currently and > Byte.MAX_VALUE is 127. In addition, anyone wanting to add custom classes but > not break compatibility might want to leave a gap before using codes and > that's difficult in such limited space. > Eventually we should get rid of this pattern that makes compatibility > difficult (better client/server protocol handshake) but we should probably at > least bump this to a short for 0.94.
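The proposed bump from byte to short is easy to see in terms of the plain DataOutput primitives: a code above 127 round-trips intact through writeShort/readShort but is silently truncated by writeByte/readByte. A small self-contained demonstration (illustrative only; not the HbaseObjectWritable serialization code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CodeWidthSketch {
    public static void main(String[] args) throws IOException {
        int code = 300;  // does not fit in a signed byte (max 127)

        // As a short (2 bytes, max 32767): the code survives the round trip.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeShort(code);
        int back = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readShort();
        System.out.println(back);  // prints "300"

        // As a byte: only the low 8 bits are written, so the code is corrupted.
        ByteArrayOutputStream buf2 = new ByteArrayOutputStream();
        new DataOutputStream(buf2).writeByte(code);
        System.out.println(new DataInputStream(
                new ByteArrayInputStream(buf2.toByteArray())).readByte());  // prints "44"
    }
}
```

The cost of the change is one extra byte per serialized object header, in exchange for 32767 available codes and room to leave gaps for custom classes.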