Two concurrent assign would cause master to abort with msg Unexpected state
trying to OFFLINE;
-
Key: HBASE-5816
URL: https://issues.apache.org/jira/browse/HBASE-5816
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
The first assign thread exits with success after updating the RegionState to
PENDING_OPEN, while the second assign follows immediately into assign and
fails the RegionState check in setOfflineInZooKeeper(). This causes the master
to abort.
In the below case, the two concurrent assigns occurred when AM tried to assign
a region to a dying/dead RS, and meanwhile the ShutdownServerHandler tried to
assign this region (from the region plan) spontaneously.
2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance
hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
src=hadoop05.sh.intel.com,60020,1334544902186,
dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Starting unassignment of region
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
(offlining)
2012-04-17 05:44:57,648 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Sent CLOSE to serverName=hadoop05.sh.intel.com,60020,1334544902186,
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
2012-04-17 05:44:57,666 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling new unassigned node:
/hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b
(region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Forcing OFFLINE;
was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
state=CLOSED, ts=1334612697672, server=hadoop05.sh.intel.com,60020,1334544902186
2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:6-0x236b912e9b3000e Creating (or updating) unassigned node for
fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Using pre-existing plan for region
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.;
plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
src=hadoop05.sh.intel.com,60020,1334544902186,
dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:52:59,096 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Assigning region
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to
xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:19,159 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Forcing OFFLINE;
was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
state=PENDING_OPEN, ts=1334613179096,
server=xmlqa-clv16.sh.intel.com,60020,1334612497253
2012-04-17 05:54:59,033 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Failed assignment of
TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to
serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0,
regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket
timeout exception: java.net.SocketTimeoutException: 12 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302
remote=/10.239.47.87:60020]
at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
at $Proxy7.openRegion(Unknown Source)
at
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
at
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at