First, I saw:

2010-09-21 11:30:05,122 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
2010-09-21 11:30:05,122 DEBUG
org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
2010-09-21 11:30:05,122 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
of server 10.103.2.5,60020,1285042335711: logSplit: false,
rootRescanned: false, numberOfMetaRegions: 1,
onlineMetaRegions.size(): 0

These messages repeated rapidly for 20 minutes or so.

Then a bunch of regions got unassigned:


2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
from 10.103.2.3,60020,1285042333293
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img816,img2103r.jpg,1285003791610.1592893332
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img534,92166039.jpg,1284949117852.1009352950
2010-09-21 12:00:07,782 DEBUG
org.apache.hadoop.hbase.master.RegionManager: Going to close region
img36,abcwu.jpg,1285001278990.272235177


Restarting the master did not help.  What ultimately brought the cluster
back up was a full shutdown of the regionservers and masters, followed by
bringing everything back up.

Any ideas what might have happened here?

We are running:

HBase Version   0.89.20100726, r979826
Hadoop Version  0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
Regions On FS   5057

3 zookeepers and 13 regionservers.

-Jack
