I see that the master is timing out the region assignment and then assigning the region elsewhere, but the region opened anyway on the first server. (What do you have hbase.regionserver.handler.count set to? The default is 10, which could mean a bunch of requests hanging out in the rpc queue before getting into the server to be processed.) One thing you could do is up your region-in-transition timeout. The default is 30 seconds, which may not be enough time for region assignment to complete if there is a bunch of churn -- was there churn at this time? (We up the default timeout in 0.90.3; see 'HBASE-3846 Set RIT timeout higher'.)
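For reference, a rough sketch of what the two settings mentioned above might look like in hbase-site.xml. The handler count property is the one named above. The timeout property name, hbase.master.assignment.timeoutmonitor.timeout, is my assumption of what the 0.90.x master reads, so check it against the AssignmentManager source for your version. Both values are purely illustrative, not recommendations.

```xml
<configuration>
  <!-- Illustrative values only; tune for your own cluster. -->
  <property>
    <!-- RPC handler threads per region server (default is 10). -->
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>
  <property>
    <!-- ASSUMED property name: how long (in ms) a region may stay in
         transition before the master times it out and reassigns it.
         The default discussed above is 30 seconds (30000). -->
    <name>hbase.master.assignment.timeoutmonitor.timeout</name>
    <value>180000</value>
  </property>
</configuration>
```

The handler count is read by the region servers and the timeout by the master, so each process would need a restart to pick up its change.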
See below for more.

On Fri, May 13, 2011 at 8:19 AM, Raghu Angadi <[email protected]> wrote:
...
>> > 2011-05-12 12:05:20,987 DEBUG
>> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
>> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.

The region opened successfully. But looking at the master log, 12 seconds earlier it says:

>>>> 2011-05-12 12:05:08,122 INFO
>>>> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
>>>> timed out: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>>>> state=OPENING, ts=1305201871850
>>>> 2011-05-12 12:05:08,122 INFO
>>>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
>>>> for too long, reassigning ....

and then forces it to be reassigned elsewhere. (Your log from master stops at this point. I'd be interested in seeing more. Send it to me offline?)

Thanks Raghu,
St.Ack
