On Fri, May 13, 2011 at 3:01 PM, Raghu Angadi <[email protected]> wrote:
> Thanks Stack. greatly appreciate the help.
>
No problem.

> hbase.regionserver.handler.count is set to 30.
> we have not set hbase.master.assignment.timeoutmonitor.timeout. will surely
> increase to 180 seconds as HBASE-3846 does.
>

You might want to go to 0.90.3 altogether. Is that a pain for you?

> The load on the cluster is low to moderate and HBase holds up pretty well.
> Most of the load consists of hourly random writes to the table and
> sequential scans from MR jobs.
>

Thanks boss.

> I will send another email with locations to full master logs.
> There are many "Regions in transition timed out" messages for this region
> and many others spread over time.
>

Grand. I can come over any time or you should drop by our place.
It's just a few blocks away and we can munch on lunch while we dig in
your logs.

St.Ack

> Raghu.
>
> On Fri, May 13, 2011 at 11:33 AM, Stack <[email protected]> wrote:
>
>> I see that we are timing out region assignment then assigning
>> elsewhere, but the region opened anyway on first server (What do you
>> have hbase.regionserver.handler.count set to? The default is 10 which
>> could mean a bunch of requests hanging out in the rpc queue before
>> getting into the server to be processed). One thing you could do is
>> up your region in transition timeout. Default is 30 seconds which if
>> there is a bunch of churn may not be enough time for region assignment
>> to complete -- was there churn at this time? (We up the default
>> timeout in 0.90.3, see 'HBASE-3846 Set RIT timeout higher').
>>
>> See below for more.
>>
>> On Fri, May 13, 2011 at 8:19 AM, Raghu Angadi <[email protected]> wrote:
>> ...
>>
>> > 2011-05-12 12:05:20,987 DEBUG
>> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
>> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>>
>> The region opened successfully.
>>
>> But looking at the master log, 12 seconds earlier it says:
>>
>>>> 2011-05-12 12:05:08,122 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
>> timed out: users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>> state=OPENING, ts=1305201871850
>> 2011-05-12 12:05:08,122 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
>> for too long, reassigning
>>
>>
>> .... and then forces it reassigned elsewhere (Your log from master
>> stops at this point. I'd be interested in seeing more. Send it to me
>> offline?).
>>
>> Thanks Raghu,
>> St.Ack
>>
>
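
The timings quoted in the thread can be cross-checked with a little arithmetic. The Python sketch below is only illustrative. It assumes the log timestamps are in UTC (the thread does not say) and that the `ts=` field is epoch milliseconds. The helper names are made up for this example and are not part of HBase.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log excerpts quoted in the thread.
RIT_STATE_TS_MS = 1305201871850           # ts= field on the "timed out" line
TIMED_OUT_AT = "2011-05-12 12:05:08,122"  # master: "Regions in transition timed out"
OPENED_AT = "2011-05-12 12:05:20,987"     # region server: "Opened users,..."


def parse_log_time(text):
    """Parse a log4j-style timestamp. ASSUMPTION: the logs are in UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def seconds_since_state_change(log_time_text, state_ts_ms=RIT_STATE_TS_MS):
    """Seconds between the ts= state timestamp and a later log line.

    ASSUMPTION: ts= is milliseconds since the Unix epoch.
    """
    state_change = datetime.fromtimestamp(state_ts_ms // 1000, tz=timezone.utc)
    state_change += timedelta(milliseconds=state_ts_ms % 1000)
    return (parse_log_time(log_time_text) - state_change).total_seconds()


if __name__ == "__main__":
    print("in OPENING when timed out:", seconds_since_state_change(TIMED_OUT_AT))
    print("in OPENING when opened:   ", seconds_since_state_change(OPENED_AT))
```

Under those assumptions the region had been in OPENING for roughly 36 seconds when the master timed it out, which is past the 30 second default mentioned above, and the open completed roughly 49 seconds after the state timestamp, which would fit within a 180 second timeout.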
