Vidhya, Thanks for keeping it up :)
If you trace all the references to 0000003404803994 in the region server, what do you see? I think that most of the time is spent opening the region sequentially, it'd be nice to confirm. J-D On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman <[email protected]> wrote: > Thank you for the responses guys.. I will be looking at the problem later > this week, I guess. The initial eyeballing suggested that the RS's respond to > the master's open very quickly.. And I saw our rrd stats (which is similar to > ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I > haven't explored enough :) I will be coming back to the problem soon and will > let you know about my findings.. > > I have been running scans and planning to run incremental loads on the db > and get some useful numbers.. Scans are blazing fast with just one storefile > per region: 75-95 MBps per node! (this is if locality between RS and data is > kind of preserved: i.e., after a major compact is issued).. > > I will update on my status soon.. > Thank you again > Vidhya > > On 8/27/10 5:42 PM, "Jonathan Gray" <[email protected]> wrote: > > Vidhya, > > Could you post a snippet of an RS log during this time? You should be able > to see what's happening between when the OPEN message gets there and the OPEN > completes. > > Like Stack said, it's probably that its single-threaded in the version you're > using and with all the file opening, your NN and DNs are under heavy load. > Do you see io-wait or anything else jump up across the cluster at this time? > You have ganglia setup on this bad boy? > > JG > >> -----Original Message----- >> From: [email protected] [mailto:[email protected]] On Behalf Of >> Stack >> Sent: Friday, August 27, 2010 5:36 PM >> To: [email protected] >> Subject: Re: Initial region loads in hbase.. >> >> In 0.20, open on a regionserver is single-threaded. Could that be it? >> The server has lots of regions to open and its taking time? Is the >> meta table being beat up? Could this be holding up region opens? >> >> Good luck V, >> St.Ack >> >> >> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman >> <[email protected]> wrote: >> > Hi guys, >> > A couple of days back, I had posted a problem on regions taking too >> much time to load when I restart Hbase.. I have a table that has around >> 80 K regions on 650 nodes (!) .. >> > I was checking the logs in the master and I notice that the time it >> takes from assigning a region to a region server to the point when it >> recognizes that the region is open in that server takes around 20-30 >> minutes! >> > Apart from master being the bottleneck here, can you guys let me >> know what the other possible cases are as to why this may happen? >> > >> > Thank you >> > Vidhya >> > >> > Below is an example for region with start key 0000003404803994 where >> the assignment takes place at 22:59 while the confirmation that it got >> open came at 23:19... >> > >> > 2010-08-27 22:59:02,642 DEBUG >> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> is not valid; serverAddress=, startCode=0 unknown. >> > 2010-08-27 22:59:02,643 DEBUG >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED >> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE >> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: >> Event NodeCreated with state SyncConnected with path >> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 >> > 2010-08-27 22:59:02,645 DEBUG >> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event >> NodeCreated with path >> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 >> > 2010-08-27 22:59:02,645 DEBUG >> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: >> Got zkEvent NodeCreated state:SyncConnected >> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 >> > 2010-08-27 22:59:02,645 DEBUG >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: >> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b >> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h >> base.master.HMaster>Created ZNode >> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper >> > 2010-08-27 22:59:02,646 DEBUG >> org.apache.hadoop.hbase.master.RegionManager: Created/updated >> UNASSIGNED zNode >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> in state M2ZK_REGION_OFFLINE >> > 2010-08-27 22:59:02,646 DEBUG >> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ >> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1 >> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: >> Event NodeChildrenChanged with state SyncConnected with path >> /hbase/UNASSIGNED >> > 2010-08-27 22:59:02,646 DEBUG >> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event >> NodeChildrenChanged with path /hbase/UNASSIGNED >> > 2010-08-27 22:59:02,646 DEBUG >> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: >> Got zkEvent NodeChildrenChanged state:SyncConnected >> path:/hbase/UNASSIGNED >> > 2010-08-27 22:59:02,647 DEBUG >> org.apache.hadoop.hbase.master.RegionManager: Assigning for >> serverName=b3130020.yst.yahoo.net,60020,1282940735627, >> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total >> nregions to assign=1, regions to give other servers than this=0, >> isMetaAssign=false >> > 2010-08-27 22:59:02,647 DEBUG >> org.apache.hadoop.hbase.master.RegionManager: Assigning >> serverName=b3130020.yst.yahoo.net,60020,1282940735627, >> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions >> > 2010-08-27 22:59:02,647 INFO >> org.apache.hadoop.hbase.master.RegionManager: Assigning region >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> to b3130020.yst.yahoo.net,60020,1282940735627 >> > 2010-08-27 22:59:02,653 DEBUG >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update >> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists >> in state = M2ZK_REGION_OFFLINE >> > 2010-08-27 22:59:02,653 DEBUG >> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> in state M2ZK_REGION_OFFLINE >> > >> > >> > >> > >> > AND THEN, >> > >> > 2010-08-27 23:18:53,591 INFO >> org.apache.hadoop.hbase.master.RegionServerOperation: >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> open on b3130020.yst.yahoo.net,60020,1282940735627 >> > 2010-08-27 23:18:53,591 INFO >> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row >> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. >> in region .META.,,1 with startcode=1282940735627, >> server=b3130020.yst.yahoo.net:60020 >> > 2010-08-27 23:18:53,677 DEBUG >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode >> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as >> region is open... >> > >> > >> > > >
