Thank you for the responses guys.. I will be looking at the problem later this 
week, I guess. The initial eyeballing suggested that the RS's respond to the 
master's open very quickly.. And I saw our rrd stats (which is similar to 
ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I 
haven't explored enough :) I will be coming back to the problem soon and will 
let you know about my findings..

 I have been running scans and planning to run incremental loads on the db and 
get some useful numbers.. Scans are blazing fast with just one storefile per 
region: 75-95 MBps per node! (this is if locality between RS and data is kind 
of preserved: i.e., after a major compact is issued)..

I will update on my status soon..
Thank you again
Vidhya

On 8/27/10 5:42 PM, "Jonathan Gray" <[email protected]> wrote:

Vidhya,

Could you post a snippet of an RS log during this time?  You should be able to 
see what's happening between when the OPEN message gets there and the OPEN 
completes.

Like Stack said, it's probably that its single-threaded in the version you're 
using and with all the file opening, your NN and DNs are under heavy load.  Do 
you see io-wait or anything else jump up across the cluster at this time?  You 
have ganglia setup on this bad boy?

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Friday, August 27, 2010 5:36 PM
> To: [email protected]
> Subject: Re: Initial region loads in hbase..
>
> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>  The server has lots of regions to open and its taking time?  Is the
> meta table being beat up?  Could this be holding up region opens?
>
> Good luck V,
> St.Ack
>
>
> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
> > Hi guys,
> >  A couple of days back, I had posted a problem on regions taking too
> much time to load when I restart Hbase.. I have a table that has around
> 80 K regions on 650 nodes (!) ..
> >  I was checking the logs in the master and I notice that the time it
> takes from assigning a region to a region server to the point when it
> recognizes that the region is open in that server takes around 20-30
> minutes!
> >   Apart from master being the bottleneck here, can you guys let me
> know what the other possible cases are as to why this may happen?
> >
> > Thank you
> > Vidhya
> >
> > Below is an example for region with start key 0000003404803994 where
> the assignment takes place at 22:59 while the confirmation that it got
> open came at 23:19...
> >
> > 2010-08-27 22:59:02,642 DEBUG
> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> is not valid;  serverAddress=, startCode=0 unknown.
> > 2010-08-27 22:59:02,643 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeCreated with state SyncConnected with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeCreated with path
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeCreated state:SyncConnected
> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,645 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
> base.master.HMaster>Created ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created/updated
> UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Event NodeChildrenChanged with state SyncConnected with path
> /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
> NodeChildrenChanged with path /hbase/UNASSIGNED
> > 2010-08-27 22:59:02,646 DEBUG
> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
> Got zkEvent NodeChildrenChanged state:SyncConnected
> path:/hbase/UNASSIGNED
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning for
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
> nregions to assign=1, regions to give other servers than this=0,
> isMetaAssign=false
> > 2010-08-27 22:59:02,647 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Assigning
> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
> > 2010-08-27 22:59:02,647 INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> to b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
> in state = M2ZK_REGION_OFFLINE
> > 2010-08-27 22:59:02,653 DEBUG
> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in state M2ZK_REGION_OFFLINE
> >
> >
> >
> >
> > AND THEN,
> >
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> open on b3130020.yst.yahoo.net,60020,1282940735627
> > 2010-08-27 23:18:53,591 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
> in region .META.,,1 with startcode=1282940735627,
> server=b3130020.yst.yahoo.net:60020
> > 2010-08-27 23:18:53,677 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
> region is open...
> >
> >
> >

Reply via email to