Vidhya,

Thanks for keeping it up :)

If you trace all the references to 0000003404803994 in the region
server, what do you see? I think that most of the time is spent
opening the region sequentially, it'd be nice to confirm.

J-D

On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman
<[email protected]> wrote:
> Thank you for the responses guys.. I will be looking at the problem later 
> this week, I guess. The initial eyeballing suggested that the RS's respond to 
> the master's open very quickly.. And I saw our rrd stats (which is similar to 
> ganglia) and I couldn't see any obvious IO/CPU/nw bottlenecks: which means I 
> haven't explored enough :) I will be coming back to the problem soon and will 
> let you know about my findings..
>
>  I have been running scans and planning to run incremental loads on the db 
> and get some useful numbers.. Scans are blazing fast with just one storefile 
> per region: 75-95 MBps per node! (this is if locality between RS and data is 
> kind of preserved: i.e., after a major compact is issued)..
>
> I will update on my status soon..
> Thank you again
> Vidhya
>
> On 8/27/10 5:42 PM, "Jonathan Gray" <[email protected]> wrote:
>
> Vidhya,
>
> Could you post a snippet of an RS log during this time?  You should be able 
> to see what's happening between when the OPEN message gets there and the OPEN 
> completes.
>
> Like Stack said, it's probably that its single-threaded in the version you're 
> using and with all the file opening, your NN and DNs are under heavy load.  
> Do you see io-wait or anything else jump up across the cluster at this time?  
> You have ganglia setup on this bad boy?
>
> JG
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Stack
>> Sent: Friday, August 27, 2010 5:36 PM
>> To: [email protected]
>> Subject: Re: Initial region loads in hbase..
>>
>> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>>  The server has lots of regions to open and its taking time?  Is the
>> meta table being beat up?  Could this be holding up region opens?
>>
>> Good luck V,
>> St.Ack
>>
>>
>> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
>> <[email protected]> wrote:
>> > Hi guys,
>> >  A couple of days back, I had posted a problem on regions taking too
>> much time to load when I restart Hbase.. I have a table that has around
>> 80 K regions on 650 nodes (!) ..
>> >  I was checking the logs in the master and I notice that the time it
>> takes from assigning a region to a region server to the point when it
>> recognizes that the region is open in that server takes around 20-30
>> minutes!
>> >   Apart from master being the bottleneck here, can you guys let me
>> know what the other possible cases are as to why this may happen?
>> >
>> > Thank you
>> > Vidhya
>> >
>> > Below is an example for region with start key 0000003404803994 where
>> the assignment takes place at 22:59 while the confirmation that it got
>> open came at 23:19...
>> >
>> > 2010-08-27 22:59:02,642 DEBUG
>> org.apache.hadoop.hbase.master.BaseScanner: Current assignment of
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> is not valid;  serverAddress=, startCode=0 unknown.
>> > 2010-08-27 22:59:02,643 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED
>> region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeCreated with state SyncConnected with path
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeCreated with path
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeCreated state:SyncConnected
>> path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,645 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
>> <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b
>> 3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.h
>> base.master.HMaster>Created ZNode
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Created/updated
>> UNASSIGNED zNode
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in state M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [
>> M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster:
>> Event NodeChildrenChanged with state SyncConnected with path
>> /hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event
>> NodeChildrenChanged with path /hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,646 DEBUG
>> org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS:
>> Got zkEvent NodeChildrenChanged state:SyncConnected
>> path:/hbase/UNASSIGNED
>> > 2010-08-27 22:59:02,647 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Assigning for
>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total
>> nregions to assign=1, regions to give other servers than this=0,
>> isMetaAssign=false
>> > 2010-08-27 22:59:02,647 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Assigning
>> serverName=b3130020.yst.yahoo.net,60020,1282940735627,
>> load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>> > 2010-08-27 22:59:02,647 INFO
>> org.apache.hadoop.hbase.master.RegionManager: Assigning region
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> to b3130020.yst.yahoo.net,60020,1282940735627
>> > 2010-08-27 22:59:02,653 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update
>> UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists
>> in state = M2ZK_REGION_OFFLINE
>> > 2010-08-27 22:59:02,653 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in state M2ZK_REGION_OFFLINE
>> >
>> >
>> >
>> >
>> > AND THEN,
>> >
>> > 2010-08-27 23:18:53,591 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation:
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> open on b3130020.yst.yahoo.net,60020,1282940735627
>> > 2010-08-27 23:18:53,591 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
>> DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1.
>> in region .META.,,1 with startcode=1282940735627,
>> server=b3130020.yst.yahoo.net:60020
>> > 2010-08-27 23:18:53,677 DEBUG
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode
>> /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as
>> region is open...
>> >
>> >
>> >
>
>

Reply via email to