On Mon, Aug 30, 2010 at 1:40 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Vidhya,
>
> Thanks for keeping it up :)
>
> If you trace all the references to 0000003404803994 in the region
> server, what do you see? I think that most of the time is spent
> opening the region sequentially, it'd be nice to confirm.
>
> J-D
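The trace J-D suggests can be sketched as a simple filter over the region server log, matching either the start key or the encoded region name. This is a minimal sketch: the sample lines below are shortened stand-ins, and in practice you would read the actual RS log file from disk (its path varies by install).

```python
# Pull out every log line that mentions the region, by start key or by
# encoded name, so the open can be followed step by step.
START_KEY = "0000003404803994"
ENCODED = "73c0f8fdb8ffbc20b9a239d325932ff1"

def trace_region(lines, needles=(START_KEY, ENCODED)):
    """Return (line_no, line) for every line mentioning the region."""
    return [(i, ln) for i, ln in enumerate(lines, 1)
            if any(n in ln for n in needles)]

# Stand-in sample; a real run would use open("<rs-log-path>") instead.
sample_log = [
    "2010-08-27 22:59:02,643 DEBUG ...ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE",
    "2010-08-27 22:59:03,000 INFO ...HRegionServer: some unrelated event",
]
hits = trace_region(sample_log)
```

Printing the timestamps of the matching lines and the deltas between them would show whether the time goes into one long step or many small sequential ones.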
>
> On Mon, Aug 30, 2010 at 10:20 AM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
>> Thank you for the responses guys.. I will be looking at the problem later 
>> this week, I guess. The initial eyeballing suggested that the RSs respond 
>> to the master's open very quickly.. And I looked at our rrd stats (which 
>> are similar to ganglia) and I couldn't see any obvious IO/CPU/network 
>> bottlenecks: which means I haven't explored enough :) I will be coming 
>> back to the problem soon and will let you know about my findings..
>>
>>  I have been running scans and planning to run incremental loads on the db 
>> and get some useful numbers.. Scans are blazing fast with just one storefile 
>> per region: 75-95 MBps per node! (this is if locality between RS and data is 
>> kind of preserved: i.e., after a major compact is issued)..
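Scaled out, those per-node scan rates are substantial. A back-of-envelope multiplication, assuming (which the thread doesn't state) that all 650 nodes scan concurrently:

```python
# Aggregate scan throughput; per-node rates are from the thread,
# the all-nodes-concurrent assumption is not.
nodes = 650
low_mbps, high_mbps = 75, 95              # MB/s per node

agg_low_gbps = nodes * low_mbps / 1024    # ~47.6 GB/s
agg_high_gbps = nodes * high_mbps / 1024  # ~60.3 GB/s
```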
>>
>> I will update on my status soon..
>> Thank you again
>> Vidhya
>>
>> On 8/27/10 5:42 PM, "Jonathan Gray" <[email protected]> wrote:
>>
>> Vidhya,
>>
>> Could you post a snippet of an RS log during this time?  You should be able 
>> to see what's happening between when the OPEN message gets there and the 
>> OPEN completes.
>>
>> Like Stack said, it's probably that it's single-threaded in the version 
>> you're using, and with all the file opening, your NN and DNs are under 
>> heavy load.  Do you see io-wait or anything else jump up across the 
>> cluster at this time?  You have ganglia set up on this bad boy?
>>
>> JG
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of
>>> Stack
>>> Sent: Friday, August 27, 2010 5:36 PM
>>> To: [email protected]
>>> Subject: Re: Initial region loads in hbase..
>>>
>>> In 0.20, open on a regionserver is single-threaded.  Could that be it?
>>>  The server has lots of regions to open and it's taking time?  Is the
>>> meta table being beat up?  Could this be holding up region opens?
>>>
>>> Good luck V,
>>> St.Ack
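Stack's single-threaded-open theory fits the observed delay on a back-of-envelope basis. The region counts below come from the thread; the per-open cost is a guess, not a measured number:

```python
# If one RS opens its regions one at a time, the last region in the
# queue waits for all the others.
total_regions = 80_000
servers = 650
regions_per_server = total_regions / servers      # ~123

per_open_secs = 10  # hypothetical average cost of one sequential open
last_open_wait_min = regions_per_server * per_open_secs / 60
```

With a ~10s per-open cost, the last region on a server waits around 20 minutes, which lands squarely in the 20-30 minute window Vidhya reported.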
>>>
>>>
>>> On Fri, Aug 27, 2010 at 5:01 PM, Vidhyashankar Venkataraman
>>> <[email protected]> wrote:
>>> > Hi guys,
>>> >  A couple of days back, I posted a problem about regions taking too much time to load when I restart HBase.. I have a table that has around 80 K regions on 650 nodes (!) ..
>>> >  I was checking the logs in the master and I noticed that the time from assigning a region to a region server to the point when the master recognizes that the region is open on that server is around 20-30 minutes!
>>> >   Apart from the master being the bottleneck here, can you guys let me know what the other possible causes are as to why this may happen?
>>> >
>>> > Thank you
>>> > Vidhya
>>> >
>>> > Below is an example for the region with start key 0000003404803994, where the assignment takes place at 22:59 while the confirmation that it got open came at 23:19...
>>> >
>>> > 2010-08-27 22:59:02,642 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. is not valid;  serverAddress=, startCode=0 unknown.
>>> > 2010-08-27 22:59:02,643 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Creating UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeCreated with state SyncConnected with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeCreated with path /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeCreated state:SyncConnected path:/hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <b3130520.yst.yahoo.net,b3130560.yst.yahoo.net,b3130600.yst.yahoo.net,b3130640.yst.yahoo.net,b3130680.yst.yahoo.net:/hbase,org.apache.hadoop.hbase.master.HMaster>Created ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created/updated UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: Got event type [ M2ZK_REGION_OFFLINE ] for region 73c0f8fdb8ffbc20b9a239d325932ff1
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.HMaster: Event NodeChildrenChanged with state SyncConnected with path /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeChildrenChanged with path /hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,646 DEBUG org.apache.hadoop.hbase.master.ZKUnassignedWatcher: ZK-EVENT-PROCESS: Got zkEvent NodeChildrenChanged state:SyncConnected path:/hbase/UNASSIGNED
>>> > 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993): total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=false
>>> > 2010-08-27 22:59:02,647 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning serverName=b3130020.yst.yahoo.net,60020,1282940735627, load=(requests=0, regions=76, usedHeap=80, maxHeap=7993) 1 regions
>>> > 2010-08-27 22:59:02,647 INFO org.apache.hadoop.hbase.master.RegionManager: Assigning region DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. to b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: No need to update UNASSIGNED region 73c0f8fdb8ffbc20b9a239d325932ff1 as it already exists in state = M2ZK_REGION_OFFLINE
>>> > 2010-08-27 22:59:02,653 DEBUG org.apache.hadoop.hbase.master.RegionManager: Created UNASSIGNED zNode DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in state M2ZK_REGION_OFFLINE
>>> >
>>> >
>>> >
>>> >
>>> > AND THEN,
>>> >
>>> > 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. open on b3130020.yst.yahoo.net,60020,1282940735627
>>> > 2010-08-27 23:18:53,591 INFO org.apache.hadoop.hbase.master.RegionServerOperation: Updated row DocDB,0000003404803994,1282947439133.73c0f8fdb8ffbc20b9a239d325932ff1. in region .META.,,1 with startcode=1282940735627, server=b3130020.yst.yahoo.net:60020
>>> > 2010-08-27 23:18:53,677 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Deleting ZNode /hbase/UNASSIGNED/73c0f8fdb8ffbc20b9a239d325932ff1 in ZooKeeper as region is open...
>>> >
>>> >
>>> >
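For reference, the gap between the assignment entry (22:59:02,642) and the open confirmation (23:18:53,591) in the excerpt above can be computed directly from the log timestamps:

```python
from datetime import datetime, timedelta

# log4j-style timestamp: comma separates seconds from milliseconds
FMT = "%Y-%m-%d %H:%M:%S,%f"

assigned = datetime.strptime("2010-08-27 22:59:02,642", FMT)
opened = datetime.strptime("2010-08-27 23:18:53,591", FMT)

gap = opened - assigned  # just under 20 minutes
```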
>>
>>
>

Just an FYI. hadoop-cacti-jtg
(http://www.jointhegrid.com/hadoop/index.jsp) had excellent graph
coverage for all the JMX attributes in HBase 0.20.3.

http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_region_blockcache.jsp
http://www.jointhegrid.com/hadoop-cacti-jtg-walk/hbase_master.jsp

Have there been any major changes/additions to HBase JMX recently?

Edward
