St.Ack you are always helping us!
Thank you very much!!!
The cluster has an NFS share where the home directories of all users are
stored (when I log in, my working directory is on the NFS).
I have Hadoop and HBase installed on the local filesystem of each node.
However, is there any way to make HBase use the NFS? Should I use any
parameters other than those below? (I sketch what I have in mind right
after the configuration.)
hbase-site.xml is the following:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://clone11:9000/hbase</value>
  <description>The directory shared by RegionServers.
  </description>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
  <description>The mode the cluster will be in. Possible values are
    false: standalone and pseudo-distributed setups with managed Zookeeper
    true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
  </description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>clone11</value>
  <description>A
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>A
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/local/panton/hadoop-0.20.2-cdh3u0/dfs/zoo</value>
  <description>A
  </description>
</property>
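
To make the question more concrete, here is a minimal sketch of what I
have in mind: pointing hbase.rootdir at the NFS-mounted directory with a
file:// URI instead of HDFS. The mount point /nfs/home/panton is a
hypothetical path, and I am not sure whether this is advisable.

<property>
  <name>hbase.rootdir</name>
  <value>file:///nfs/home/panton/hbase</value>
  <description>Hypothetical: HBase root on the NFS-mounted home directory
    instead of HDFS.
  </description>
</property>

Would something like this work in distributed mode, or is there a better
way to make HBase use the NFS?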
>> You mean heap used?
Yes, you are totally right. I got mixed up.
>>Are all maps in flight when some complete in 2 minutes?
Yes. There are always 48 maps running.
>>What is happening with i/o as we go from 2-15 minutes? Is it going up as
>>time progresses? What about the network? Does iowait go up as job
>>progresses?
How can I monitor the I/O and the network? Can you please suggest a tool?
I am not the admin of the cluster, so I may have to ask the admins to
install it.
I/O wait (from the top command) is generally steady at around 0-15%, even
when the tasks take much longer, going up only for a couple of moments.
The idle percentage is always very high.
>>What is the map doing? A get only? Or is it also populating the cluster so
>>more data in the system when maps are taking longer to complete.
Each map does a GET to load a string from HBase and compares it with a
string that comes as input.
When I run the GET against an empty table, the CPU usage is really high,
the I/O wait is low, the map tasks complete much faster, and the time is
steady for all map tasks.
On the other hand, if I keep the GET against the normal table (which has
many rows) and remove all context.write() calls, the problem remains.
However, it gets a bit smaller: the first tasks need 2-3 minutes and the
following ones need about 6-7 minutes.
This is why I believe it has to do with HBase and the GET. Do you think this is
a correct assumption?
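
For reference, here is a minimal sketch of the kind of map function I am
running. It is not the actual code: the table name "mytable", the column
family "cf", the qualifier "q", and the tab-separated input format are
placeholders.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CompareMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // One HTable per task, reused for every map() call.
    table = new HTable(HBaseConfiguration.create(context.getConfiguration()), "mytable");
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The input line carries the row key and the string to compare, tab-separated.
    String[] parts = value.toString().split("\t");
    if (parts.length < 2) {
      return; // skip malformed lines
    }
    Get get = new Get(Bytes.toBytes(parts[0]));
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
    Result result = table.get(get);
    String stored = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q")));
    // Compare the stored string with the one that comes as input.
    if (stored != null && stored.equals(parts[1])) {
      context.write(new Text(parts[0]), new IntWritable(1));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close();
  }
}

The HTable is opened once in setup() and reused, so each map() call only
pays for the Get itself.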
>> Do you have many regions? Are they evenly distributed, etc.
Yes, I always take care to pre-split the table and keep the regions evenly
distributed.
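
In case it is useful, this is roughly how I pre-split the table. It is
only a sketch: the table name, column family, and split keys are
hypothetical, chosen to match the expected row-key distribution.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));
    // Split keys spread over the expected row-key range so the regions
    // start out evenly distributed across the RegionServers.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"), Bytes.toBytes("8")
    };
    admin.createTable(desc, splits);
  }
}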
> Date: Wed, 7 Sep 2011 08:32:29 -0700
> Subject: Re: HBase slowdown while running MR job with GET
> From: [email protected]
> To: [email protected]
>
> 2011/9/7 Panagiotis Antonopoulos <[email protected]>:
> > Although the map tasks which run first complete fast (in 2 minutes for
> > example), the next map tasks need much more time to complete (4 mins),
> > and even later the following map tasks need more than 15 mins to
> > complete.
> >
>
> Are all maps in flight when some complete in 2 minutes? What is
> happening with i/o as we go from 2-15 minutes? Is it going up as time
> progresses? What about the network? What is the map doing? A get
> only? Or is it also populating the cluster so more data in the
> system when maps are taking longer to complete. Do you have many
> regions? Are they evenly distributed, etc.
>
> > It seems like HBase overloads and cannot respond fast enough.
> >
> > While the MR job is running I have noticed the following:
> >
> > 1) The cpu usage of the map tasks is high at the beginning and then goes
> > down to 4-5%. I think that this means that the results of the GET command
> > take long to be returned.
> >
>
> This could be. Does iowait go up as job progresses?
>
> > 2) The used stack of the RegionServers (as shown in the web GUI) increases
> > and it doesn't decrease even when the job is completed.
> >
>
> You mean heap used? Yeah, that's the general tendency of Java apps. There
> is no 'shrink the allocated heap when done' facility.
>
>
> > 3) Using the "top" command, I see that the memory used by the regionserver
> > increases up to the stack limit I have selected (2GB) and it doesn't go
> > down even when the job is completed.
> >
>
> See above.
> St.Ack