Thanks, everyone, for the replies, and very sorry for the late response. We are
quite new to the Linux environment... our production servers currently run
on Windows and our Linux sysadmin is yet to arrive, so please forgive my
ignorance regarding Linux tools; we have very little prior experience with
Linux. Our 3 nodes are each running a different Linux distro: one on Ubuntu
Server 10.10, one on CentOS and one on Ubuntu Desktop 10.04. All of them have
the same directory structure and the same versions of Hadoop, HBase and Java,
though. Let me know if you think this could be an issue. Basically, we wanted
to evaluate all three distros at the same time as well; I hope that plan
didn't backfire.

Back to the problem at hand, here are the iptraf, htop and iostat reports:

iptraf snapshot --master

Total rates:
165424.2 kbits/s
3800 packets/s

Incoming:
109415.3 kbits/s
3007.4 packets/s

iptraf snapshot --slave01

Total rates:
102024 kbits/s
3128 packets/s

Incoming:
48755.9 kbits/s
1784 packets/s

iostat --master

Linux 2.6.32-21-generic (hadoop1)       Sunday 31 October 2010  _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.54    0.01    0.18    0.30    0.00   98.97

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.43       123.11       412.93   33988462  114000368

iostat --slave01

Linux 2.6.35-22-server (hadoop2)        Sunday 31 October 2010  _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.77    0.00    0.29    0.18    0.00   98.77

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               3.90       277.19      1228.97  245515598 1088525808


iostat --slave02

Linux 2.6.18-194.11.1.el5 (slave)       10/31/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.54    0.00    0.29    0.80    0.00   98.37

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               6.57       302.09      1652.06  321914364 1760497088
sda1              0.00         0.00         0.00       2458         88
sda2              6.57       302.08      1652.06  321911602 1760497000
dm-0            209.33       302.08      1652.06  321910322 1760496272
dm-1              0.00         0.00         0.00        896        728

htop --master
http://imgur.com/3zTu7

htop --slave01
http://imgur.com/5HeyF

htop --slave02
http://imgur.com/lHin7

I hope these are the reports you were referring to; please let me know
otherwise. Also, is there an easier command-line way of fetching the
iptraf and htop reports? The master is running Ubuntu Desktop, slave01
runs Ubuntu Server and slave02 runs CentOS.
Some more facts that I have noticed:
- I ran the job just now on the cluster after reformatting the namenode and
it took only 1 hr 15 mins instead of the usual 2 hrs, though that is still
slower than the single-node config (30-40 mins). Could it be faster right
after a namenode format?
- The time set on one of the slaves was incorrect; it lagged the other two
machines by 4 hrs. I corrected the time before formatting the namenode this
time (see the NTP sketch after this list). I wonder if that could have had an
impact.
- I have ZK running on all 3 machines. Shouldn't it work fine if I just set
up ZK on one of the nodes? When I tried that, I got a weird error: could not
connect to port 0::0::0::0::....:2181 or something of that sort (see my rough
config sketch after this list). I'll post the full error the next time I see it.
- The CentOS machine (slave02) seems to use a lot more CPU than the other two
nodes on average: CPU usage on CentOS mostly hovers around 50-60%, whereas it
is more like 30-40% on the other two machines (see the htop screenshots above).
- On a single-node configuration, moving from a 4 GB RAM dual-core laptop to
an 8 GB quad-core machine gives a 1.8x performance improvement.
- Increasing the child task heap size from the default 200 MB to 768 MB
improved performance on both the single- and multi-node clusters by 100%
(a 2x improvement), but going beyond 768 MB doesn't seem to have much impact
(the exact property is in the sketch after this list).
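
Regarding the clock-skew point above: this is roughly how I plan to keep the
clocks in sync from now on. The package and service names are my guesses for
our Ubuntu/CentOS mix, so please correct me if there is a better way:

sudo ntpdate pool.ntp.org     # one-off correction (assumes ntpdate is installed and we have internet access)
sudo apt-get install ntp      # keep the clock synced on the Ubuntu nodes
sudo yum install ntp          # same on the CentOS node
sudo service ntpd start       # CentOS service name; Ubuntu's package starts its daemon ("ntp") automatically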
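
On the single-ZooKeeper idea above: the error mentions what looks like an
IPv6 address, so my rough plan (untested, and "hadoop1" is just my guess at
the right quorum host) was the config below; please tell me if I'm on the
wrong track:

# conf/hbase-site.xml on all three nodes -- point the quorum at the single ZK host:
#   <property>
#     <name>hbase.zookeeper.quorum</name>
#     <value>hadoop1</value>
#   </property>
# conf/hbase-env.sh -- let HBase manage ZK, and prefer IPv4 since the error shows an IPv6-style address:
export HBASE_MANAGES_ZK=true
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"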
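
And on the child heap size point: the property I changed is
mapred.child.java.opts, set to -Xmx768m in mapred-site.xml. A quick way to
double-check it is picked up on each node would be something like the below
($HADOOP_HOME/conf is an assumption about where the config lives):

grep -A1 'mapred.child.java.opts' "$HADOOP_HOME"/conf/mapred-site.xml   # should show -Xmx768m
ps aux | grep -o 'Xmx[0-9]*m' | sort | uniq -c                          # heap flags of running JVMs, while a job is running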

Michael and Jonathan, I think I have covered most of the info you asked for
above as well. It doesn't seem to be swapping (rough check below), and yes,
we are currently running all those processes on the master, and all processes
minus the namenode, secondary namenode and JT on the slaves. But we run all of
those processes on a single machine in the single-node case as well, right? So
if RAM/swap were the culprit, shouldn't it affect the single-node config more?
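
For the record, this is roughly how I checked that we aren't swapping; if swap
does start getting used, lowering vm.swappiness is something I've seen
suggested (the value 10 below is just that suggestion, not something I've
verified for HBase):

free -m                           # the swap "used" column should stay near 0
vmstat 5 3                        # si/so columns show pages swapped in/out per second
sudo sysctl -w vm.swappiness=10   # only if swap does start getting used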

Do let me know if anything is missing or you think more info would
help. Many thanks for your time and patience. :)

Thanks,
Hari

On Fri, Oct 29, 2010 at 9:51 PM, Jonathan Gray <[email protected]> wrote:
> Going from pseudo-distributed mode to a 3 node setup is definitely not 
> "scaling" in a real way and I would expect performance degradation.  Most 
> especially when you're also running at replication factor 3 and in a setup 
> where the master node is also acting as a slave node and MR task node.
>
> You're adding an entirely new layer (HDFS) which will always cause increased 
> latency/decreased throughput, and then you're running on 3 nodes with a 
> replication factor of 3.  So now every write is going to all three nodes, via 
> HDFS, rather than a single node straight to the FS.
>
> You said that "all parts should ideally be available on all nodes", but this 
> is a write test?  So that's a bad thing not a good thing.
>
> I would expect about a 50% slowdown but you're seeing more like 75% slowdown. 
>  Not so out of the ordinary still.  Stuffing a NN, DN, JT, TT, HMaster, and 
> RS onto a single node is not a great idea.  And then you're running 4 
> simultaneous tasks on a 4 core machine (along with these 6 other processes in 
> the case of the master node).
>
> How many disks do each of your nodes have?
>
> If you really want to "scale" HBase, you're going to need more nodes.  I've 
> seen some success at a 5 node level but generally 10 nodes and up is when 
> HBase does well (and replication 3 makes sense).
>
> JG
>
>
>> -----Original Message-----
>> From: Michael Segel [mailto:[email protected]]
>> Sent: Friday, October 29, 2010 8:03 AM
>> To: [email protected]
>> Subject: RE: HBase not scaling well
>>
>>
>>
>> I'd actually take a step back and ask what Hari is trying to do?
>>
>> It's difficult to figure out what the problem is when the OP says I've
>> got code that works in individual pseudo mode, but not in an actual
>> cluster.
>> It would be nice to know version(s), configuration... 3 nodes... are
>> they running ZK on the same machines that they are running Region
>> Servers... Are they swapping? 8GB of memory can disappear quickly...
>>
>> Lots of questions...
>>
>>
>> > From: [email protected]
>> > To: [email protected]
>> > Date: Fri, 29 Oct 2010 09:05:28 +0100
>> > Subject: Re: HBase not scaling well
>> >
>> > Hi Hari,
>> >
>> > Could you do some realtime monitoring (htop, iptraf, iostat) and
>> report the results? Also you could add some timers to the map-reduce
>> operations: measure average operations times to figure out what's
>> taking so long.
>> >
>> > Cosmin
>> > On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:
>> >
>> > > Hi,
>> > >
>> > >     We are currently doing a POC for HBase in our system. We have
>> > > written a bulk upload job to upload our data from a text file into
>> > > HBase. We are using a 3-node cluster, one master which also works
>> as
>> > > slave (running as namenode, jobtracker, HMaster, datanode,
>> > > tasktracker, HQuorumpeer and  HRegionServer) and 2 slaves
>> (datanode,
>> > > tasktracker, HQuorumpeer and  HRegionServer running). The problem
>> is
>> > > that we are getting lower performance from distributed cluster than
>> > > what we were getting from single-node pseudo distributed node. The
>> > > upload is taking about 30  minutes on an individual machine,
>> whereas
>> > > it is taking 2 hrs on the cluster. We have replication set to 3, so
>> > > all parts should ideally be available on all nodes, so we doubt if
>> the
>> > > problem is network latency. scp of files between nodes gives a
>> speed
>> > > of about 12 MB/s, which I believe should be good enough for this to
>> > > function. Please correct me if I am wrong here. The nodes are all 4
>> > > core machines with 8 GB RAM.  We are spawning 4 simultaneous map
>> tasks
>> > > on each node, and the job does not have any reduce phase. Any help
>> is
>> > > greatly appreciated.
>> > >
>> > > Thanks,
>> > > Hari Shankar
>> >
>>
>
