Hi all,
I ran the TestDFSIO benchmark on my cluster and thought I'd share the
results here in case they are of any help:
10/11/02 17:53:56 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : write
10/11/02 17:53:56 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:53:56 IST 2010
10/11/02 17:53:56 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:53:56 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:53:56 INFO mapred.FileInputFormat: Throughput mb/sec: 1.2372449326777915
10/11/02 17:53:56 INFO mapred.FileInputFormat: Average IO rate mb/sec: 1.2381720542907715
10/11/02 17:53:56 INFO mapred.FileInputFormat: IO rate std deviation: 0.03402313342081011
10/11/02 17:53:56 INFO mapred.FileInputFormat: Test exec time sec: 866.931
10/11/02 17:53:56 INFO mapred.FileInputFormat:
10/11/02 17:59:35 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : read
10/11/02 17:59:35 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:59:35 IST 2010
10/11/02 17:59:35 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:59:35 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:59:35 INFO mapred.FileInputFormat: Throughput mb/sec: 22.776708537849196
10/11/02 17:59:35 INFO mapred.FileInputFormat: Average IO rate mb/sec: 28.383480072021484
10/11/02 17:59:35 INFO mapred.FileInputFormat: IO rate std deviation: 12.521607590777203
10/11/02 17:59:35 INFO mapred.FileInputFormat: Test exec time sec: 108.735
10/11/02 17:59:35 INFO mapred.FileInputFormat:
For a 3-node cluster, is this good/bad/ugly? Where can I find reference
numbers to compare my cluster against on parameters like these?
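
In case anyone wants to reproduce the runs, this is roughly the
invocation I used (assuming the stock test jar that ships with Hadoop
0.20; the jar name varies by version, so adjust the wildcard):

    # write test: 10 files x 1000 MB = 10000 MB total, as in the output above
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

    # read test over the same files
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

    # clean up the benchmark data afterwards
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -clean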
Thanks,
Hari
On Sun, Oct 31, 2010 at 11:59 PM, Hari Shankar <[email protected]> wrote:
> Thanks guys for the replies, and very sorry for the late reply. We are
> quite new to the Linux environment... our production servers currently
> run on Windows and our Linux sysadmin is yet to arrive, so please
> forgive my ignorance regarding Linux tools; I have very little prior
> experience with them. Our 3 nodes are running different Linux distros:
> one on Ubuntu Server 10.10, one on CentOS and one on Ubuntu Desktop
> 10.04. All have the same directory structure and the same versions of
> Hadoop, HBase and Java, though. Let me know if you think this could be
> an issue. Basically we wanted to evaluate all three distros at the
> same time as well. I hope that plan didn't backfire.
>
> Back to the problem at hand, here are the iptraf, htop and iostat reports:
>
> iptraf snapshot --master
>
> Total rates:
> 165424.2 kbits/s
> 3800 packets/s
>
> Incoming:
> 109415.3 kbits/s
> 3007.4 packets/s
>
> iptraf snapshot --slave01
>
> Total rates:
> 102024 kbits/s
> 3128 packets/s
>
> Incoming:
> 48755.9 kbits/s
> 1784 packets/s
>
> iostat --master
>
> Linux 2.6.32-21-generic (hadoop1) Sunday 31 October 2010 _x86_64_ (4 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.54 0.01 0.18 0.30 0.00 98.97
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 2.43 123.11 412.93 33988462 114000368
>
> iostat --slave01
>
> Linux 2.6.35-22-server (hadoop2) Sunday 31 October 2010 _x86_64_ (4 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.77 0.00 0.29 0.18 0.00 98.77
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 3.90 277.19 1228.97 245515598 1088525808
>
>
> iostat --slave02
>
> Linux 2.6.18-194.11.1.el5 (slave) 10/31/2010
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.54 0.00 0.29 0.80 0.00 98.37
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 6.57 302.09 1652.06 321914364 1760497088
> sda1 0.00 0.00 0.00 2458 88
> sda2 6.57 302.08 1652.06 321911602 1760497000
> dm-0 209.33 302.08 1652.06 321910322 1760496272
> dm-1 0.00 0.00 0.00 896 728
>
> htop --master
> http://imgur.com/3zTu7
>
> htop --slave01
> http://imgur.com/5HeyF
>
> htop --slave02
> http://imgur.com/lHin7
>
> I hope these are the reports you were referring to; please let me know
> if not. Also, is there an easier command-line way of fetching the
> iptraf and htop reports? The master is running Ubuntu Desktop, slave01
> runs Ubuntu Server and slave02 runs CentOS.
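>
> In the meantime, here is what I plan to try for non-interactive
> snapshots (untested on my side, and eth0 is a guess at our interface
> name):
>
>     top -b -n 1 > top-snapshot.txt              # batch-mode top; htop has no batch mode
>     iostat -x 5 3 > iostat-snapshot.txt         # extended disk stats, 3 samples 5 s apart
>     iptraf -d eth0 -t 1 -B -L /tmp/iptraf.log   # 1-minute detailed capture, backgrounded to a log
>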
> Some more facts that I have noticed:
> - I ran the job just now on the cluster after reformatting the
> namenode and it took only 1 hr 15 mins instead of the usual 2 hrs,
> though still slower than the single-node config (30-40 mins). Could it
> be that it is faster right after a namenode format?
> - The time set on one of the slaves was incorrect and it lagged by 4
> hrs compared to the other two machines. I corrected the time before
> formatting the namenode this time. I wonder if that could have an
> impact.
> - I have ZK running on all 3 machines. Shouldn't it work fine if I
> just set ZK up on one of the nodes? When I try that, I get a weird
> error: could not connect to port 0::0::0::0::....:2181 or something of
> that sort. I'll post the full error next time I see it. (The setting I
> believe is involved is sketched after this list.)
> - The CentOS machine (slave02) seems to use a lot more CPU on average
> than the other two. CPU usage on CentOS mostly hovers around 50-60%,
> whereas it is more like 30-40% on the other two machines (ref. htop
> screenshots above).
> - On a single-node configuration, moving from a 4 GB RAM dual-core
> laptop to an 8 GB quad-core machine gave a 1.8x performance
> improvement.
> - Increasing the child task heap size from the default 200 MB to 768
> MB improved performance on both single- and multi-node clusters by
> 100% (a 2x improvement). But going beyond 768 MB doesn't seem to have
> much impact. (The property we changed is sketched after this list.)
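>
> For anyone following along, these are the two config knobs mentioned
> in the list above, as far as I understand them (treat these as
> sketches, since we are still new to this stack; hadoop1 is our
> master's hostname):
>
>     <!-- hbase-site.xml: point HBase at a single ZK node -->
>     <property>
>       <name>hbase.zookeeper.quorum</name>
>       <value>hadoop1</value>
>     </property>
>
>     <!-- mapred-site.xml: child task heap; the default is -Xmx200m (200 MB) -->
>     <property>
>       <name>mapred.child.java.opts</name>
>       <value>-Xmx768m</value>
>     </property>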
>
> Michael and Jonathan, I think I have covered most of the info you two
> asked for above as well. It doesn't seem to be swapping, and yes, we
> are currently running all those processes on the master, and all
> processes minus the namenode, secondary namenode and JT on the slaves.
> But we run all those processes on a single machine in the single-node
> case as well, right? So if RAM/swap were the culprit, shouldn't it
> affect the single-node config more?
>
> Do let me know if anything is missing or you think more info would
> help. Many thanks for your time and patience. :)
>
> Thanks,
> Hari
>
> On Fri, Oct 29, 2010 at 9:51 PM, Jonathan Gray <[email protected]> wrote:
>> Going from pseudo-distributed mode to a 3 node setup is definitely not
>> "scaling" in a real way, and I would expect performance degradation,
>> especially when you're also running at replication factor 3 and in a
>> setup where the master node is also acting as a slave node and MR task
>> node.
>>
>> You're adding an entirely new layer (HDFS) which will always cause increased
>> latency/decreased throughput, and then you're running on 3 nodes with a
>> replication factor of 3. So now every write is going to all three nodes,
>> via HDFS, rather than a single node straight to the FS.
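>>
>> (If you want an apples-to-apples comparison against pseudo-distributed
>> mode, one option - for benchmarking only, not production - is to drop
>> replication to 1 in hdfs-site.xml and re-run the upload, something
>> like:
>>
>>     <property>
>>       <name>dfs.replication</name>
>>       <value>1</value>
>>     </property>
>> )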
>>
>> You said that "all parts should ideally be available on all nodes", but
>> this is a write test, so that's a bad thing, not a good thing: every
>> block has to go to all three nodes before the write completes.
>>
>> I would expect about a 50% slowdown, but you're seeing more like a 75%
>> slowdown. Still not so out of the ordinary. Stuffing a NN, DN, JT, TT,
>> HMaster, and RS onto a single node is not a great idea. And then you're
>> running 4 simultaneous tasks on a 4-core machine (along with those 6
>> other processes in the case of the master node).
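>>
>> (On boxes that loaded, I'd also try halving the concurrent map slots
>> per node so the tasks aren't fighting the daemons for cores - e.g. in
>> mapred-site.xml:
>>
>>     <property>
>>       <name>mapred.tasktracker.map.tasks.maximum</name>
>>       <value>2</value>
>>     </property>
>> )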
>>
>> How many disks do each of your nodes have?
>>
>> If you really want to "scale" HBase, you're going to need more nodes.
>> I've seen some success at the 5-node level, but generally 10 nodes and
>> up is when HBase does well (and replication 3 makes sense).
>>
>> JG
>>
>>
>>> -----Original Message-----
>>> From: Michael Segel [mailto:[email protected]]
>>> Sent: Friday, October 29, 2010 8:03 AM
>>> To: [email protected]
>>> Subject: RE: HBase not scaling well
>>>
>>>
>>>
>>> I'd actually take a step back and ask what Hari is trying to do.
>>>
>>> It's difficult to figure out what the problem is when the OP says "I've
>>> got code that works in pseudo-distributed mode, but not in an actual
>>> cluster."
>>> It would be nice to know version(s), configuration... 3 nodes... are
>>> they running ZK on the same machines that they are running Region
>>> Servers? Are they swapping? 8 GB of memory can disappear quickly...
>>>
>>> Lots of questions...
>>>
>>>
>>> > From: [email protected]
>>> > To: [email protected]
>>> > Date: Fri, 29 Oct 2010 09:05:28 +0100
>>> > Subject: Re: HBase not scaling well
>>> >
>>> > Hi Hari,
>>> >
>>> > Could you do some realtime monitoring (htop, iptraf, iostat) and
>>> > report the results? Also, you could add some timers to the map-reduce
>>> > operations: measure average operation times to figure out what's
>>> > taking so long.
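>>> >
>>> > (An untested sketch of what I mean, assuming an HTable named "table"
>>> > and the old-API Reporter passed to map(); a counter accumulates the
>>> > total time spent in puts:
>>> >
>>> >     long t0 = System.currentTimeMillis();
>>> >     table.put(put);  // the HBase write being timed
>>> >     reporter.incrCounter("timing", "putMillis",
>>> >                          System.currentTimeMillis() - t0);
>>> > )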
>>> >
>>> > Cosmin
>>> > On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > We are currently doing a POC for HBase in our system. We have
>>> > > written a bulk upload job to upload our data from a text file into
>>> > > HBase. We are using a 3-node cluster: one master which also works as
>>> > > a slave (running the namenode, jobtracker, HMaster, datanode,
>>> > > tasktracker, HQuorumPeer and HRegionServer) and 2 slaves (running
>>> > > the datanode, tasktracker, HQuorumPeer and HRegionServer). The
>>> > > problem is that we are getting lower performance from the
>>> > > distributed cluster than we were getting from the single-node
>>> > > pseudo-distributed setup. The upload takes about 30 minutes on an
>>> > > individual machine, whereas it takes 2 hrs on the cluster. We have
>>> > > replication set to 3, so all parts should ideally be available on
>>> > > all nodes, so we doubt that the problem is network latency. scp of
>>> > > files between nodes gives a speed of about 12 MB/s, which I believe
>>> > > should be good enough for this to function. Please correct me if I
>>> > > am wrong here. The nodes are all 4-core machines with 8 GB RAM. We
>>> > > are spawning 4 simultaneous map tasks on each node, and the job
>>> > > does not have any reduce phase. Any help is greatly appreciated.
>>> > >
>>> > > Thanks,
>>> > > Hari Shankar
>>> >
>>>
>>
>