Thanks, guys, for the replies, and very sorry for the late response. We are
quite new to the Linux environment... our production servers currently
run on Windows and our Linux sysadmin is yet to arrive, so please
forgive my ignorance regarding Linux tools. We have very little prior
experience with Linux. All three of our nodes run different Linux
distros: one on Ubuntu Server 10.10, one on CentOS, and one on
Ubuntu Desktop 10.04. All have the same directory structure and the same
versions of Hadoop, HBase and Java, though. Let me know if you think
this could be an issue. Basically, we wanted to evaluate all three
distros at the same time as well; I hope that plan didn't backfire.
Back to the problem at hand, here are the iptraf, htop and iostat reports:
iptraf snapshot --master
Total rates:  165424.2 kbits/s, 3800 packets/s
Incoming:     109415.3 kbits/s, 3007.4 packets/s

iptraf snapshot --slave01
Total rates:  102024 kbits/s, 3128 packets/s
Incoming:     48755.9 kbits/s, 1784 packets/s
iostat --master
Linux 2.6.32-21-generic (hadoop1)  Sunday 31 October 2010  _x86_64_  (4 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.54   0.01     0.18     0.30    0.00  98.97

Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read   Blk_wrtn
sda      2.43      123.11      412.93  33988462  114000368
iostat --slave01
Linux 2.6.35-22-server (hadoop2)  Sunday 31 October 2010  _x86_64_  (4 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.77   0.00     0.29     0.18    0.00  98.77

Device:   tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda      3.90      277.19     1228.97  245515598  1088525808
iostat --slave02
Linux 2.6.18-194.11.1.el5 (slave)  10/31/2010

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.54   0.00     0.29     0.80    0.00  98.37

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read    Blk_wrtn
sda       6.57      302.09     1652.06  321914364  1760497088
sda1      0.00        0.00        0.00       2458          88
sda2      6.57      302.08     1652.06  321911602  1760497000
dm-0    209.33      302.08     1652.06  321910322  1760496272
dm-1      0.00        0.00        0.00        896         728
htop --master
http://imgur.com/3zTu7
htop --slave01
http://imgur.com/5HeyF
htop --slave02
http://imgur.com/lHin7
I hope these are the reports you were referring to; please let me know
otherwise. Also, is there an easier command-line way of fetching the
iptraf and htop reports? The master is running Ubuntu Desktop, slave01
runs Ubuntu Server, and slave02 runs CentOS.
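For next time, here is what I plan to try for grabbing these snapshots
from the command line. This is a sketch pieced together from the man
pages, not something I have verified on all three distros yet, and the
interface name is a guess:

    # top in batch mode as a stand-in for htop (htop has no batch mode)
    top -b -n 1 > top-snapshot.txt

    # extended disk stats: 5-second interval, 3 samples
    iostat -x 5 3 > iostat-snapshot.txt

    # iptraf IP monitor on eth0 (guessing the interface name here), run in
    # the background for 1 minute, logging to a file instead of the screen
    iptraf -i eth0 -t 1 -B -L /tmp/iptraf-eth0.log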
Some more facts that I have noticed:
- I ran the job just now on the cluster after reformatting the
namenode, and it took only 1 hr 15 mins instead of the usual 2 hrs,
though that is still slower than the single-node config (30-40 mins).
Could it be faster right after a namenode format?
- The time set on one of the slaves was incorrect; it lagged 4 hrs
behind the other two machines. I corrected the time before formatting
the namenode this time. I wonder if that could have an impact.
- I have ZK running on all 3 machines. Shouldn't it work fine if I
just set up ZK on one of the nodes? When I try that, I get a weird
error: could not connect to port 0::0::0::0::....:2181 or something of
that sort. I'll post the full error next time I see it. (The config I
used for that experiment is sketched after this list.)
- The CentOS machine (slave02) seems to use a lot more CPU than the
other two on average. CPU usage on CentOS mostly hovers around 50-60%,
whereas it is more like 30-40% on the other two machines (see the
htop screenshots above).
- On a single-node configuration, moving from a 4 GB RAM dual-core
laptop to an 8 GB quad-core machine gives a 1.8x performance
improvement.
- Increasing the child task heap size from the default 200 MB to 768
MB improved performance on both single- and multi-node clusters by
100% (a 2x improvement), but going beyond 768 MB doesn't seem to have
much impact. (The exact setting is shown after this list.)
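In case it matters, here is roughly what I put in hbase-site.xml on all
nodes for the single-ZK experiment (hadoop1 is our master; the property
name is from the HBase docs, so do correct me if I misused it):

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>hadoop1</value>
    </property>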
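And the heap bump mentioned above is just this in mapred-site.xml (the
0.20-era property; the default is -Xmx200m, which matches the 200 MB I
mentioned):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx768m</value>
    </property>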
Michael and Jonathan, I think I have covered most of the info you
asked for above as well. It doesn't seem to be swapping (how I checked
is shown below), and yes, we are currently running all those processes
on the master, and all processes minus the namenode, secondary namenode
and JT on the slaves. But we run all of those processes on a single
machine in the single-node case as well, right? So if RAM/swap were the
culprit, shouldn't it affect the single-node config more?
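For completeness, this is how I checked for swapping; swap usage and
the si/so columns stayed at or near zero on all three nodes while the
job ran, which is why I say it doesn't seem to be swapping:

    # swap totals and usage in MB
    free -m

    # si/so = pages swapped in/out per second; 5 samples, 5 seconds apart
    vmstat 5 5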
Do let me know if anything is missing or you think more info would
help. Many thanks for your time and patience. :)
Thanks,
Hari
On Fri, Oct 29, 2010 at 9:51 PM, Jonathan Gray <[email protected]> wrote:
> Going from pseudo-distributed mode to a 3 node setup is definitely not
> "scaling" in a real way and I would expect performance degradation. Most
> especially when you're also running at replication factor 3 and in a setup
> where the master node is also acting as a slave node and MR task node.
>
> You're adding an entirely new layer (HDFS) which will always cause increased
> latency/decreased throughput, and then you're running on 3 nodes with a
> replication factor of 3. So now every write is going to all three nodes, via
> HDFS, rather than a single node straight to the FS.
>
> You said that "all parts should ideally be available on all nodes", but this
> is a write test? So that's a bad thing, not a good thing.
>
> I would expect about a 50% slowdown but you're seeing more like 75% slowdown.
> Not so out of the ordinary still. Stuffing a NN, DN, JT, TT, HMaster, and
> RS onto a single node is not a great idea. And then you're running 4
> simultaneous tasks on a 4 core machine (along with these 6 other processes in
> the case of the master node).
>
> How many disks do each of your nodes have?
>
> If you really want to "scale" HBase, you're going to need more nodes. I've
> seen some success at a 5 node level but generally 10 nodes and up is when
> HBase does well (and replication 3 makes sense).
>
> JG
>
>
>> -----Original Message-----
>> From: Michael Segel [mailto:[email protected]]
>> Sent: Friday, October 29, 2010 8:03 AM
>> To: [email protected]
>> Subject: RE: HBase not scaling well
>>
>>
>>
>> I'd actually take a step back and ask what Hari is trying to do?
>>
>> It's difficult to figure out what the problem is when the OP says "I've
>> got code that works in individual pseudo mode, but not in an actual
>> cluster."
>> It would be nice to know version(s), configuration... 3 nodes... are
>> they running ZK on the same machines that they are running Region
>> Servers... Are they swapping? 8GB of memory can disappear quickly...
>>
>> Lots of questions...
>>
>>
>> > From: [email protected]
>> > To: [email protected]
>> > Date: Fri, 29 Oct 2010 09:05:28 +0100
>> > Subject: Re: HBase not scaling well
>> >
>> > Hi Hari,
>> >
>> > Could you do some realtime monitoring (htop, iptraf, iostat) and
>> report the results? Also you could add some timers to the map-reduce
>> operations: measure average operations times to figure out what's
>> taking so long.
>> >
>> > Cosmin
>> > On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:
>> >
>> > > Hi,
>> > >
>> > > We are currently doing a POC for HBase in our system. We have
>> > > written a bulk upload job to upload our data from a text file into
>> > > HBase. We are using a 3-node cluster, one master which also works
>> as
>> > > slave (running as namenode, jobtracker, HMaster, datanode,
>> > > tasktracker, HQuorumpeer and HRegionServer) and 2 slaves
>> (datanode,
>> > > tasktracker, HQuorumpeer and HRegionServer running). The problem
>> is
>> > > that we are getting lower performance from the distributed cluster than
>> > > what we were getting from the single-node pseudo-distributed setup. The
>> > > upload is taking about 30 minutes on an individual machine,
>> whereas
>> > > it is taking 2 hrs on the cluster. We have replication set to 3, so
>> > > all parts should ideally be available on all nodes, so we doubt if
>> the
>> > > problem is network latency. scp of files between nodes gives a
>> speed
>> > > of about 12 MB/s, which I believe should be good enough for this to
>> > > function. Please correct me if I am wrong here. The nodes are all 4
>> > > core machines with 8 GB RAM. We are spawning 4 simultaneous map
>> tasks
>> > > on each node, and the job does not have any reduce phase. Any help
>> is
>> > > greatly appreciated.
>> > >
>> > > Thanks,
>> > > Hari Shankar
>> >
>>
>