Hi all,
I ran the TestDFSIO benchmark on my cluster and thought I'd share the
results here in case they are of any help:
10/11/02 17:53:56 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : write
10/11/02 17:53:56 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:53:56 IST 2010
10/11/02 17:53:56 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:53:56 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:53:56 INFO mapred.FileInputFormat: Throughput mb/sec: 1.2372449326777915
10/11/02 17:53:56 INFO mapred.FileInputFormat: Average IO rate mb/sec: 1.2381720542907715
10/11/02 17:53:56 INFO mapred.FileInputFormat: IO rate std deviation: 0.03402313342081011
10/11/02 17:53:56 INFO mapred.FileInputFormat: Test exec time sec: 866.931
10/11/02 17:53:56 INFO mapred.FileInputFormat:
10/11/02 17:59:35 INFO mapred.FileInputFormat: ----- TestDFSIO ----- : read
10/11/02 17:59:35 INFO mapred.FileInputFormat: Date & time: Tue Nov 02 17:59:35 IST 2010
10/11/02 17:59:35 INFO mapred.FileInputFormat: Number of files: 10
10/11/02 17:59:35 INFO mapred.FileInputFormat: Total MBytes processed: 10000
10/11/02 17:59:35 INFO mapred.FileInputFormat: Throughput mb/sec: 22.776708537849196
10/11/02 17:59:35 INFO mapred.FileInputFormat: Average IO rate mb/sec: 28.383480072021484
10/11/02 17:59:35 INFO mapred.FileInputFormat: IO rate std deviation: 12.521607590777203
10/11/02 17:59:35 INFO mapred.FileInputFormat: Test exec time sec: 108.735
10/11/02 17:59:35 INFO mapred.FileInputFormat:
For a 3-node cluster, is this good/bad/ugly? Where can I find reference
numbers to compare my cluster against on parameters like these?
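
In case anyone wants to reproduce the runs, this is roughly the
invocation I used (assuming the stock test jar that ships with Hadoop
0.20; the jar name varies by version, so adjust the wildcard):

    # write test: 10 files x 1000 MB = 10000 MB total, as in the output above
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

    # read test over the same files
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

    # clean up the benchmark data afterwards
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -clean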
Thanks,
Hari
On Sun, Oct 31, 2010 at 11:59 PM, Hari Shankar <[email protected]> wrote:
> Thanks guys for the replies, and very sorry for the late reply. We are
> quite new to the Linux environment... our production servers currently
> run on Windows and our Linux sysadmin is yet to arrive, so please
> forgive my ignorance regarding Linux tools; I have very little prior
> experience with them. Our 3 nodes are running different Linux distros:
> one on Ubuntu Server 10.10, one on CentOS and one on Ubuntu Desktop
> 10.04. All have the same directory structure and the same versions of
> Hadoop, HBase and Java, though. Let me know if you think this could be
> an issue. Basically we wanted to evaluate all three distros at the
> same time as well. I hope that plan didn't backfire.
>
> Back to the problem at hand, here are the iptraf, htop and iostat reports:
>
> iptraf snapshot --master
>
> Total rates:
> 165424.2 kbits/s
> 3800 packets/s
>
> Incoming:
> 109415.3 kbits/s
> 3007.4 packets/s
>
> iptraf snapshot --slave01
>
> Total rates:
> 102024 kbits/s
> 3128 packets/s
>
> Incoming:
> 48755.9 kbits/s
> 1784 packets/s
>
> iostat --master
>
> Linux 2.6.32-21-generic (hadoop1) Sunday 31 October 2010 _x86_64_ (4 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.54 0.01 0.18 0.30 0.00 98.97
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 2.43 123.11 412.93 33988462 114000368
>
> iostat --slave01
>
> Linux 2.6.35-22-server (hadoop2) Sunday 31 October 2010 _x86_64_ (4 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.77 0.00 0.29 0.18 0.00 98.77
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 3.90 277.19 1228.97 245515598 1088525808
>
>
> iostat --slave02
>
> Linux 2.6.18-194.11.1.el5 (slave) 10/31/2010
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.54 0.00 0.29 0.80 0.00 98.37
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 6.57 302.09 1652.06 321914364 1760497088
> sda1 0.00 0.00 0.00 2458 88
> sda2 6.57 302.08 1652.06 321911602 1760497000
> dm-0 209.33 302.08 1652.06 321910322 1760496272
> dm-1 0.00 0.00 0.00 896 728
>
> htop --master
> http://imgur.com/3zTu7
>
> htop --slave01
> http://imgur.com/5HeyF
>
> htop --slave02
> http://imgur.com/lHin7
>
> I hope these are the reports you were referring to; please let me know
> if not. Also, is there an easier command-line way of fetching the
> iptraf and htop reports? The master is running Ubuntu Desktop, slave01
> runs Ubuntu Server and slave02 runs CentOS.
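>
> In the meantime, here is what I plan to try for non-interactive
> snapshots (untested on my side, and eth0 is a guess at our interface
> name):
>
>     top -b -n 1 > top-snapshot.txt              # batch-mode top; htop has no batch mode
>     iostat -x 5 3 > iostat-snapshot.txt         # extended disk stats, 3 samples 5 s apart
>     iptraf -d eth0 -t 1 -B -L /tmp/iptraf.log   # 1-minute detailed capture, backgrounded to a log
>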
> Some more facts that I have noticed:
> - I ran the job just now on the cluster after reformatting the
> namenode and it took only 1 hr 15 mins instead of the usual 2 hrs,
> though still slower than the single-node config (30-40 mins). Could it
> be that it is faster right after a namenode format?
> - The time set on one of the slaves was incorrect and it lagged by 4
> hrs compared to the other two machines. I corrected the time before
> formatting the namenode this time. I wonder if that could have an
> impact.
> - I have ZK running on all 3 machines. Shouldn't it work fine if I
> just set ZK up on one of the nodes? When I try that, I get a weird
> error: could not connect to port 0::0::0::0::....:2181 or something of
> that sort. I'll post the full error next time I see it. (The setting I
> believe is involved is sketched after this list.)
> - The CentOS machine (slave02) seems to use a lot more CPU on average
> than the other two. CPU usage on CentOS mostly hovers around 50-60%,
> whereas it is more like 30-40% on the other two machines (ref. htop
> screenshots above).
> - On a single-node configuration, moving from a 4 GB RAM dual-core
> laptop to an 8 GB quad-core machine gave a 1.8x performance
> improvement.
> - Increasing the child task heap size from the default 200 MB to 768
> MB improved performance on both single- and multi-node clusters by
> 100% (a 2x improvement). But going beyond 768 MB doesn't seem to have
> much impact. (The property we changed is sketched after this list.)
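>
> For anyone following along, these are the two config knobs mentioned
> in the list above, as far as I understand them (treat these as
> sketches, since we are still new to this stack; hadoop1 is our
> master's hostname):
>
>     <!-- hbase-site.xml: point HBase at a single ZK node -->
>     <property>
>       <name>hbase.zookeeper.quorum</name>
>       <value>hadoop1</value>
>     </property>
>
>     <!-- mapred-site.xml: child task heap; the default is -Xmx200m (200 MB) -->
>     <property>
>       <name>mapred.child.java.opts</name>
>       <value>-Xmx768m</value>
>     </property>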
>
> Michael and Jonathan, I think I have covered most of the info you two
> asked for above as well. It doesn't seem to be swapping, and yes, we
> are currently running all those processes on the master, and all
> processes minus the namenode, secondary namenode and JT on the slaves.
> But we run all those processes on a single machine in the single-node
> case as well, right? So if RAM/swap were the culprit, shouldn't it
> affect the single-node config more?
>
> Do let me know if anything is missing or you think more info would
> help. Many thanks for your time and patience. :)
>
> Thanks,
> Hari
>
> On Fri, Oct 29, 2010 at 9:51 PM, Jonathan Gray <[email protected]> wrote:
>> Going from pseudo-distributed mode to a 3 node setup is definitely not
>> "scaling" in a real way, and I would expect performance degradation,
>> especially when you're also running at replication factor 3 and in a
>> setup where the master node is also acting as a slave node and MR task
>> node.
>>
>> You're adding an entirely new layer (HDFS) which will always cause increased
>> latency/decreased throughput, and then you're running on 3 nodes with a
>> replication factor of 3. So now every write is going to all three nodes,
>> via HDFS, rather than a single node straight to the FS.
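>>
>> (If you want an apples-to-apples comparison against pseudo-distributed
>> mode, one option - for benchmarking only, not production - is to drop
>> replication to 1 in hdfs-site.xml and re-run the upload, something
>> like:
>>
>>     <property>
>>       <name>dfs.replication</name>
>>       <value>1</value>
>>     </property>
>> )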
>>
>> You said that "all parts should ideally be available on all nodes", but
>> this is a write test, so that's a bad thing, not a good thing: every
>> block has to go to all three nodes before the write completes.
>>
>> I would expect about a 50% slowdown, but you're seeing more like a 75%
>> slowdown. Still not so out of the ordinary. Stuffing a NN, DN, JT, TT,
>> HMaster, and RS onto a single node is not a great idea. And then you're
>> running 4 simultaneous tasks on a 4-core machine (along with those 6
>> other processes in the case of the master node).
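>>
>> (On boxes that loaded, I'd also try halving the concurrent map slots
>> per node so the tasks aren't fighting the daemons for cores - e.g. in
>> mapred-site.xml:
>>
>>     <property>
>>       <name>mapred.tasktracker.map.tasks.maximum</name>
>>       <value>2</value>
>>     </property>
>> )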
>>
>> How many disks do each of your nodes have?
>>
>> If you really want to "scale" HBase, you're going to need more nodes.
>> I've seen some success at the 5-node level, but generally 10 nodes and
>> up is when HBase does well (and replication 3 makes sense).
>>
>> JG
>>
>>
>>> -----Original Message-----
>>> From: Michael Segel [mailto:[email protected]]
>>> Sent: Friday, October 29, 2010 8:03 AM
>>> To: [email protected]
>>> Subject: RE: HBase not scaling well
>>>
>>>
>>>
>>> I'd actually take a step back and ask what Hari is trying to do.
>>>
>>> It's difficult to figure out what the problem is when the OP says "I've
>>> got code that works in pseudo-distributed mode, but not in an actual
>>> cluster."
>>> It would be nice to know version(s), configuration... 3 nodes... are
>>> they running ZK on the same machines that they are running Region
>>> Servers? Are they swapping? 8 GB of memory can disappear quickly...
>>>
>>> Lots of questions...
>>>
>>>
>>> > From: [email protected]
>>> > To: [email protected]
>>> > Date: Fri, 29 Oct 2010 09:05:28 +0100
>>> > Subject: Re: HBase not scaling well
>>> >
>>> > Hi Hari,
>>> >
>>> > Could you do some realtime monitoring (htop, iptraf, iostat) and
>>> > report the results? Also, you could add some timers to the map-reduce
>>> > operations: measure average operation times to figure out what's
>>> > taking so long.
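>>> >
>>> > (An untested sketch of what I mean, assuming an HTable named "table"
>>> > and the old-API Reporter passed to map(); a counter accumulates the
>>> > total time spent in puts:
>>> >
>>> >     long t0 = System.currentTimeMillis();
>>> >     table.put(put);  // the HBase write being timed
>>> >     reporter.incrCounter("timing", "putMillis",
>>> >                          System.currentTimeMillis() - t0);
>>> > )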
>>> >
>>> > Cosmin
>>> > On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > We are currently doing a POC for HBase in our system. We have
>>> > > written a bulk upload job to upload our data from a text file into
>>> > > HBase. We are using a 3-node cluster: one master which also works as
>>> > > a slave (running the namenode, jobtracker, HMaster, datanode,
>>> > > tasktracker, HQuorumPeer and HRegionServer) and 2 slaves (running
>>> > > the datanode, tasktracker, HQuorumPeer and HRegionServer). The
>>> > > problem is that we are getting lower performance from the
>>> > > distributed cluster than we were getting from the single-node
>>> > > pseudo-distributed setup. The upload takes about 30 minutes on an
>>> > > individual machine, whereas it takes 2 hrs on the cluster. We have
>>> > > replication set to 3, so all parts should ideally be available on
>>> > > all nodes, so we doubt that the problem is network latency. scp of
>>> > > files between nodes gives a speed of about 12 MB/s, which I believe
>>> > > should be good enough for this to function. Please correct me if I
>>> > > am wrong here. The nodes are all 4-core machines with 8 GB RAM. We
>>> > > are spawning 4 simultaneous map tasks on each node, and the job
>>> > > does not have any reduce phase. Any help is greatly appreciated.
>>> > >
>>> > > Thanks,
>>> > > Hari Shankar
>>> >
>>>
>>
>