Going from pseudo-distributed mode to a 3-node setup is definitely not "scaling" in any real way, and I would expect some performance degradation. Especially when you're also running at replication factor 3, and in a setup where the master node is also acting as a slave node and an MR task node.
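(Side note: if this stays a 3-node POC, replication doesn't have to stay at 3 either. Below is a rough sketch of what the knob looks like from the HDFS client side; the path and values are purely illustrative, and in practice you'd set dfs.replication in hdfs-site.xml across the cluster rather than in code.)

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only: dfs.replication is normally set cluster-wide in hdfs-site.xml.
// Setting it in the client Configuration only affects files created by this client, but it
// shows where the knob lives and what a write with fewer replicas looks like.
public class ReplicationSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 2);                // default is 3; each block is pipelined to this many datanodes
    FileSystem fs = FileSystem.get(conf);

    Path path = new Path("/tmp/replication-sketch");  // hypothetical path
    FSDataOutputStream out = fs.create(path, true);   // overwrite if it already exists
    out.writeBytes("each block of this file goes to 2 datanodes, not 3\n");
    out.close();

    System.out.println("replication = " + fs.getFileStatus(path).getReplication());
    fs.close();
  }
}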
You're adding an entirely new layer (HDFS), which will always cost you some latency/throughput, and then you're running on 3 nodes with a replication factor of 3. So now every write goes to all three nodes, via HDFS, rather than to a single node straight to the local FS.

You said that "all parts should ideally be available on all nodes", but this is a write test, so that's a bad thing, not a good thing. I would expect roughly a 50% slowdown; you're seeing more like a 75% slowdown, which is still not so out of the ordinary.

Stuffing a NN, DN, JT, TT, HMaster, and RS onto a single node is not a great idea. And then you're running 4 simultaneous map tasks on a 4-core machine (alongside those 6 other processes in the case of the master node). How many disks does each of your nodes have?

If you really want to "scale" HBase, you're going to need more nodes. I've seen some success at the 5-node level, but generally 10 nodes and up is when HBase does well (and when replication 3 makes sense).

JG

> -----Original Message-----
> From: Michael Segel [mailto:[email protected]]
> Sent: Friday, October 29, 2010 8:03 AM
> To: [email protected]
> Subject: RE: HBase not scaling well
>
> I'd actually take a step back and ask what Hari is trying to do?
>
> It's difficult to figure out what the problem is when the OP says "I've
> got code that works in pseudo-distributed mode, but not on an actual
> cluster."
> It would be nice to know version(s), configuration... 3 nodes... are
> they running ZK on the same machines that they are running Region
> Servers on? Are they swapping? 8 GB of memory can disappear quickly...
>
> Lots of questions...
>
> > From: [email protected]
> > To: [email protected]
> > Date: Fri, 29 Oct 2010 09:05:28 +0100
> > Subject: Re: HBase not scaling well
> >
> > Hi Hari,
> >
> > Could you do some realtime monitoring (htop, iptraf, iostat) and
> > report the results? Also you could add some timers to the map-reduce
> > operations: measure average operation times to figure out what's
> > taking so long.
> >
> > Cosmin
> >
> > On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:
> >
> > > Hi,
> > >
> > > We are currently doing a POC for HBase in our system. We have
> > > written a bulk upload job to upload our data from a text file into
> > > HBase. We are using a 3-node cluster: one master which also works
> > > as a slave (running namenode, jobtracker, HMaster, datanode,
> > > tasktracker, HQuorumPeer and HRegionServer) and 2 slaves (running
> > > datanode, tasktracker, HQuorumPeer and HRegionServer). The problem
> > > is that we are getting lower performance from the distributed
> > > cluster than what we were getting from the single-node
> > > pseudo-distributed setup. The upload takes about 30 minutes on an
> > > individual machine, whereas it takes 2 hours on the cluster. We
> > > have replication set to 3, so all parts should ideally be available
> > > on all nodes, so we doubt that the problem is network latency. scp
> > > of files between nodes gives a speed of about 12 MB/s, which I
> > > believe should be good enough for this to function. Please correct
> > > me if I am wrong here. The nodes are all 4-core machines with 8 GB
> > > RAM. We are spawning 4 simultaneous map tasks on each node, and the
> > > job does not have any reduce phase. Any help is greatly
> > > appreciated.
> > >
> > > Thanks,
> > > Hari Shankar
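For what it's worth, here is a minimal sketch of the kind of instrumentation Cosmin is suggesting, combined with client-side write buffering for a map-only bulk upload. Everything in it is illustrative rather than the OP's actual job: the table name, column family, tab-separated input format, and the 0.90-style HTable client API are all assumptions, and the exact constructors differ slightly between HBase versions.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical map-only bulk upload that times each put and buffers writes client-side.
public class TimedUploadMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
    table = new HTable(conf, "poc_table");        // "poc_table" is a placeholder table name
    table.setAutoFlush(false);                    // batch Puts client-side instead of one RPC per Put
    table.setWriteBufferSize(8 * 1024 * 1024);    // flush to the region servers every ~8 MB
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) throws IOException {
    String[] fields = value.toString().split("\t");  // assumes tab-separated input lines
    if (fields.length < 2) {
      context.getCounter("upload", "bad_lines").increment(1);
      return;
    }

    Put put = new Put(Bytes.toBytes(fields[0]));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(fields[1]));

    long start = System.nanoTime();
    table.put(put);                                  // usually just buffers; periodic flush cost shows up here too
    long micros = (System.nanoTime() - start) / 1000L;

    // Counters show up on the JobTracker UI; total time / count gives the average put latency.
    context.getCounter("upload", "put_micros").increment(micros);
    context.getCounter("upload", "put_count").increment(1);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits();   // push whatever is still sitting in the write buffer
    table.close();
  }
}

Comparing the average put time (put_micros / put_count from the job counters) between the pseudo-distributed run and the cluster run should show whether the extra two hours are going into the puts themselves or somewhere else (task startup, input splits, swapping).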
