Actually, we did try running from two machines, each running our own tests in parallel. Unfortunately the throughput just split between them, adding up to the same total. We also did the same thing with iperf running from each machine to another machine, which showed around 800Mb of additional throughput available between each pair of machines. However, we didn't run these tests very thoroughly, so I will revisit them as soon as I get back to the office. Thanks.
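Roughly speaking, the pairwise iperf check looked like the following (a sketch from memory only; the hostname is a placeholder and the exact flags and durations we used may have differed):

    # on the receiving machine of each pair
    iperf -s

    # on the sending machine of each pair, started at the same time as the
    # other pairs (datanode02 is a placeholder hostname; -t 60 runs for 60
    # seconds, -P 4 opens four parallel TCP streams)
    iperf -c datanode02 -t 60 -P 4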
On Mon, Mar 19, 2012 at 9:21 PM, Christian Schäfer <syrious3...@yahoo.de> wrote:
> Referring to my experience, I expect the client to be the bottleneck, too.
>
> So try to increase the count of client machines (not client threads), each
> with its own unshared network interface.
>
> In my case I could double write throughput by doubling the client machine
> count, with a much smaller system than yours (5 machines, 4gigs RAM each).
>
> Good luck
> Chris
>
>
>
> ________________________________
> From: Juhani Connolly <juha...@gmail.com>
> To: user@hbase.apache.org
> Sent: Monday, 19 March 2012, 13:02
> Subject: Re: 0.92 and Read/writes not scaling
>
> I was concerned that may be the case too, which is why we ran the ycsb
> tests in addition to our application-specific and general performance
> tests. Checking profiles of the execution just showed the vast majority of
> time spent waiting for responses. These were all run with 400 threads
> (though we tried more/less just in case).
> 2012/03/19 20:57 "Mingjian Deng" <koven2...@gmail.com>:
>
>> @Juhani:
>> How many clients did you test? Maybe the bottleneck was the client?
>>
>> 2012/3/19 Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com>
>>
>> > Hi Juhani
>> >
>> > Can you tell us more about how the regions are balanced?
>> > Are you overloading only a specific region server?
>> >
>> > Regards
>> > Ram
>> >
>> > > -----Original Message-----
>> > > From: Juhani Connolly [mailto:juha...@gmail.com]
>> > > Sent: Monday, March 19, 2012 4:11 PM
>> > > To: user@hbase.apache.org
>> > > Subject: 0.92 and Read/writes not scaling
>> > >
>> > > Hi,
>> > >
>> > > We're running into a brick wall where our throughput numbers will not
>> > > scale as we increase server counts, both using custom in-house tests
>> > > and ycsb.
>> > >
>> > > We're using hbase 0.92 on hadoop 0.20.2 (we also experienced the same
>> > > issues using 0.90 before switching our testing to this version).
>> > >
>> > > Our cluster consists of:
>> > > - Namenode and hmaster on separate servers, 24 core, 64gb
>> > > - up to 11 datanode/regionservers, 24 core, 64gb, 4 * 1tb disks (we
>> > >   hope to get this changed)
>> > >
>> > > We have adjusted our gc settings and mslabs:
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.memstore.mslab.enabled</name>
>> > >   <value>true</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.memstore.mslab.chunksize</name>
>> > >   <value>2097152</value>
>> > > </property>
>> > >
>> > > <property>
>> > >   <name>hbase.hregion.memstore.mslab.max.allocation</name>
>> > >   <value>1024768</value>
>> > > </property>
>> > >
>> > > hdfs xceivers is set to 8192.
>> > >
>> > > We've experimented with a variety of handler counts for the namenode,
>> > > datanodes and regionservers with no change in throughput.
>> > >
>> > > For testing with ycsb, we do the following each time (with nothing
>> > > else using the cluster):
>> > > - truncate the test table
>> > > - add a small amount of data, then split the table into 32 regions
>> > >   and call the balancer from the shell
>> > > - load 10m rows
>> > > - do a 1:2:7 insert:update:read test with 10 million rows (64k/sec)
>> > > - do a 5:5 insert:update test with 10 million rows (23k/sec)
>> > > - do a pure read test with 10 million rows (75k/sec)
>> > >
>> > > We have observed ganglia, iostat -d -x, iptraf, top, dstat and a
>> > > variety of other diagnostic tools, and network/io/cpu/memory
>> > > bottlenecks seem highly unlikely, as none of them is ever seriously
>> > > taxed.
>> > > This leads me to assume this is some kind of locking issue? Delaying
>> > > WAL flushes gives a small throughput bump, but it doesn't scale.
>> > >
>> > > There also don't seem to be many figures around to compare ours to.
>> > > We can get our throughput numbers higher with tricks like not writing
>> > > the WAL, delaying flushes, or batching requests, but nothing seems to
>> > > scale with additional slaves.
>> > > Could anyone provide guidance as to what may be preventing the
>> > > throughput figures from scaling as we increase our slave count?
>> >
>>
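P.S. In case anyone wants to reproduce our runs, the per-run procedure from the quoted mail looks roughly like this. This is only a sketch from memory: the table name 'usertable', the column family 'family', the classpath, and the workload file names are placeholders, and the YCSB flags may differ between versions.

    # reset and pre-split the test table from the hbase shell
    # (after seeding a small amount of data so there is something to split)
    echo "truncate 'usertable'" | hbase shell
    echo "split 'usertable'"    | hbase shell   # repeat until ~32 regions
    echo "balancer"             | hbase shell

    # load 10 million rows with 400 client threads
    java -cp ycsb.jar:/etc/hbase/conf com.yahoo.ycsb.Client -load \
        -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
        -p columnfamily=family -p recordcount=10000000 -threads 400

    # run a mixed workload, e.g. our 1:2:7 insert:update:read mix
    java -cp ycsb.jar:/etc/hbase/conf com.yahoo.ycsb.Client -t \
        -db com.yahoo.ycsb.db.HBaseClient -P workloads/mix-1-2-7 \
        -p columnfamily=family -p operationcount=10000000 -threads 400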
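On the WAL tricks mentioned above: delaying WAL flushes on a table in 0.92 is typically done with the deferred log flush table attribute. A sketch only, with the shell syntax quoted from memory, so please double-check it against your HBase version:

    # 0.92 needs the table disabled for schema changes
    echo "disable 'usertable'" | hbase shell
    echo "alter 'usertable', METHOD => 'table_att', DEFERRED_LOG_FLUSH => 'true'" | hbase shell
    echo "enable 'usertable'" | hbase shell

Skipping the WAL entirely was done per-put in the test client (setWriteToWAL(false)), which obviously isn't something we'd run with in production.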