Hi,

In our experience, rather than increasing threads, increase the number of clients. Increasing the client count has given us better throughput.
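The reasoning, as we understand it: all the threads in one client JVM end up sharing a single HConnection, and with it a single TCP connection to each region server, so past a point extra threads just queue on the same sockets, while separate client processes each get their own. Below is a minimal sketch of the kind of standalone loader you would start several copies of in parallel. The table name "usertable", family "f", and the key format are placeholders, and it is written against the 0.92 client API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * A standalone loader meant to be run as N separate JVMs. Each process
 * gets its own HConnection (and its own socket per regionserver), where
 * N threads inside a single JVM would share just one.
 */
public class LoadClient {
  public static void main(String[] args) throws Exception {
    long rows = Long.parseLong(args[0]);      // rows this client writes
    int clientId = Integer.parseInt(args[1]); // keeps key ranges disjoint

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable"); // placeholder table
    table.setAutoFlush(false);                    // buffer puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024);

    for (long i = 0; i < rows; i++) {
      byte[] row = Bytes.toBytes(String.format("user%d-%012d", clientId, i));
      Put put = new Put(row);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), row); // 0.92-era API
      table.put(put);
    }
    table.flushCommits(); // push any buffered puts before exiting
    table.close();
  }
}

The same idea applies to ycsb itself: start several ycsb processes (on different machines if possible) against disjoint key ranges and sum the reported throughputs.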
Regards
Ram

> -----Original Message-----
> From: Juhani Connolly [mailto:[email protected]]
> Sent: Monday, March 19, 2012 5:33 PM
> To: [email protected]
> Subject: Re: 0.92 and Read/writes not scaling
>
> I was concerned that may be the case too, which is why we ran the ycsb
> tests in addition to our application-specific and general performance
> tests. Checking profiles of the execution just showed the vast
> majority of time spent waiting for responses. These were all run with
> 400 threads (though we tried more/less just in case).
>
> 2012/03/19 20:57 "Mingjian Deng" <[email protected]>:
>
> > @Juhani:
> > How many clients did you test? Maybe the bottleneck was the client?
> >
> > 2012/3/19 Ramkrishna.S.Vasudevan <[email protected]>
> >
> > > Hi Juhani
> > >
> > > Can you tell us more about how the regions are balanced?
> > > Are you overloading only a specific region server?
> > >
> > > Regards
> > > Ram
> > >
> > > > -----Original Message-----
> > > > From: Juhani Connolly [mailto:[email protected]]
> > > > Sent: Monday, March 19, 2012 4:11 PM
> > > > To: [email protected]
> > > > Subject: 0.92 and Read/writes not scaling
> > > >
> > > > Hi,
> > > >
> > > > We're running into a brick wall where our throughput numbers
> > > > will not scale as we increase server counts, both in custom
> > > > in-house tests and in ycsb.
> > > >
> > > > We're using hbase 0.92 on hadoop 0.20.2 (we also experienced
> > > > the same issues using 0.90 before switching our testing to this
> > > > version).
> > > >
> > > > Our cluster consists of:
> > > > - Namenode and hmaster on separate servers, 24 core, 64gb
> > > > - up to 11 datanode/regionservers: 24 core, 64gb, 4 * 1tb disks
> > > >   (hope to get this changed)
> > > >
> > > > We have adjusted our gc settings and mslab settings:
> > > >
> > > > <property>
> > > >   <name>hbase.hregion.memstore.mslab.enabled</name>
> > > >   <value>true</value>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>hbase.hregion.memstore.mslab.chunksize</name>
> > > >   <value>2097152</value>
> > > > </property>
> > > >
> > > > <property>
> > > >   <name>hbase.hregion.memstore.mslab.max.allocation</name>
> > > >   <value>1024768</value>
> > > > </property>
> > > >
> > > > hdfs xceivers is set to 8192.
> > > >
> > > > We've experimented with a variety of handler counts for the
> > > > namenode, datanodes and regionservers with no change in
> > > > throughput.
> > > >
> > > > For testing with ycsb, we do the following each time (with
> > > > nothing else using the cluster):
> > > > - truncate the test table
> > > > - add a small amount of data, then split the table into 32
> > > >   regions and call balancer from the shell
> > > > - load 10 million rows
> > > > - do a 1:2:7 insert:update:read test with 10 million rows
> > > >   (64k/sec)
> > > > - do a 5:5 insert:update test with 10 million rows (23k/sec)
> > > > - do a pure read test with 10 million rows (75k/sec)
> > > >
> > > > We have observed ganglia, iostat -d -x, iptraf, top, dstat and
> > > > a variety of other diagnostic tools, and network/io/cpu/memory
> > > > bottlenecks seem highly unlikely, as none of them is ever
> > > > seriously taxed. This leaves me to assume it is some kind of
> > > > locking issue? Delaying WAL flushes gives a small throughput
> > > > bump, but it doesn't scale.
> > > >
> > > > There also don't seem to be many figures around to compare
> > > > ours to.
> > > > We can get our throughput numbers higher with tricks like not
> > > > writing the WAL, delaying flushes, or batching requests, but
> > > > nothing seems to scale with additional slaves.
> > > > Could anyone provide guidance as to what may be preventing
> > > > throughput figures from scaling as we increase our slave count?
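
P.S. On the pre-split step in the quoted procedure: the table can also be created already split into 32 regions, which avoids loading a little data and splitting afterwards. A rough sketch against the 0.92 admin API; the table name, column family, and key range below are placeholders, so adjust the boundaries to match your actual row keys:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

/** Creates the test table pre-split into 32 regions and triggers the
 *  balancer so the regions spread over all regionservers up front. */
public class PresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("usertable"); // placeholder
    desc.addFamily(new HColumnDescriptor("f"));                // placeholder

    // Evenly spaced boundaries between these two keys give 32 regions;
    // the range is a guess at ycsb-style "user..." row keys.
    admin.createTable(desc, Bytes.toBytes("user0"),
        Bytes.toBytes("user9"), 32);

    admin.balancer(); // same as calling `balancer` from the shell
  }
}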
