The primary unit of load distribution in HBase is the region, so make sure your table has more than one. This is well documented in the manual: http://hbase.apache.org/book/perf.writing.html
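
As a rough illustration of pre-splitting, here is a minimal sketch that computes evenly spaced split keys and prints an HBase shell `create` command using them. It assumes row keys are hex strings spread uniformly over the keyspace (for example, a hashed prefix), which the original message does not state. The table name `mytable`, the column family `cf`, and both helper functions are hypothetical names chosen for the example.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced, zero-padded hex split keys.

    Assumption: row keys start with a uniformly distributed hex string
    of `width` characters. If they do not, these split points will not
    balance the load.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    keyspace = 16 ** width
    return [
        format(keyspace * i // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


def shell_create_command(table, family, splits):
    """Build an HBase shell 'create' statement with a SPLITS list."""
    quoted = ", ".join("'{}'".format(s) for s in splits)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


if __name__ == "__main__":
    # Hypothetical choice: 12 regions, i.e. two per RegionServer on a 6-node cluster.
    splits = hex_split_points(12)
    print(shell_create_command("mytable", "cf", splits))
```

The region count here is only a starting point; the right number depends on the key distribution and on how many regions each RegionServer can comfortably serve.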

J-D

On Fri, Mar 1, 2013 at 4:17 AM, Dan Crosta <[email protected]> wrote:
> We are using a 6-node HBase cluster with a Thrift Server on each of the
> RegionServer nodes, and trying to evaluate maximum write throughput for our
> use case (which involves many processes sending mutateRowsTs commands).
> Somewhere between about 30 and 40 processes writing into the system we cross
> the threshold where adding additional writers yields only very limited
> returns to throughput, and I'm not sure why. We see that the CPU and Disk on
> the DataNode/RegionServer/ThriftServer machines are not saturated, nor is the
> NIC in those machines. I'm a little unsure where to look next.
>
> A little more detail about our deployment:
>
> * CDH 4.1.2
> * DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
> ** RegionServer: 8GB heap
> ** ThriftServer: 1GB heap
> ** DataNode: 4GB heap
> ** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS
>
> If there's any other information that I can provide, or any other
> configuration or system settings I should look at, I'd appreciate the
> pointers.
>
> Thanks,
> - Dan
