I'm doing a POC on HBase and wanted to see if someone could verify that my
map/scan performance is reasonable. I have a single table with 170 million rows.

My cluster setup is 1 master node and 4 slave nodes, each with 8GB RAM, one
500GB SATA disk, and one quad-core hyperthreaded CPU.

I'm running a map-only MapReduce job (no reduce phase) over the whole table.
The job's scan is set to read only 2 columns of just a few bytes each. Each of
the 4 slave nodes runs 4 map tasks simultaneously, so 16 map tasks run at the
same time.

This job completes in about 8 minutes. For 170 million rows, that's about 354K
rows/second for the cluster, 88K rows/second per node, and 22K rows/second (or
22 rows/millisecond) per map task.
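For anyone checking the numbers, here's a minimal Python sketch of that arithmetic. It assumes the work is spread evenly across nodes and map tasks, which is only approximately true in practice (region sizes and data locality vary). The function name `scan_throughput` is just illustrative:

```python
# Back-of-the-envelope throughput for a full-table, map-only scan.
# Assumption: rows are divided evenly across nodes and concurrent map tasks.

def scan_throughput(total_rows: int, seconds: float, nodes: int, tasks_per_node: int) -> dict:
    """Return rows/second at cluster, node, and map-task granularity."""
    cluster = total_rows / seconds          # whole-cluster rate
    per_node = cluster / nodes              # even split across slave nodes
    per_task = per_node / tasks_per_node    # even split across concurrent map tasks
    return {
        "cluster_rows_per_s": cluster,
        "node_rows_per_s": per_node,
        "task_rows_per_s": per_task,
        "task_rows_per_ms": per_task / 1000,
    }


if __name__ == "__main__":
    # Figures from this post: 170M rows, ~8 minutes, 4 nodes x 4 map tasks.
    stats = scan_throughput(total_rows=170_000_000, seconds=8 * 60, nodes=4, tasks_per_node=4)
    for name, value in stats.items():
        print(f"{name:>20}: {value:,.1f}")
```

Plugging in 170M rows over 480 seconds reproduces the ~354K / ~88K / ~22K rows/second figures.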

Is this performance reasonable for this hardware, or does it sound like I
need more tuning? I've tried increasing the number of simultaneous map tasks,
but then I hit both memory and disk I/O bottlenecks.
-- 
View this message in context: 
http://old.nabble.com/HBase-map-scan-performance-tp33176613p33176613.html
Sent from the HBase User mailing list archive at Nabble.com.
