I'm doing a POC on HBase and wanted to see if someone could verify that my map/scan performance is reasonable. I have a single 170-million-row table.
My cluster has 1 master node and 4 slave nodes, all with 8GB RAM, one 500GB SATA disk, and one quad-core hyperthreaded CPU. I'm running a map-only MapReduce job (no reduce phase) over the whole table. The job's scan is set to read only 2 columns, each just a few bytes. Each of the 4 slave nodes runs 4 map tasks concurrently, so 16 map tasks run at the same time.

The job completes in about 8 minutes. That works out to roughly 354K rows/second for the cluster, 88K rows/second per node, and 22K rows/second (22 rows/millisecond) per map task.

Is this performance reasonable for this hardware, or does it sound like I need more tuning? I've tried increasing the number of simultaneous map tasks, but I hit both memory and disk I/O bottlenecks.

--
View this message in context: http://old.nabble.com/HBase-map-scan-performance-tp33176613p33176613.html
Sent from the HBase User mailing list archive at Nabble.com.
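For anyone who wants to double-check the throughput figures in the post, here is a quick sketch of the arithmetic. It takes the numbers as stated (170M rows, about 8 minutes, 4 nodes, 4 concurrent map tasks per node) and assumes rows are spread evenly across nodes and map tasks. That assumption is only roughly true if regions differ in size, so treat the per-node and per-task results as averages rather than measurements.

```python
# Rough throughput check for the full-table, map-only scan job.
# The 8-minute runtime is approximate, so results are ballpark figures.

TOTAL_ROWS = 170_000_000   # rows in the table
JOB_SECONDS = 8 * 60       # "about 8 minutes"
NODES = 4                  # slave nodes running map tasks
MAPS_PER_NODE = 4          # concurrent map tasks per node


def throughput(total_rows: int, seconds: float, nodes: int, maps_per_node: int) -> dict:
    """Return average rows/second at the cluster, node, and map-task level.

    Assumes (hypothetically) an even split of rows across nodes and tasks;
    uneven region sizes would make some tasks faster or slower than this.
    """
    cluster = total_rows / seconds
    per_node = cluster / nodes
    per_task = per_node / maps_per_node
    return {
        "cluster_rows_per_s": cluster,
        "node_rows_per_s": per_node,
        "task_rows_per_s": per_task,
        "task_rows_per_ms": per_task / 1000,
    }


if __name__ == "__main__":
    results = throughput(TOTAL_ROWS, JOB_SECONDS, NODES, MAPS_PER_NODE)
    for name, value in results.items():
        print(f"{name}: {value:,.1f}")
```

Run as-is, it prints about 354,166.7 rows/s for the cluster, 88,541.7 per node, 22,135.4 per map task, and 22.1 rows/ms per task, which matches the rounded figures in the post.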
