Hello everyone!
We have started using HBase (on Hadoop) and have run into some performance 
issues. 
We are running HBase in pseudo-distributed mode on a single node.
We use the Cloudera distribution of Hadoop on CentOS 6 with the default 
configuration, installed according to 
https://ccp.cloudera.com/display/CDHDOC/HBase+Installation.

We started by testing random reads.
The test data consists of one table. Each row is about 10 KB long. In total 
there are 400 000 rows (about 3.19 GB).
The average random read rate over one Thrift/Java API connection is 30 rows per 
second; the write rate is 250 rows per second.
With 4 connections, the random read rate rises to 120 rows per second on each 
connection, or 480 rows per second in total. 
So adding connections increases random read performance on each individual 
connection, not just the total.
However, the standard tests from 
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation showed 
2000 rows per second for random reads.
We also noticed that overall node resource usage (I/O, CPU) never exceeds 
3%. We have enough RAM (8 GB, of which 2 GB are free).
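
To put these figures in perspective, here is a rough sketch of the arithmetic. 
It assumes that each connection issues its reads strictly one after another, 
which is an assumption about our test client and not something we measured; 
the function names are only illustrative.

```python
# Back-of-the-envelope arithmetic on the rates quoted above.
# Assumption: every connection sends one read at a time and waits for the
# reply, so rows per second on a connection is 1 / (time per read).

def per_read_latency_ms(rows_per_second):
    """Average time spent on one read, in milliseconds, on one connection."""
    return 1000.0 / rows_per_second


def aggregate_rate(connections, rows_per_second_each):
    """Total rows per second across all connections."""
    return connections * rows_per_second_each


single_ms = per_read_latency_ms(30)    # one connection at 30 rows/s
four_ms = per_read_latency_ms(120)     # each of 4 connections at 120 rows/s
total = aggregate_rate(4, 120)         # all 4 connections together
gap = 2000 / total                     # PerformanceEvaluation rate vs. ours

print(f"1 connection : {single_ms:.1f} ms per read")
print(f"4 connections: {four_ms:.1f} ms per read, {total} rows/s in total")
print(f"PerformanceEvaluation is {gap:.1f}x faster than our 4-connection total")
```

Under that assumption a single connection spends roughly 33 ms on every read 
while the node is almost idle, which is the part we cannot explain.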

Is there a rational explanation for this behaviour?

Best regards, Dmitry Kangin.
