Hello everyone! We have started using HBase (on Hadoop) and have run into some performance issues. We are running HBase in pseudo-distributed mode on a single node, using Cloudera's Hadoop distribution on CentOS 6 with the default configuration, set up according to https://ccp.cloudera.com/display/CDHDOC/HBase+Installation.
So we started testing it with random reads. The test data consists of one table. Each row is about 10 KB, and there are 400,000 rows in total (about 3.19 GB). The average random-read rate over a single Thrift/Java API connection is 30 rows per second; the write rate is 250 rows per second. With 4 connections, the random-read rate rises to 120 rows per second on each connection, or 480 rows per second in total. In other words, adding connections increases the random-read rate of each individual connection.

However, the standard tests from http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation showed 2,000 rows per second for random reads. We also noticed that overall node resource usage (I/O, CPU) never exceeds 3%. We have enough RAM (8 GB, of which 2 GB are free).

Is there a rational explanation for this behavior?

Best regards,
Dmitry Kangin
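As a rough sanity check on the figures above, the sketch below converts each reported rate into an average time per read. It assumes (an assumption, not something stated in the post) that each connection issues its reads one at a time, so the time per read is simply the inverse of that connection's rate. The variable and function names are illustrative only.

```python
# Rates reported in the post, in rows per second.
single_conn_rate = 30          # one Thrift/Java API connection
multi_conn_rate_per_conn = 120 # each of the 4 concurrent connections
connections = 4
pe_rate = 2000                 # PerformanceEvaluation random-read result


def latency_ms(rate_rows_per_s):
    """Average milliseconds per read, ASSUMING one outstanding read per connection."""
    return 1000.0 / rate_rows_per_s


# Total throughput across all concurrent connections.
aggregate = connections * multi_conn_rate_per_conn

print(f"1 connection: {latency_ms(single_conn_rate):.1f} ms per read")
print(f"{connections} connections: {latency_ms(multi_conn_rate_per_conn):.1f} ms per read, "
      f"{aggregate} rows/s total")
print(f"PerformanceEvaluation is {pe_rate / aggregate:.1f}x the {connections}-connection total")
```

Under that assumption, the average time per read drops from about 33 ms with one connection to about 8 ms per connection with four, while the PerformanceEvaluation figure is still roughly four times the four-connection total.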
