Low CPU usage and slow reads in pseudo-distributed mode - how to fix?

Dave Benson Sat, 10 Jan 2015 16:05:58 -0800

Hi HBase users,

I'm working with HBase for the first time and I'm trying to sort out a
performance issue. HBase is the data store for a small, focused web crawl
I'm performing with Apache Nutch. I'm running in pseudo-distributed mode,
meaning that Nutch, HBase and Hadoop are all on the same machine. The
machine's a few years old and has only 4 gigs of RAM - much smaller than
most HBase installs, I know.


When I first start my HBase processes I get about 60 seconds of fast
performance. HBase reads quickly and uses a healthy portion of CPU cycles.
After a minute or so, though, HBase slows dramatically. Reads sink to a
glacial pace, and the CPU sits mostly idle.

I notice this pattern when I run Nutch - particularly during read-heavy
operations - but also when I run a simple row counter from the shell.

At the moment " count 'my_table' " takes almost 4 hours to read through
500,000 rows. The reading is much faster at the start than the end.  In the
first 30 seconds, HBase counts 37000 rows, but in the 30 seconds between
8:00 and 8:30, only 1000 are counted.
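
To put those figures side by side, here is a rough back-of-the-envelope
sketch of the read rates they imply. The helper name rows_per_second is
just illustrative, and the overall figure assumes the full count takes
roughly 4 hours (it is "almost" that), so treat the results as
approximate.

```python
def rows_per_second(rows: int, seconds: float) -> float:
    """Average read rate over a time window (illustrative helper)."""
    return rows / seconds


# Figures taken from the counts reported above.
early = rows_per_second(37_000, 30)    # first 30 seconds of the count
late = rows_per_second(1_000, 30)      # the 30 seconds between 8:00 and 8:30

# Assumption: the whole count takes about 4 hours ("almost 4 hours").
overall = rows_per_second(500_000, 4 * 3600)

print(f"early:    {early:8.1f} rows/s")
print(f"late:     {late:8.1f} rows/s")
print(f"overall:  {overall:8.1f} rows/s")
print(f"slowdown: {early / late:.0f}x between the two windows")
```

In other words, the average over the whole run sits very close to the
slow rate, so the fast first minute looks like the exception rather than
the norm.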

Looking through my Ganglia report I see a brief return to high performance
around 3 hours into the count. I don't know what's causing this spike.


Can anyone suggest what configuration parameters I should change to improve
read performance?  Or what reference materials I should consult to better
understand the problem?  Again, I'm totally new to HBase.

I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5 Gigs.

Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
http://imgur.com/Aa3eukZ

Please let me know if I can provide any more information.

Many thanks,


Dave
