Dave:

As Michael suggests, you seem to be swapping, going by your Ganglia graph (the purple squiggles that often go above the 4G mark in the top right-hand memory graph). Swapping will put a stake in your throughput. Try lowering thresholds so you are not swapping; things should then run a little smoother. See http://hbase.apache.org/book.html#perf.os.swap
St.Ack

On Sun, Jan 11, 2015 at 6:49 AM, Michael Segel <[email protected]> wrote:

> @Ted,
> Pseudo cluster on a machine that has 4GB of memory.
> If you give HBase 1.5GB for the region server… you are left with 2.5 GB of
> memory for everything else.
> You will swap.
>
> In short, nothing he can do will help. He’s screwed if he is trying to
> look at improving performance.
>
> > On Jan 11, 2015, at 12:19 AM, Ted Yu <[email protected]> wrote:
> >
> > Please see http://hbase.apache.org/book.html#perf.reading
> >
> > I guess you use 0.90.4 because of Nutch integration. Still 0.90.x was way
> > too old.
> >
> > bq. HBase has a heapsize of 1.5 Gigs
> >
> > This is not enough memory for good read performance. Please consider
> > giving HBase more heap.
> >
> > Cheers
> >
> > On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson <[email protected]> wrote:
> >
> >> Hi HBase users,
> >>
> >> I'm working with HBase for the first time and I'm trying to sort out a
> >> performance issue. HBase is the data store for a small, focused web
> >> crawl I'm performing with Apache Nutch. I'm running in
> >> pseudo-distributed mode, meaning that Nutch, HBase and Hadoop are all
> >> on the same machine. The machine's a few years old and has only 4 gigs
> >> of RAM - much smaller than most HBase installs, I know.
> >>
> >> When I first start my HBase processes I get about 60 seconds of fast
> >> performance. HBase reads quickly and uses a healthy portion of CPU
> >> cycles. After a minute or so, though, HBase slows dramatically. Reads
> >> sink to a glacial pace, and the CPU sits mostly idle.
> >>
> >> I notice this pattern when I run Nutch - particularly during read-heavy
> >> operations - but also when I run a simple row counter from the shell.
> >>
> >> At the moment " count 'my_table' " takes almost 4 hours to read through
> >> 500 000 rows. The reading is much faster at the start than the end. In
> >> the first 30 seconds, HBase counts 37000 rows, but in the 30 seconds
> >> between 8:00 and 8:30, only 1000 are counted.
> >>
> >> Looking through my Ganglia report I see a brief return to high
> >> performance around 3 hours into the count. I don't know what's causing
> >> this spike.
> >>
> >> Can anyone suggest what configuration parameters I should change to
> >> improve read performance? Or what reference materials I should consult
> >> to better understand the problem? Again, I'm totally new to HBase.
> >>
> >> I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5
> >> Gigs.
> >>
> >> Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
> >> http://imgur.com/Aa3eukZ
> >>
> >> Please let me know if I can provide any more information.
> >>
> >> Many thanks,
> >>
> >> Dave
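
The swapping diagnosed in this thread can also be checked without Ganglia. The following is only a minimal sketch, assuming a Linux host where /proc/meminfo reports SwapTotal and SwapFree in kB; the function name swap_used_kb is made up for illustration and is not part of HBase or Ganglia.

```python
def swap_used_kb(meminfo_text):
    """Return the amount of swap in use, in kB, from /proc/meminfo text.

    Assumes the Linux format "Key:   <number> kB", one field per line.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    # Swap in use is whatever part of the total is no longer free.
    return fields["SwapTotal"] - fields["SwapFree"]
```

On the machine running HBase, something like swap_used_kb(open("/proc/meminfo").read()) could be sampled before and during the count; a value that climbs as reads slow down would be consistent with the swapping described above.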
