St.Ack, Michael and Ted - thanks for your responses.
On Sun, Jan 11, 2015 at 1:38 PM, Stack <[email protected]> wrote:
> Dave:
>
> As Michael suggests, you seem to be swapping going by your ganglia graph
> (the purple squiggles that often go above the 4G mark in the top right-hand
> memory graph). Swapping will put a stake in your throughput. Try lowering
> thresholds so you are not swapping. Stuff should run a little smoother.
> See http://hbase.apache.org/book.html#perf.os.swap

St.Ack - I hope you can forgive the question, but which "thresholds" should I be lowering, exactly? Do you mean that I should decrease the HBase heapsize until RAM usage stays below 4G?

Thanks,
Dave

> St.Ack
>
> On Sun, Jan 11, 2015 at 6:49 AM, Michael Segel <[email protected]> wrote:
>
> > @Ted,
> > Pseudo cluster on a machine that has 4GB of memory.
> > If you give HBase 1.5GB for the region server… you are left with 2.5 GB of
> > memory for everything else.
> > You will swap.
> >
> > In short, nothing he can do will help. He’s screwed if he is looking to
> > improve performance.
> >
> > On Jan 11, 2015, at 12:19 AM, Ted Yu <[email protected]> wrote:
> >
> > > Please see http://hbase.apache.org/book.html#perf.reading
> > >
> > > I guess you use 0.90.4 because of Nutch integration. Still, 0.90.x is
> > > way too old.
> > >
> > > bq. HBase has a heapsize of 1.5 Gigs
> > >
> > > This is not enough memory for good read performance. Please consider
> > > giving HBase more heap.
> > >
> > > Cheers
> > >
> > > On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson <[email protected]> wrote:
> > >
> > >> Hi HBase users,
> > >>
> > >> I'm working with HBase for the first time and I'm trying to sort out a
> > >> performance issue. HBase is the data store for a small, focused web crawl
> > >> I'm performing with Apache Nutch. I'm running in pseudo-distributed mode,
> > >> meaning that Nutch, HBase and Hadoop are all on the same machine. The
> > >> machine's a few years old and has only 4 gigs of RAM - much smaller than
> > >> most HBase installs, I know.
> > >>
> > >> When I first start my HBase processes I get about 60 seconds of fast
> > >> performance. HBase reads quickly and uses a healthy portion of CPU cycles.
> > >> After a minute or so, though, HBase slows dramatically. Reads sink to a
> > >> glacial pace, and the CPU sits mostly idle.
> > >>
> > >> I notice this pattern when I run Nutch - particularly during read-heavy
> > >> operations - but also when I run a simple row counter from the shell.
> > >>
> > >> At the moment " count 'my_table' " takes almost 4 hours to read through
> > >> 500 000 rows. The reading is much faster at the start than the end. In the
> > >> first 30 seconds, HBase counts 37000 rows, but in the 30 seconds between
> > >> 8:00 and 8:30, only 1000 are counted.
> > >>
> > >> Looking through my Ganglia report I see a brief return to high performance
> > >> around 3 hours into the count. I don't know what's causing this spike.
> > >>
> > >> Can anyone suggest what configuration parameters I should change to improve
> > >> read performance? Or what reference materials I should consult to better
> > >> understand the problem? Again, I'm totally new to HBase.
> > >>
> > >> I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5 Gigs.
> > >>
> > >> Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
> > >> http://imgur.com/Aa3eukZ
> > >>
> > >> Please let me know if I can provide any more information.
> > >>
> > >> Many thanks,
> > >>
> > >> Dave
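
For reference, the figures quoted in this thread can be turned into rough read rates and a memory budget. The following is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, the 4-hour total is approximate ("almost 4 hours"), and nothing here measures or configures HBase itself.

```python
# Back-of-the-envelope arithmetic using only numbers quoted in the thread.
# All names below are hypothetical helpers, not part of HBase or Nutch.

TOTAL_RAM_GB = 4.0    # RAM on the pseudo-distributed machine
HBASE_HEAP_GB = 1.5   # heap currently given to HBase


def rows_per_second(rows, seconds):
    """Average read rate over an interval."""
    return rows / seconds


def remaining_ram_gb(total_gb, heap_gb):
    """RAM left for Hadoop, Nutch, and the OS once the HBase heap is taken."""
    return total_gb - heap_gb


# First 30 seconds of the count: 37000 rows.
early_rate = rows_per_second(37_000, 30)
# A later 30-second window (8:00 to 8:30): 1000 rows.
late_rate = rows_per_second(1_000, 30)
# Whole run: 500 000 rows in roughly 4 hours (approximate).
overall_rate = rows_per_second(500_000, 4 * 3600)

if __name__ == "__main__":
    print(f"early rate:   {early_rate:.0f} rows/s")
    print(f"late rate:    {late_rate:.0f} rows/s")
    print(f"overall rate: {overall_rate:.0f} rows/s (approximate)")
    print(f"slowdown:     {early_rate / late_rate:.0f}x")
    print(f"RAM left after HBase heap: "
          f"{remaining_ram_gb(TOTAL_RAM_GB, HBASE_HEAP_GB)} GB")
```

The overall rate comes out close to the late rate, which fits the description that nearly the whole run happens at the slow pace after the first minute.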
