Re: Low CPU usage and slow reads in pseudo-distributed mode - how to fix?

Dave Benson Sun, 11 Jan 2015 13:39:06 -0800

St.Ack, Michael and Ted - thanks for your responses.


On Sun, Jan 11, 2015 at 1:38 PM, Stack <[email protected]> wrote:

> Dave:
>
> As Michael suggests, you seem to be swapping going by your ganglia graph
> (the purple squiggles that often go above the 4G mark in the top right-hand
> memory graph).  Swapping will put a stake in your throughput.  Try lowering
> thresholds so you are not swapping.  Stuff should run a little smoother.
> See http://hbase.apache.org/book.html#perf.os.swap


St.Ack - I hope you can forgive the question, but which "thresholds" should
I be lowering, exactly? Do you mean that I should decrease the HBase
heapsize until RAM usage stays below 4G?
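
One way to confirm whether the box is actually swapping, before and after
changing the heap size, is to read the kernel's own counters. A minimal
sketch, assuming a Linux host that exposes /proc/meminfo with the usual
SwapTotal and SwapFree fields in kB; the function name and the sample text
are made up for illustration:

```python
def swap_used_kb(meminfo_text):
    """Return swap in use (kB), given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


# Hypothetical sample, NOT taken from the machine discussed in this thread.
SAMPLE = """MemTotal:        4046588 kB
MemFree:          102400 kB
SwapTotal:       2097148 kB
SwapFree:        1572860 kB
"""

if __name__ == "__main__":
    print(swap_used_kb(SAMPLE), "kB of swap in use (sample data)")
```

On the real machine, the text would come from reading /proc/meminfo instead
of the sample. If the figure keeps climbing during the count, the HBase heap
plus everything else probably does not fit in RAM.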

Thanks,

Dave

>
>
> St.Ack
>
> On Sun, Jan 11, 2015 at 6:49 AM, Michael Segel <[email protected]> wrote:
>
> > @Ted,
> > Pseudo cluster on a machine that has 4GB of memory.
> > If you give HBase 1.5GB for the region server… you are left with 2.5 GB
> of
> > memory for everything else.
> > You will swap.
> >
> > In short, nothing he can do will help. He’s screwed if he is trying to
> > look improving performance.
> >
> >
> > On Jan 11, 2015, at 12:19 AM, Ted Yu <[email protected]> wrote:
> >
> > > Please see http://hbase.apache.org/book.html#perf.reading
> > >
> > > I guess you use 0.90.4 because of Nutch integration. Still, 0.90.x is
> > > way too old.
> > >
> > > bq. HBase has a heapsize of 1.5 Gigs
> > >
> > > This is not enough memory for good read performance. Please consider
> > > giving HBase more heap.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson <[email protected]> wrote:
> > >
> > >> Hi HBase users,
> > >>
> > >> I'm working with HBase for the first time and I'm trying to sort out a
> > >> performance issue. HBase is the data store for a small, focused web
> > >> crawl I'm performing with Apache Nutch. I'm running in
> > >> pseudo-distributed mode, meaning that Nutch, HBase and Hadoop are all
> > >> on the same machine. The machine's a few years old and has only 4 gigs
> > >> of RAM - much smaller than most HBase installs, I know.
> > >>
> > >> When I first start my HBase processes I get about 60 seconds of fast
> > >> performance. HBase reads quickly and uses a healthy portion of CPU
> > >> cycles. After a minute or so, though, HBase slows dramatically. Reads
> > >> sink to a glacial pace, and the CPU sits mostly idle.
> > >>
> > >> I notice this pattern when I run Nutch - particularly during
> > >> read-heavy operations - but also when I run a simple row counter from
> > >> the shell.
> > >>
> > >> At the moment " count 'my_table' " takes almost 4 hours to read
> > >> through 500,000 rows. The reading is much faster at the start than the
> > >> end. In the first 30 seconds, HBase counts 37000 rows, but in the 30
> > >> seconds between 8:00 and 8:30, only 1000 are counted.
> > >>
> > >> Looking through my Ganglia report I see a brief return to high
> > >> performance around 3 hours into the count. I don't know what's causing
> > >> this spike.
> > >>
> > >>
> > >> Can anyone suggest what configuration parameters I should change to
> > >> improve read performance?  Or what reference materials I should
> > >> consult to better understand the problem?  Again, I'm totally new to
> > >> HBase.
> > >>
> > >> I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5
> > >> Gigs.
> > >>
> > >> Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
> > >> http://imgur.com/Aa3eukZ
> > >>
> > >> Please let me know if I can provide any more information.
> > >>
> > >> Many thanks,
> > >>
> > >>
> > >> Dave
> > >>
> >
> >
>
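
For reference, the figures quoted in this thread can be worked through
directly. A small sketch that uses only numbers from the messages above
(4 GB of RAM, a 1.5 GB heap, 37000 and 1000 rows per 30 seconds, and about
half a million rows in roughly 4 hours); the variable names are illustrative,
and the 4-hour figure is approximate:

```python
# Memory budget described by Michael: total RAM minus the HBase heap.
RAM_GB = 4.0
HBASE_HEAP_GB = 1.5
left_for_everything_else_gb = RAM_GB - HBASE_HEAP_GB  # 2.5 GB

# Read rates reported in the original post, in rows per second.
fast_rate = 37000 / 30                # first 30 seconds, about 1233 rows/s
slow_rate = 1000 / 30                 # a later 30-second window, about 33 rows/s
average_rate = 500000 / (4 * 3600)    # whole count, about 35 rows/s

# How many times slower the later window is than the first one.
slowdown = fast_rate / slow_rate      # about 37x

if __name__ == "__main__":
    print(f"left for OS, Hadoop, Nutch: {left_for_everything_else_gb} GB")
    print(f"fast {fast_rate:.0f} rows/s, slow {slow_rate:.0f} rows/s, "
          f"average {average_rate:.0f} rows/s, slowdown {slowdown:.0f}x")
```

The average over the whole count sits close to the slow rate, which suggests
the run spends nearly all of its time in the degraded state rather than the
fast one.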
