Re: Low CPU usage and slow reads in pseudo-distributed mode - how to fix?

Stack Sun, 11 Jan 2015 10:39:06 -0800

Dave:

As Michael suggests, you seem to be swapping, going by your Ganglia graph
(the purple squiggles that often go above the 4G mark in the top right-hand
memory graph).  Swapping will put a stake in your throughput.  Try lowering
thresholds so you are not swapping.  Stuff should run a little smoother.
See http://hbase.apache.org/book.html#perf.os.swap
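
If you want a number rather than eyeballing the graph, here is a rough sketch
(Linux only, not taken from the book) that reads swap usage out of
/proc/meminfo. The helper names parse_meminfo, swap_used_kb and report are
made up for illustration; the only assumption is that the box exposes the
usual SwapTotal and SwapFree fields and /proc/sys/vm/swappiness.

```python
# Minimal sketch, Linux only. Helper names are hypothetical, for illustration.

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: integer} dict.

    Values are in kB for most fields; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB: SwapTotal minus SwapFree."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def report(meminfo_path="/proc/meminfo",
           swappiness_path="/proc/sys/vm/swappiness"):
    """Return a one-line summary of swap in use and the vm.swappiness value."""
    with open(meminfo_path) as f:
        fields = parse_meminfo(f.read())
    with open(swappiness_path) as f:
        swappiness = f.read().strip()
    used_mb = swap_used_kb(fields) // 1024
    return "swap used: %d MB, vm.swappiness: %s" % (used_mb, swappiness)
```

Calling print(report()) a few times while the count is running should show
whether the swap figure climbs as the reads slow down.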


St.Ack

On Sun, Jan 11, 2015 at 6:49 AM, Michael Segel <[email protected]>
wrote:

> @Ted,
> Pseudo cluster on a machine that has 4GB of memory.
> If you give HBase 1.5GB for the region server… you are left with 2.5 GB of
> memory for everything else.
> You will swap.
>
> In short, nothing he can do will help. He’s screwed if he is trying to
> look at improving performance.
>
>
> On Jan 11, 2015, at 12:19 AM, Ted Yu <[email protected]> wrote:
>
> > Please see http://hbase.apache.org/book.html#perf.reading
> >
> > I guess you use 0.90.4 because of Nutch integration. Still, 0.90.x is way
> > too old.
> >
> > bq. HBase has a heapsize of 1.5 Gigs
> >
> > This is not enough memory for good read performance. Please consider
> > giving HBase more heap.
> >
> > Cheers
> >
> >
> > On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson <[email protected]> wrote:
> >
> >> Hi HBase users,
> >>
> >> I'm working with HBase for the first time and I'm trying to sort out a
> >> performance issue. HBase is the data store for a small, focused web crawl
> >> I'm performing with Apache Nutch. I'm running in pseudo-distributed mode,
> >> meaning that Nutch, HBase and Hadoop are all on the same machine. The
> >> machine's a few years old and has only 4 gigs of RAM - much smaller than
> >> most HBase installs, I know.
> >>
> >> When I first start my HBase processes I get about 60 seconds of fast
> >> performance. HBase reads quickly and uses a healthy portion of CPU cycles.
> >> After a minute or so, though, HBase slows dramatically. Reads sink to a
> >> glacial pace, and the CPU sits mostly idle.
> >>
> >> I notice this pattern when I run Nutch - particularly during read-heavy
> >> operations - but also when I run a simple row counter from the shell.
> >>
> >> At the moment " count 'my_table' " takes almost 4 hours to read through
> >> 500,000 rows. The reading is much faster at the start than the end.  In the
> >> first 30 seconds, HBase counts 37,000 rows, but in the 30 seconds between
> >> 8:00 and 8:30, only 1,000 are counted.
> >>
> >> Looking through my Ganglia report I see a brief return to high performance
> >> around 3 hours into the count. I don't know what's causing this spike.
> >>
> >>
> >> Can anyone suggest what configuration parameters I should change to
> >> improve read performance?  Or what reference materials I should consult to
> >> better understand the problem?  Again, I'm totally new to HBase.
> >>
> >> I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5 Gigs.
> >>
> >> Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
> >> http://imgur.com/Aa3eukZ
> >>
> >> Please let me know if I can provide any more information.
> >>
> >> Many thanks,
> >>
> >>
> >> Dave
> >>
>
>
