On Tue, May 18, 2010 at 2:06 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > Resending this to [email protected] because my mail to
> > [email protected] failed with "550 550 mail to [email protected]
> > accepted here (state 14)". Is the reply-to getting set correctly? Anyway,
> > responses inline...
>
> Yeah that's strange, I just saw it too. It's probably related to the
> fact that Apache infra is moving our mailing lists since we are now a
> top level project.
>
> > Here is a region server log from yesterday: http://pastebin.com/5a04kZVj
> > Every time one of those compactions ran (around 1pm, 4pm, 6pm, etc.) our
> > read performance took a big hit. BTW, is there a way I can tell by looking
> > at the logs whether a minor or major compaction is running? Yes, we do see
> > lots of I/O wait (as high as 30-40% at times) when the compactions are
> > running and reads are slow. Load averages during compactions can spike as
> > high as 60.
>
> Yeah high IO wait will have a direct impact on read performance. Do
> you swap? How much heap was given to the RSs?

Region servers have 9GB heaps. Swap is disabled on all region servers.

> I see that you're not running with DEBUG, only INFO, so we cannot see
> which type of compaction is going on.

OK, so major vs. minor compaction messages are logged at DEBUG. Maybe the
next time we need to reboot the cluster I'll lower it.

> > OK, I'll set up a cron to kick majors off when load is at its lowest. Can't
> > hurt I suppose.
>
> It's probably the best for the moment.

I manually ran a major compaction around 2-3am this morning, and we haven't
had any compactions since then. I guess running the major at an off-peak time
might have helped, so I'll definitely set up that cron. Is there an existing
HBase script I can leverage to run a compaction via cron, or should I just
roll my own Ruby script?

> >> HBase limits the rate of inserts to not be overrun by WALs so that if
> >> a machine fails, you don't have to split GBs of files. What about
> >> inserting more slowly into your cluster? Flushes/compactions will be
> >> more spread over time?
> >>
> >> Disabling the WAL during your insert will make it a lot faster, not
> >> necessarily what you want here.
> >
> > Our inserts are already fairly fast. I think we usually get around
> > 30,000/sec when we do these bulk imports. I'm less concerned about insert
> > speed and more concerned about the impact to reads when we do the bulk
> > imports and a compaction is triggered. Do you think it makes sense to
> > disable WAL for the bulk inserts in this case? Would disabling WAL decrease
> > the number of compactions that are required?
>
> This is my point, try uploading slower. Disabling WAL, like I said,
> will speed up the upload since you don't write to WAL so compactions
> will happen even at a faster rate!

Thanks for the clarification. It sounds like throttling the bulk updates will
help.

> > OK, I'm eagerly awaiting the next release. Seems like there have been lots
> > of good improvements since 0.20.3!
>
> Lots of people working very hard :P
>
> >> > Thanks,
> >> > James
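
A note on the question above about telling major and minor compactions apart:
since those messages are logged at DEBUG, one option is to lower the HBase
logger threshold in the region servers' log4j config the next time the cluster
is restarted. A minimal sketch, assuming the stock conf/log4j.properties
layout that ships with HBase:

    # conf/log4j.properties on each region server
    # (assumes the stock HBase log4j layout; adjust the logger name if yours differs)
    log4j.logger.org.apache.hadoop.hbase=DEBUG

With that in place, the compaction lines in the region server log should say
which kind of compaction ran.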
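
On the cron question: a crontab entry that pipes a command into the HBase
shell is usually enough, so a separate Ruby script may not be needed. This is
a rough sketch only; the table name, install path, and schedule are
placeholders, and it assumes the shell in your HBase version supports the
major_compact command:

    # run at 3am daily; 'mytable' and /opt/hbase are placeholders
    0 3 * * * echo "major_compact 'mytable'" | /opt/hbase/bin/hbase shell

Note that this only asks the cluster to schedule the compaction; the work
itself still runs on the region servers, so it's worth checking the logs to
confirm it actually kicked off at the off-peak time.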
