To run major compactions from a shell script:

  echo "major_compact 'table_name'" | /path/to/hbase/dir/bin/hbase shell
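If you want to drive that from cron at an off-peak hour, a minimal sketch along these lines should do; note that the script name, the HBASE_HOME path, the table list, and the log path below are placeholders to adapt, not an existing script that ships with HBase:

  #!/bin/bash
  # major_compact_tables.sh -- run "major_compact" for a fixed list of tables
  # through the HBase shell. HBASE_HOME and TABLES are assumptions; adjust
  # them for your own install and tables.
  HBASE_HOME=/path/to/hbase/dir
  TABLES="table_name another_table"

  for t in $TABLES; do
    echo "major_compact '$t'" | "$HBASE_HOME"/bin/hbase shell
  done

Scheduled from cron when the cluster is quietest, for example:

  # nightly at 3am; log path is just an example
  0 3 * * * /path/to/major_compact_tables.sh >> /tmp/major_compact.log 2>&1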
J-D

On Tue, May 18, 2010 at 2:29 PM, James Baldassari <[email protected]> wrote:
> On Tue, May 18, 2010 at 2:06 PM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> > Resending this to [email protected] because my mail to
>> > [email protected] failed with "550 550 mail to [email protected]
>> > accepted here (state 14)". Is the reply-to getting set correctly?
>> > Anyway, responses inline...
>>
>> Yeah, that's strange, I just saw it too. It's probably related to the
>> fact that Apache infra is moving our mailing lists since we are now a
>> top-level project.
>>
>> > Here is a region server log from yesterday: http://pastebin.com/5a04kZVj
>> > Every time one of those compactions ran (around 1pm, 4pm, 6pm, etc.) our
>> > read performance took a big hit. BTW, is there a way I can tell by looking
>> > at the logs whether a minor or major compaction is running? Yes, we do see
>> > lots of I/O wait (as high as 30-40% at times) when the compactions are
>> > running and reads are slow. Load averages during compactions can spike as
>> > high as 60.
>>
>> Yeah, high I/O wait will have a direct impact on read performance. Do
>> you swap? How much heap was given to the RSs?
>
> Region servers have 9GB heaps. Swap is disabled on all region servers.
>
>> I see that you're not running with DEBUG, only INFO, so we cannot see
>> which type of compaction is going on.
>
> OK, so major vs. minor compaction messages are logged at DEBUG. Maybe the
> next time we need to reboot the cluster I'll lower it.
>
>> > OK, I'll set up a cron to kick majors off when load is at its lowest.
>> > Can't hurt, I suppose.
>>
>> It's probably best for the moment.
>
> I manually ran a major compaction around 2-3am this morning, and we haven't
> had any compactions since then. I guess running the major at an off-peak
> time might have helped, so I'll definitely set up that cron. Is there an
> existing HBase script I can leverage to run a compaction via cron, or should
> I just roll my own Ruby script?
>
>> >> HBase limits the rate of inserts so it isn't overrun by WALs, so that if
>> >> a machine fails, you don't have to split GBs of files. What about
>> >> inserting more slowly into your cluster? Flushes/compactions would be
>> >> spread out more over time.
>> >>
>> >> Disabling the WAL during your insert will make it a lot faster, but that's
>> >> not necessarily what you want here.
>> >
>> > Our inserts are already fairly fast. I think we usually get around
>> > 30,000/sec when we do these bulk imports. I'm less concerned about insert
>> > speed and more concerned about the impact to reads when we do the bulk
>> > imports and a compaction is triggered. Do you think it makes sense to
>> > disable the WAL for the bulk inserts in this case? Would disabling the WAL
>> > decrease the number of compactions that are required?
>>
>> This is my point: try uploading slower. Disabling the WAL, like I said,
>> will speed up the upload since you don't write to the WAL, so compactions
>> will happen at an even faster rate!
>
> Thanks for the clarification. It sounds like throttling the bulk updates
> will help.
>
>> > OK, I'm eagerly awaiting the next release. Seems like there have been lots
>> > of good improvements since 0.20.3!
>>
>> Lots of people working very hard :P
>>
>> >> > Thanks,
>> >> > James
