Talked to J-D (and read the source code). It turns out that
when hbase.regionserver.global.memstore.lowerLimit is reached, flushes are
forced without blocking updates (provided, of course, that
hbase.regionserver.global.memstore.upperLimit is not hit). Makes perfect
sense, though I couldn't figure this out from the settings' descriptions in
hbase-default.xml (I tried to come up with a patch:
https://issues.apache.org/jira/browse/HBASE-6076).

So (if anyone is interested), the logic for triggering flushes at the
"global regionserver level" is the following (a pseudocode sketch follows
this list):
* flushes are forced when memstore size
hits hbase.regionserver.global.memstore.lowerLimit
* flushes are forced *and updates are blocked* when memstore size
reaches hbase.regionserver.global.memstore.upperLimit. In this case,
flushing continues and updates stay blocked until memstore size drops
below hbase.regionserver.global.memstore.lowerLimit.
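
If it helps, here is that logic as rough Java-style pseudocode. This is my
reading of the behavior, not the actual MemStoreFlusher code, and the
helper names are made up:

  // Illustrative sketch only; the helper methods are hypothetical.
  long globalMemstoreSize = currentGlobalMemstoreSize();
  long upperLimit = (long) (maxHeap * 0.40); // ...global.memstore.upperLimit
  long lowerLimit = (long) (maxHeap * 0.35); // ...global.memstore.lowerLimit

  if (globalMemstoreSize >= upperLimit) {
    blockUpdates();
    // keep flushing until we drop below the lower limit again
    while (currentGlobalMemstoreSize() >= lowerLimit) {
      flushLargestMemstore();
    }
    unblockUpdates();
  } else if (globalMemstoreSize >= lowerLimit) {
    // force flushes, but keep accepting updates
    flushLargestMemstore();
  }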

Not sure whether it would make sense to separate these two things though:
* the mark down to which memstores are flushed while updates are blocked
* the mark at which memstore flushes are forced (without blocking updates)
For now, hbase.regionserver.global.memstore.lowerLimit is used for both;
a hypothetical split is sketched below.
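
If they were separated, the configuration could hypothetically look
something like this (both property names below are made up purely for
illustration; nothing like this exists today):

  <property>
    <name>hbase.regionserver.global.memstore.flush.trigger</name>
    <value>0.35</value>
    <description>Hypothetical: the mark at which flushes are forced
      without blocking updates.</description>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.block.release</name>
    <value>0.30</value>
    <description>Hypothetical: the mark down to which flushing continues
      while updates are blocked.</description>
  </property>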

Alex Baranau
------
Sematext :: http://blog.sematext.com/

On Wed, May 9, 2012 at 6:02 PM, Alex Baranau <alex.barano...@gmail.com> wrote:

> Should I maybe create a JIRA issue for that?
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/
>
> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
>
>> Hi!
>>
>> Just trying to check that I understand things correctly about configuring
>> memstore flushes.
>>
>> Basically, there are two groups of configuration properties (leaving out
>> region pre-close flushes):
>> 1. those that determine when a flush should be triggered
>> 2. those that determine when a flush should be triggered and updates
>> should be blocked during flushing
>>
>> The 2nd one is for safety reasons: we don't want the memstore to grow
>> without limit, so we forbid writes until the memstore is back to a
>> "bearable" size. Also, we don't want flushed files to be too big. These
>> properties are (see the sketch after this list):
>> * hbase.regionserver.global.memstore.upperLimit &
>> hbase.regionserver.global.memstore.lowerLimit [1]   (1)
>> * hbase.hregion.memstore.block.multiplier [2]
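>>
>> Roughly, the per-region blocking condition from [2] boils down to the
>> check below (an illustrative sketch, not the actual HRegion code; the
>> helper name is made up):
>>
>>   // Illustrative only. With defaults: 2 * 128 MB = 256 MB per region.
>>   long blockingThreshold = blockMultiplier * memstoreFlushSize;
>>   if (region.memstoreSize() >= blockingThreshold) {
>>     blockUpdatesUntilFlushed(region);
>>   }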
>>
>> The 1st group (sorry for the reverse order) is about triggering "regular
>> flushes". As flushes can be performed without pausing updates, we want them
>> to happen before the conditions for "blocking updates" flushes are met. The
>> property for configuring this is
>> * hbase.hregion.memstore.flush.size [3]
>> (* there are also open jira issues for per-colfam settings)
>>
>> As we don't want flushes to be too frequent, we want to keep this option
>> big enough to avoid that. At the same time, we want to keep it small enough
>> that it triggers flushing *before* the "blocking updates" flushing is
>> triggered. This configuration is per-region, while (1) is per regionserver.
>> So, if we had a (more or less) constant number of regions per regionserver,
>> we could choose a value that is not too small, yet small enough. However,
>> the number of regions assigned to a regionserver usually varies a lot over
>> the cluster's life, and we don't want to keep adjusting the value over time
>> (which requires RS restarts). Some numbers to illustrate follow below.
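>>
>> To put some numbers on this (defaults, 10 GB heap, just for illustration):
>> lowerLimit = 0.35 * 10 GB = 3.5 GB, upperLimit = 0.4 * 10 GB = 4 GB, and
>> flush.size = 128 MB per region. With ~28 regions taking writes
>> (28 * 128 MB = 3.5 GB), the global lowerLimit can be reached before any
>> single region hits its own flush.size, so the per-region setting is no
>> longer what triggers flushes; with only a few regions, the per-region
>> setting dominates instead.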
>>
>> Does the thinking above make sense to you? If yes, then here are the
>> questions:
>>
>> A. Is it a goal to have a more or less constant number of regions per
>> regionserver? Can anyone share their experience on whether that is
>> achievable?
>> B. Or should there be config options for triggering flushes based on
>> regionserver state (not just individual regions or stores)? E.g. (see the
>> sketch after this list):
>>     B.1 given a setting X%, trigger a flush of the biggest memstore (or
>> whatever the logic is for selecting the memstore to flush) when memstores
>> take up X% of the heap (similar to (1), but triggering flushes when there's
>> no need to block updates yet)
>>     B.2 any other option that takes the number of regions into account
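>>
>> For B.2, one hypothetical shape (purely illustrative, nothing like this
>> exists in HBase today): derive the per-region flush trigger from
>> regionserver state instead of a fixed byte size:
>>
>>   // Hypothetical sketch of B.2; all names here are made up.
>>   long budget = (long) (0.35 * maxHeap);          // global flush budget
>>   long perRegionTrigger = budget / onlineRegionCount();
>>   if (region.memstoreSize() >= perRegionTrigger) {
>>     flush(region);  // flush early, no need to block updates
>>   }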
>>
>> Thoughts?
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/
>>
>> [1]
>>
>>   <property>
>>     <name>hbase.regionserver.global.memstore.upperLimit</name>
>>     <value>0.4</value>
>>     <description>Maximum size of all memstores in a region server
>>       before new updates are blocked and flushes are forced.
>>       Defaults to 40% of heap.
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>     <value>0.35</value>
>>     <description>When memstores are being forced to flush to make room
>>       in memory, keep flushing until we hit this mark. Defaults to 35% of
>>       heap. Setting this value equal to
>>       hbase.regionserver.global.memstore.upperLimit causes the minimum
>>       possible flushing to occur when updates are blocked due to memstore
>>       limiting.
>>     </description>
>>   </property>
>>
>> [2]
>>
>>   <property>
>>     <name>hbase.hregion.memstore.block.multiplier</name>
>>     <value>2</value>
>>     <description>
>>     Block updates if the memstore reaches
>>     hbase.hregion.memstore.block.multiplier times
>>     hbase.hregion.memstore.flush.size bytes.  Useful for preventing
>>     runaway memstore growth during spikes in update traffic.  Without an
>>     upper bound, the memstore fills such that when it flushes, the
>>     resultant flush files take a long time to compact or split, or,
>>     worse, we OOME.
>>     </description>
>>   </property>
>>
>> [3]
>>
>>   <property>
>>     <name>hbase.hregion.memstore.flush.size</name>
>>     <value>134217728</value>
>>     <description>
>>     Memstore will be flushed to disk if size of the memstore
>>     exceeds this number of bytes.  Value is checked by a thread that runs
>>     every hbase.server.thread.wakefrequency.
>>     </description>
>>   </property>
>>
>
>
