Should I maybe create a JIRA issue for that?

Alex Baranau
------
Sematext :: http://blog.sematext.com/
On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[email protected]> wrote:

> Hi!
>
> Just trying to check that I understand things correctly about configuring
> memstore flushes.
>
> Basically, there are two groups of configuration properties (leaving out
> region pre-close flushes):
> 1. those that determine when a flush should be triggered
> 2. those that determine when a flush should be triggered and updates should
>    be blocked during flushing
>
> The 2nd group exists for safety reasons: we don't want the memstore to grow
> without limit, so we forbid writes unless the memstore has a "bearable"
> size. We also don't want flushed files to be too big. These properties are:
> * hbase.regionserver.global.memstore.upperLimit &
>   hbase.regionserver.global.memstore.lowerLimit [1] (1)
> * hbase.hregion.memstore.block.multiplier [2]
>
> The 1st group (sorry for the reverse order) is about triggering "regular"
> flushes. Since these flushes can be performed without pausing updates, we
> want them to happen before the conditions for "blocking updates" flushes
> are met. The property for configuring this is:
> * hbase.hregion.memstore.flush.size [3]
> (* there are also open JIRA issues for per-column-family settings)
>
> Since we don't want flushes to be too frequent, we want to keep this option
> big enough to avoid that. At the same time, we want to keep it small enough
> that it triggers flushing *before* the "blocking updates" flushing is
> triggered. This setting is per region, while (1) is per region server. So,
> if we had a (more or less) constant number of regions per region server, we
> could choose a value that is not too small, yet small enough. However, it
> is a common situation that the number of regions assigned to a region
> server varies a lot over the cluster's life, and we don't want to keep
> adjusting the setting over time (which requires region server restarts).
>
> Does the thinking above make sense to you? If yes, then here are the
> questions:
>
> A. Is it a goal to have a more or less constant number of regions per
>    region server? Can anyone share their experience on whether that is
>    achievable?
> B. Or should there be config options for triggering flushes based on
>    region server state (not just individual regions or stores)? E.g.:
>    B.1 given a setting X%, trigger a flush of the biggest memstore (or
>        whatever the logic is for selecting the memstore to flush) when the
>        memstores take up X% of the heap (similar to (1), but this triggers
>        flushing before there is any need to block updates)
>    B.2 anything else that takes the number of regions into account
>
> Thoughts?
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/
>
> [1]
>
> <property>
>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>   <value>0.4</value>
>   <description>Maximum size of all memstores in a region server before new
>   updates are blocked and flushes are forced. Defaults to 40% of heap.
>   </description>
> </property>
> <property>
>   <name>hbase.regionserver.global.memstore.lowerLimit</name>
>   <value>0.35</value>
>   <description>When memstores are being forced to flush to make room in
>   memory, keep flushing until we hit this mark. Defaults to 35% of heap.
>   Setting this value equal to hbase.regionserver.global.memstore.upperLimit
>   causes the minimum possible flushing to occur when updates are blocked
>   due to memstore limiting.
>   </description>
> </property>
>
> [2]
>
> <property>
>   <name>hbase.hregion.memstore.block.multiplier</name>
>   <value>2</value>
>   <description>
>   Block updates if the memstore reaches
>   hbase.hregion.memstore.block.multiplier times
>   hbase.hregion.memstore.flush.size bytes. Useful for preventing a runaway
>   memstore during spikes in update traffic. Without an upper bound, the
>   memstore fills such that when it flushes, the resultant flush files take
>   a long time to compact or split, or worse, we OOME.
>   </description>
> </property>
>
> [3]
>
> <property>
>   <name>hbase.hregion.memstore.flush.size</name>
>   <value>134217728</value>
>   <description>
>   Memstore will be flushed to disk if the size of the memstore exceeds
>   this number of bytes. The value is checked by a thread that runs every
>   hbase.server.thread.wakefrequency.
>   </description>
> </property>
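To make the interplay between the per-region and per-region-server thresholds in the quoted mail concrete, here is a quick back-of-the-envelope sketch (Python). The heap size and region counts are illustrative assumptions, not numbers from the thread; the config values are the defaults quoted in [1]-[3]:

```python
# Sketch of how the per-region and per-region-server flush thresholds
# interact. Heap size and region counts are illustrative assumptions.

HEAP_BYTES = 4 * 1024**3            # assumed 4 GB region server heap
FLUSH_SIZE = 134217728              # hbase.hregion.memstore.flush.size (128 MB)
BLOCK_MULTIPLIER = 2                # hbase.hregion.memstore.block.multiplier
GLOBAL_UPPER = 0.4                  # hbase.regionserver.global.memstore.upperLimit
GLOBAL_LOWER = 0.35                 # hbase.regionserver.global.memstore.lowerLimit

# Per region: a "regular" flush triggers at flush.size; updates to the
# region are blocked once its memstore reaches multiplier * flush.size.
per_region_flush = FLUSH_SIZE                     # 128 MB
per_region_block = BLOCK_MULTIPLIER * FLUSH_SIZE  # 256 MB

# Per region server: once all memstores together exceed upperLimit * heap,
# updates are blocked and flushing proceeds down to lowerLimit * heap.
global_block_at = GLOBAL_UPPER * HEAP_BYTES       # ~1.6 GB
global_flush_down_to = GLOBAL_LOWER * HEAP_BYTES  # ~1.4 GB

# With few regions, per-region triggers fire well before the global limit;
# with many regions, the global (blocking) limit can be reached while every
# region is still below flush.size, so no "regular" flush ever fired.
for regions in (10, 20):
    aggregate = regions * per_region_flush  # worst case just before flushes
    print(regions, "regions:", "global limit hit" if aggregate > global_block_at
          else "per-region flushes fire first")
```

With the assumed 4 GB heap, 10 regions can each approach flush.size (1.25 GB total) without hitting the 1.6 GB global limit, while 20 regions cross the global blocking limit first, which is exactly the region-count sensitivity the mail describes.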
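Option B.1 above can be sketched as a small selection routine. This is purely a hypothetical illustration of the proposed heuristic, not HBase code: the function name, the biggest-memstore-first policy, and the stop condition are all assumptions for discussion.

```python
# Hypothetical sketch of option B.1: when the aggregate memstore size on a
# region server crosses X% of heap, pick regions to flush (biggest memstore
# first) without blocking updates. All names/policies here are assumptions.

def regions_to_flush(memstore_sizes, heap_bytes, x_percent):
    """Return region names to flush, biggest memstore first, until the
    aggregate memstore size would drop back under the X% threshold.

    memstore_sizes: dict of region name -> current memstore size in bytes.
    """
    threshold = heap_bytes * x_percent / 100.0
    total = sum(memstore_sizes.values())
    picked = []
    # Biggest memstore first: frees the most memory per flush file written.
    for region, size in sorted(memstore_sizes.items(), key=lambda kv: -kv[1]):
        if total <= threshold:
            break
        picked.append(region)
        total -= size
    return picked

# Example with toy numbers: threshold is 35% of a 1000-byte "heap" = 350.
# Total memstore usage is 450, so flushing the biggest region (300) is
# enough to get back under the threshold.
print(regions_to_flush({"r1": 300, "r2": 100, "r3": 50},
                       heap_bytes=1000, x_percent=35))
```

Note the contrast with (1): this fires proactively at X% and keeps serving writes, instead of waiting for upperLimit where updates must be blocked.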
