The number of regions per RS has always been a good point of debate. There's a hardcoded maximum of 1500; however, you'll see performance degrade before you ever hit that limit.
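To make that concrete, here's a back-of-the-envelope illustration (the 0.4 and 128 MB figures are the defaults quoted further down this thread; the 8 GB heap is just an example I picked):

    8 GB heap * 0.4 upperLimit          = ~3.2 GB for all memstores combined
    3.2 GB / 1500 regions taking writes = ~2.2 MB per memstore on average
    2.2 MB << 128 MB flush.size         -> regular flushes rarely fire; the
                                           global limit forces tiny flushes instead

Long before you reach the hard cap, each region's share of the memstore heap gets too small to do useful-sized flushes.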
I've tried to set a goal of keeping the number of regions per RS down around 500-600, because I didn't have time to monitor the system that closely. (Again, this was an R&D machine where, if we lost it or it wasn't at 100% peak, I wasn't going to get tarred and feathered. :-P ) So if you increase your heap, monitor your number of regions, and increase the region size as needed, you should be OK.

On a side note... is there any correlation between the underlying block size and the region size in terms of performance? I never had time to check it out.

Thx,
-Mike

On May 18, 2012, at 9:05 PM, Otis Gospodnetic wrote:

> I have a feeling Alex is raising an important issue, but maybe it's not
> getting attention because it's tl;dr?
>
> Andy Purtell just wrote something very related in a different thread:
>
>> "The amount of heap allotted for memstore is fixed by configuration.
>> HBase maintains this global limit as part of a strategy to avoid
>> out-of-memory conditions. Therefore, as the number of regions grows,
>> the available space for each region's memstore shrinks proportionally.
>> If you have a heap sized too small for the region-hosting demand, then
>> when the number of regions gets up there, HBase will be constantly
>> flushing tiny files and compacting endlessly."
>
> So isn't the above a problem for anyone using HBase? More precisely,
> this part: "...when the number of regions gets up there, HBase will be
> constantly flushing tiny files and compacting endlessly."
>
> If this is not a problem, how do people work around it? Somehow keep
> the number of regions mostly constant, or...?
>
> Thanks!
>
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>> ________________________________
>> From: Alex Baranau <[email protected]>
>> To: [email protected]; [email protected]
>> Sent: Wednesday, May 9, 2012 6:02 PM
>> Subject: Re: About HBase Memstore Flushes
>>
>> Should I maybe create a JIRA issue for that?
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/
>>
>> On Tue, May 8, 2012 at 4:00 PM, Alex Baranau <[email protected]> wrote:
>>
>>> Hi!
>>>
>>> Just trying to check that I understand things correctly about
>>> configuring memstore flushes.
>>>
>>> Basically, there are two groups of configuration properties (leaving
>>> out region pre-close flushes):
>>> 1. those that determine when a flush should be triggered;
>>> 2. those that determine when a flush should be triggered *and*
>>>    updates should be blocked during flushing.
>>>
>>> The 2nd group exists for safety reasons: we don't want the memstore
>>> to grow without limit, so we forbid writes unless the memstore has a
>>> "bearable" size. We also don't want flushed files to be too big.
>>> These properties are:
>>> * hbase.regionserver.global.memstore.upperLimit &
>>>   hbase.regionserver.global.memstore.lowerLimit [1] (1)
>>> * hbase.hregion.memstore.block.multiplier [2]
>>>
>>> The 1st group (sorry for the reverse order) is about triggering
>>> "regular" flushes. As these flushes can be performed without pausing
>>> updates, we want them to happen before the conditions for the
>>> "blocking updates" flushes are met. The property for configuring
>>> this is:
>>> * hbase.hregion.memstore.flush.size [3]
>>> (* there are also open JIRA issues for per-column-family settings)
>>>
>>> As we don't want to perform flushes too frequently, we want to keep
>>> this option big enough to avoid that. At the same time, we want to
>>> keep it small enough that it triggers flushing *before* the
>>> "blocking updates" flushing is triggered.
>>> This configuration is per-region, while (1) is per-regionserver. So,
>>> if we had a (more or less) constant number of regions per
>>> regionserver, we could choose a value that is not too small, yet
>>> small enough. However, it is common for the number of regions
>>> assigned to a regionserver to vary a lot over the cluster's life.
>>> And we don't want to keep adjusting the value over time (which
>>> requires RS restarts).
>>>
>>> Does the thinking above make sense to you? If yes, then here are the
>>> questions:
>>>
>>> A. Is it a goal to have a more or less constant number of regions
>>>    per regionserver? Can anyone share their experience on whether
>>>    that is achievable?
>>> B. Or should there be config options for triggering flushes based on
>>>    regionserver state (not just on individual regions or stores)?
>>>    E.g.:
>>>    B.1 Given a setting X%, trigger a flush of the biggest memstore
>>>        (or whatever the logic is for selecting the memstore to
>>>        flush) when memstores take up X% of the heap (similar to (1),
>>>        but this triggers flushing while there's no need to block
>>>        updates yet).
>>>    B.2 Any other option that takes the number of regions into
>>>        account.
>>>
>>> Thoughts?
>>>
>>> Alex Baranau
>>> ------
>>> Sematext :: http://blog.sematext.com/
>>>
>>> [1]
>>>
>>> <property>
>>>   <name>hbase.regionserver.global.memstore.upperLimit</name>
>>>   <value>0.4</value>
>>>   <description>Maximum size of all memstores in a region server
>>>     before new updates are blocked and flushes are forced. Defaults
>>>     to 40% of heap.
>>>   </description>
>>> </property>
>>> <property>
>>>   <name>hbase.regionserver.global.memstore.lowerLimit</name>
>>>   <value>0.35</value>
>>>   <description>When memstores are being forced to flush to make room
>>>     in memory, keep flushing until we hit this mark. Defaults to 35%
>>>     of heap. Setting this value equal to
>>>     hbase.regionserver.global.memstore.upperLimit causes the minimum
>>>     possible flushing to occur when updates are blocked due to
>>>     memstore limiting.
>>>   </description>
>>> </property>
>>>
>>> [2]
>>>
>>> <property>
>>>   <name>hbase.hregion.memstore.block.multiplier</name>
>>>   <value>2</value>
>>>   <description>
>>>     Block updates if the memstore reaches
>>>     hbase.hregion.memstore.block.multiplier times
>>>     hbase.hregion.memstore.flush.size bytes. Useful for preventing a
>>>     runaway memstore during spikes in update traffic. Without an
>>>     upper bound, the memstore fills such that when it flushes, the
>>>     resulting flush files take a long time to compact or split, or,
>>>     worse, we OOME.
>>>   </description>
>>> </property>
>>>
>>> [3]
>>>
>>> <property>
>>>   <name>hbase.hregion.memstore.flush.size</name>
>>>   <value>134217728</value>
>>>   <description>
>>>     The memstore will be flushed to disk if the size of the memstore
>>>     exceeds this number of bytes. The value is checked by a thread
>>>     that runs every hbase.server.thread.wakefrequency.
>>>   </description>
>>> </property>
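One way to formalize Alex's "big enough, but small enough" condition (my own rule of thumb, not anything from the docs): regular flushes can only win the race against the globally forced ones if, roughly,

    regions_taking_writes * hbase.hregion.memstore.flush.size
        < heap_size * hbase.regionserver.global.memstore.lowerLimit

With the defaults quoted above (128 MB flush size, lowerLimit = 0.35), an example 8 GB heap satisfies this for only ~22 regions taking writes concurrently, which shows how quickly a growing region count pushes a regionserver into the forced-flush regime.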

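And to make Alex's option B.1 concrete, below is a rough sketch of the logic it implies, in plain Java. All the names here are hypothetical; this is not HBase's actual flusher code, just an illustration of "flush the biggest memstore once global usage crosses X%":

import java.util.List;

// Sketch of option B.1 (hypothetical names, not HBase's actual flusher
// code): when all memstores on a regionserver together exceed X% of the
// heap, flush the biggest one -- before the blocking upperLimit is hit.
public class GlobalFlushTrigger {

    /** Minimal stand-in for a region's memstore. */
    public interface RegionMemstore {
        long sizeInBytes();   // current memstore size for this region
        void flush();         // flush this region's memstore to disk
    }

    private final double triggerFraction; // X% of heap, e.g. 0.30
    private final long heapBytes;

    public GlobalFlushTrigger(double triggerFraction, long heapBytes) {
        this.triggerFraction = triggerFraction;
        this.heapBytes = heapBytes;
    }

    /**
     * Called periodically (e.g. by the hbase.server.thread.wakefrequency
     * thread). Assumes flush() brings a memstore's size back to ~0, so
     * the loop terminates once enough regions have been flushed.
     */
    public void maybeFlush(List<RegionMemstore> memstores) {
        long limit = (long) (heapBytes * triggerFraction);
        while (totalSize(memstores) > limit) {
            RegionMemstore biggest = null;
            for (RegionMemstore m : memstores) {
                if (biggest == null || m.sizeInBytes() > biggest.sizeInBytes()) {
                    biggest = m;
                }
            }
            if (biggest == null) {
                return; // no regions hosted; nothing to flush
            }
            biggest.flush();
        }
    }

    private static long totalSize(List<RegionMemstore> memstores) {
        long total = 0;
        for (RegionMemstore m : memstores) {
            total += m.sizeInBytes();
        }
        return total;
    }
}

The key difference from (1) is that the trigger fraction X would sit below the blocking upperLimit, so flushing starts while writes can still proceed.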