But the pre-compressed size is still the one that's using heap, right? The same goes for space in the HLogs, so you shouldn't lower the impact of the flush size.

J-D

On Thu, Sep 8, 2011 at 2:11 AM, Gaojinchao <[email protected]> wrote:
> J-D:
> Thanks a lot. You are right.
> I may not have taken some factors into account. My case is write heavy, so I
> don't want to flush small files.
>
> "2 or 3" is an empirical value for how small the smallest memstore should be.
> e.g.: if flush.size = 128M, the hfile size is 128M / 3 / compression ratio,
> probably more than ten megabytes, which is very small compared to the region
> size (that is 1 G or more).
> In my case, I want to reduce the pressure on compaction (which has only one
> thread).
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: September 8, 2011 2:13
> To: [email protected]
> Subject: Calculating the optimal number of regions (WAS -> Re: big compaction
> queue size)
>
> (Branching this discussion since it's not directly relevant to the other
> thread)
>
> I think if we ever come up with a formula, it needs to come with a big
> "your mileage may vary" sign. The reasons being:
>
> - If only a subset of the regions are getting written to, then only
> those regions need to be accounted for (I think this is what you
> referred to by Active Regions)
> - If the load is read heavy then you'd want to flush as little as
> possible, meaning very few regions (possibly forcing them to be less
> than the theoretical maximum)
> - Not all tables may have the same flush size.
> - Some regions might be more active than others and may flush a lot
> more, and since we keep both active and inactive data in the HLogs
> then you might be churning more than you need to.
> - Same for families.
>
> Now on the formula:
>
>> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>
> That's ok.
>
>> Active Regions = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size /
>> (2~3))
>
> Could you explain the division by 2 or 3? I'm not sure I'm following
> that. Also I don't remember if the flush size by region was fixed (it
> should be by family), but this would have an effect too.
>
>> Else
>> Active Regions = (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>
> Same comments.
>
> J-D
>
> 2011/9/6 Gaojinchao <[email protected]>:
>> Hi J-D
>> Should we give a formula for active regions per node and put it in the book?
>> I think many people encounter the same problem.
>>
>> I think the formula is:
>> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>> Active Regions = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size /
>> (2~3))
>> Else
>> Active Regions = (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>>
>>
>> If I am wrong, please correct me. Thanks.
>
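
For reference, the formula quoted in this thread could be sketched roughly as below. This is only an illustration: the function name, the parameter names and the example numbers (8 GB heap, a lowerLimit of 0.35, 32 HLogs, 64 MB HDFS blocks) are assumptions made up for the example, not recommendations, and only the 128M flush size comes from the thread. The divisor is left as a parameter because whether flush.size should be divided at all is exactly what is being debated here, and the usual "your mileage may vary" caveats apply.

```python
def active_regions(heap_bytes, memstore_lower_limit, hlog_count,
                   hdfs_block_bytes, flush_size_bytes, divisor=1.0):
    """Rough estimate of actively written regions per region server.

    Follows the formula proposed in the thread: the budget is whichever
    is smaller, the global memstore budget or the total HLog space, and
    each active region is assumed to need flush_size / divisor of it.
    """
    memstore_budget = heap_bytes * memstore_lower_limit
    hlog_budget = hlog_count * hdfs_block_bytes
    budget = min(memstore_budget, hlog_budget)
    # budget / (flush_size / divisor), rearranged to avoid dividing twice
    return int(budget * divisor // flush_size_bytes)


MB = 1024 * 1024
GB = 1024 * MB

# Hypothetical node: 8 GB heap, lowerLimit 0.35, 32 HLogs of 64 MB each.
# The HLog space (2 GB) is the smaller budget in this case.
print(active_regions(8 * GB, 0.35, 32, 64 * MB, 128 * MB))             # 16
print(active_regions(8 * GB, 0.35, 32, 64 * MB, 128 * MB, divisor=3))  # 48
```

With divisor=1 the full, pre-compressed flush size is charged against the budget; with divisor=3 the "2 or 3" factor is applied and the estimate triples.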
