J-D: Thanks a lot. You are right; I may not have taken some factors into account. My workload is write-heavy, so I don't want to flush small files.

The "2 or 3" is an empirical value that describes the smallest size a memstore should have when it is flushed. For example, if flush.size = 128M, the HFile size is 128M / 3 / compression ratio, probably a little more than ten megabytes, which is very small compared with the region size (1 GB or more). In my case, I want to reduce the pressure on compaction (which has only one thread).

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: September 8, 2011 2:13
To: [email protected]
Subject: Calculating the optimal number of regions (WAS -> Re: big compaction queue size)

(Branching this discussion since it's not directly relevant to the other thread)

I think if we ever come up with a formula, it needs to come with a big "your mileage may vary" sign. The reasons being:

 - If only a subset of the regions are getting written to, then only those regions need to be accounted for (I think this is what you referred to by Active Regions)
 - If the load is read heavy then you'd want to flush as little as possible, meaning very few regions (possibly forcing them to be less than the theoretical maximum)
 - Not all tables may have the same flush size.
 - Some regions might be more active than others and may flush a lot more, and since we keep both active and inactive data in the HLogs then you might be churning more than you need to.
 - Same for families.

Now on the formula:

> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )

That's ok.

> Active Regions = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size /
> (2~3))

Could you explain the division by 2 or 3? I'm not sure I'm following that. Also I don't remember if the flush size by region was fixed (it should be by family), but this would have an effect too.

> Else
> Active Regions = (Hlognumber*hdfsblock)/ (flush.size / (2~3))

Same comments.

J-D

2011/9/6 Gaojinchao <[email protected]>:
> Hi J-D
> Could we give a formula for active regions per node and add it to the book?
> I think many people encounter the same problem.
>
> I think the formula is:
> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
> Active Regions = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size /
> (2~3))
> Else
> Active Regions = (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>
>
> If I am wrong, please correct me. Thanks.
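
For reference, the formula quoted above can be written as a small Python sketch. The function and parameter names are illustrative labels for the terms used in the thread (Hlognumber, hdfsblock, HBASE_HEAPSIZE, memstore.lowerLimit, flush.size); they are not HBase APIs or configuration keys. The default divisor of 3 is the empirical "2 or 3" value, and the sample numbers are hypothetical. As noted in the thread, treat the result as a rough estimate: your mileage may vary.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def active_regions(heap_bytes, memstore_lower_limit, hlog_count,
                   hdfs_block_bytes, flush_size_bytes, divisor=3):
    """Rough estimate of active regions per node, per the thread's formula.

    divisor is the empirical "2 or 3": a memstore is assumed to be
    flushed when it holds about flush_size_bytes / divisor.
    """
    # Memory the memstores may hold before forced flushing starts.
    memstore_budget = heap_bytes * memstore_lower_limit
    # Data the HLogs can cover before log rolling forces flushes.
    hlog_budget = hlog_count * hdfs_block_bytes

    # Same branch as the quoted If/Else: use the smaller of the two budgets.
    if hlog_budget > memstore_budget:
        budget = memstore_budget
    else:
        budget = hlog_budget

    # budget / (flush_size / divisor), rearranged to avoid rounding noise.
    return int(budget * divisor // flush_size_bytes)


if __name__ == "__main__":
    # Hypothetical node: 8 GiB heap, lowerLimit 0.35, 64 MiB HDFS blocks,
    # 128 MiB flush size.
    print(active_regions(8 * GIB, 0.35, 32, 64 * MIB, 128 * MIB))  # HLog-bound
    print(active_regions(8 * GIB, 0.35, 64, 64 * MIB, 128 * MIB))  # memstore-bound
```

With these sample inputs the first call is limited by the HLogs (32 * 64 MiB = 2 GiB) and the second by the memstore lower limit (about 2.8 GiB), which mirrors the two branches of the formula.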
