J-D: 
Thanks a lot. You are right.
I may not have taken some factors into account. My case is write-heavy, so I 
don't want to flush small files.

"2 or 3" is an empirical value that sets the smallest size a memstore should 
reach before it is flushed.
E.g., if flush.size = 128 MB, the HFile size is 128 MB / 3 / compression ratio, 
probably a little more than ten megabytes, which is very small compared with 
the region size (1 GB or more).
In my case, I want to reduce the pressure on compaction (which has only one 
thread).
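
The arithmetic above can be sketched in a few lines of Python. This is only an
illustration: the function name and the compression ratio of 3 are assumptions
made for the example, not values taken from HBase or from this thread.

```python
def estimated_hfile_mb(flush_size_mb, divisor, compression_ratio):
    """Rough size (in MB) of the HFile written by a flush.

    flush_size_mb     -- configured memstore flush size
    divisor           -- the empirical "2 or 3" factor from the message
    compression_ratio -- assumed on-disk compression ratio (hypothetical)
    """
    return flush_size_mb / divisor / compression_ratio


# 128 MB flush size, divisor 3, assumed compression ratio 3
hfile_mb = estimated_hfile_mb(128, 3, 3)

# Compare with a 1 GB region to see how small the flushed file is
files_per_region = 1024 / hfile_mb

print(round(hfile_mb, 1), round(files_per_region))
```

With these assumed inputs the flushed file is a small fraction of a 1 GB
region, which is why many such files add to the compaction load.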



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
Sent: September 8, 2011 2:13
To: [email protected]
Subject: Calculating the optimal number of regions (WAS -> Re: big compaction queue 
size)

(Branching this discussion since it's not directly relevant to the other thread)

I think if we ever come up with a formula, it needs to come with a big
"your mileage may vary" sign. The reasons being:

 - If only a subset of the regions are getting written to, then only
those regions need to be accounted for (I think this is what you
referred to by Active Regions)
 - If the load is read heavy then you'd want to flush as little as
possible, meaning very few regions (possibly forcing them to be fewer
than the theoretical maximum)
 - Not all tables may have the same flush size.
 - Some regions might be more active than others and may flush a lot
more, and since we keep both active and inactive data in the HLogs
then you might be churning more than you need to.
 - Same for families.

Now on the formula:

> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )

That's ok.

>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / 
> (2~3))

Could you explain the division by 2 or 3? I'm not sure I'm following
that. Also I don't remember if the flush size by region was fixed (it
should be by family), but this would have an effect too.

> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))

Same comments.

J-D

2011/9/6 Gaojinchao <[email protected]>:
> Hi J-D
> Should we give a formula for active regions per node and put it in the book?
> I think many people encounter the same problem.
>
> I think the formula is:
> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / 
> (2~3))
> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>
>
> If I am wrong, please correct me. Thanks.
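
For readers who want to try the proposed formula, here is a minimal Python
sketch of it. The function and parameter names are hypothetical, the input
values are purely illustrative, and, as noted in the thread, any result comes
with a "your mileage may vary" caveat.

```python
def active_regions(hlog_count, hdfs_block_mb, heap_mb,
                   memstore_lower_limit, flush_size_mb, divisor=3):
    """Estimate active regions per node using the formula from the thread.

    The smaller of the two budgets (HLog capacity vs. memstore lower limit)
    is divided by the smallest memstore size, flush.size / (2~3).
    """
    hlog_budget_mb = hlog_count * hdfs_block_mb
    memstore_budget_mb = heap_mb * memstore_lower_limit
    # The If/Else in the formula picks whichever budget is smaller
    if hlog_budget_mb > memstore_budget_mb:
        budget_mb = memstore_budget_mb
    else:
        budget_mb = hlog_budget_mb
    return budget_mb / (flush_size_mb / divisor)


# Illustrative inputs only: 32 HLogs, 64 MB HDFS blocks, 8 GB heap,
# lower limit 0.35, 128 MB flush size, divisor 3
estimate = active_regions(32, 64, 8192, 0.35, 128, 3)
print(round(estimate))
```

Here the HLog budget (32 * 64 MB = 2048 MB) is smaller than the memstore
budget (8192 MB * 0.35), so the Else branch applies.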
