(Branching this discussion since it's not directly relevant to the other thread)

I think if we ever come up with a formula, it needs to come with a big
"your mileage may vary" sign. The reasons being:

 - If only a subset of the regions are being written to, then only
those regions need to be accounted for (I think this is what you
meant by Active Regions)
 - If the load is read-heavy then you'd want to flush as little as
possible, meaning very few regions (possibly forcing them to be
fewer than the theoretical maximum)
 - Not all tables may have the same flush size.
 - Some regions might be more active than others and may flush a lot
more, and since we keep both active and inactive data in the HLogs,
you might be churning more than you need to.
 - Same for families.

Now on the formula:

> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )

That's ok.

>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))

Could you explain the division by 2 or 3? I'm not sure I follow
that. Also, I don't remember whether the flush size being per region
was fixed (it should be per family), but that would have an effect too.

> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))

Same comments.
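
To make the quoted formula concrete, here is a minimal sketch in
Python. It only restates the arithmetic above and is not an official
sizing rule. The function and parameter names are made up for
illustration, the "2~3" divisor is kept as a plain parameter because
its justification is still an open question, and the numbers in the
example are illustrative, not recommendations.

```python
MB = 1024 * 1024


def estimate_active_regions(hlog_count, hdfs_block_bytes, heap_bytes,
                            memstore_lower_limit, flush_size_bytes,
                            divisor=2.0):
    """Rough estimate of write-active regions per region server.

    Restates the quoted formula: take the smaller of the HLog capacity
    and the global memstore budget, then divide by the assumed average
    memstore size per region (flush size / divisor).
    """
    if flush_size_bytes <= 0 or divisor <= 0:
        raise ValueError("flush_size_bytes and divisor must be positive")

    # Hlognumber * hdfsblock
    hlog_capacity = hlog_count * hdfs_block_bytes
    # HBASE_HEAPSIZE * memstore.lowerLimit
    memstore_budget = heap_bytes * memstore_lower_limit

    # The If/Else in the quoted formula picks whichever term is smaller.
    limiting_bytes = min(hlog_capacity, memstore_budget)

    # Assumption from the quoted formula: a region holds, on average,
    # flush.size / (2~3) bytes in its memstore.
    avg_memstore_bytes = flush_size_bytes / divisor
    return limiting_bytes / avg_memstore_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 32 HLogs, 64 MB blocks, 8 GB heap,
    # lowerLimit 0.35, 64 MB flush size, divisor 2.
    print(estimate_active_regions(32, 64 * MB, 8192 * MB, 0.35, 64 * MB))
```

With these illustrative numbers the HLog term is the smaller one and
the estimate comes out at 64 regions. All of the caveats listed above
still apply: the result assumes every table and family has the same
flush size and that all counted regions are written to evenly.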

J-D

2011/9/6 Gaojinchao <[email protected]>:
> Hi J-D
> Should we give a formula for active regions per node and add it to the book?
> I think many people encounter the same problem.
>
> I think the formula is:
> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))
> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>
>
> If I am wrong, please correct me. Thanks.
