Hi J-D,
If a 1GB region size and about 1K regions per server are recommended, does
that mean that a region node should serve about 1TB of compressed data
at most?
If that is the case, then having more than 2TB (1TB for data and 1TB of
spare free space) is wasteful for data nodes that are part of
"non-analytical" clusters.
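
To make the arithmetic behind that question explicit, here is a rough sketch. The function names are made up for illustration, the 1:1 spare-space ratio is just the assumption stated above, and it ignores HDFS replication and any other overhead:

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

def max_data_per_server(region_size_bytes, regions_per_server):
    """Upper bound on compressed data one server holds if every region is full."""
    return region_size_bytes * regions_per_server

def disk_needed(data_bytes, spare_ratio=1.0):
    """Data plus spare free space; spare_ratio=1.0 means as much spare as data."""
    return int(data_bytes * (1 + spare_ratio))

data = max_data_per_server(1 * GIB, 1000)   # 1GB regions, about 1K regions
total = disk_needed(data)                   # data + equal amount of spare space

print(f"data per server: {data / TIB:.2f} TiB")
print(f"disk with spare: {total / TIB:.2f} TiB")
```

With 1GB regions and roughly 1K regions this comes out just under 1 TiB of data, or just under 2 TiB once the spare space is counted.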
Thanks,
i.
On 12/15/2010 10:17 AM, Jean-Daniel Cryans wrote:
We can give it a try. Currently we use 512 MiB per region; is there an
upper bound for this value that you would not recommend crossing?
Like I said in my first email, we recommend 1GB.
Are there any
side-effects we may expect when we set this value to say 1 GiB?
HBase may be faster over time since less splitting occurs. Splitting is
good because it basically shards the data, but you don't want that to
happen too often past a certain point. Some people even prefer to
pre-split their tables and then disable splitting since it's known to
have a few deficiencies... but that's not optimal either. Recently I
worked on HBASE-3308 to speed up one part of the splitting process.
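
As a rough illustration of the pre-splitting idea mentioned above, the sketch below computes evenly spaced split points. It assumes row keys start with a fixed-width lowercase hex prefix and are uniformly distributed, which may not match a real table; `even_split_keys` is a hypothetical helper and does not call any HBase API:

```python
def even_split_keys(num_regions, hex_width=8):
    """Return num_regions - 1 split points that divide a hex key space evenly.

    Assumes row keys begin with a hex_width-character lowercase hex prefix
    and are spread uniformly over that space (an assumption, not a given).
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    keyspace = 16 ** hex_width
    return [
        format(keyspace * i // num_regions, f"0{hex_width}x")
        for i in range(1, num_regions)
    ]

# Four regions over a two-character hex prefix
print(even_split_keys(4, hex_width=2))
```

The resulting keys would then be passed to whatever table-creation mechanism is in use; if the real key distribution is skewed, evenly spaced split points will leave some regions much larger than others.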
I suppose at
least a bit longer random gets?
I see no reason why it would do that.
J-D