"For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb. For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb)." ... from http://hbase.apache.org/book.html ...
Unfortunately, I cannot upgrade to 0.92.x/cdh4 right away and will have limited hardware for some time, so I want to understand the reasoning behind the HFile limit in v1 and weigh my options: use a bigger region size, increase the number of regions per regionserver, or wait for more nodes and spread the data.

- Why is the limit 4GB in 0.90.x?
- If it is a hard limit... I would like to hear experiences from people about:
-- What is the normal region size?
-- Has anyone been running with 3-4G region sizes?

I do understand that compactions will take longer, that the index size can be big depending on key size, that read performance can be impacted, and that region splitting may be a problem. What else should I be worried about?

I have pre-split regions with known bounds, lots of RAM on each node (96G), no swap, controlled compactions, no MR on this cluster, etc., so I believe my set-up is about as good as it can be.

Thanks.
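
P.S. To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how the region size (the hbase.hregion.max.filesize setting) translates into region count per regionserver. The 2 TiB of data per node is a purely hypothetical figure for illustration, not my actual volume, and the helper name is made up for this example.

```python
# Rough sketch: how many regions a regionserver ends up hosting for a
# given max region size. All inputs are illustrative assumptions.

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def regions_per_server(data_bytes: int, region_size_bytes: int) -> int:
    """Return the number of regions needed to hold data_bytes,
    assuming every region is filled up to region_size_bytes."""
    if region_size_bytes <= 0:
        raise ValueError("region size must be positive")
    # Ceiling division using integers only.
    return -(-data_bytes // region_size_bytes)


# Hypothetical data volume per node (assumption, not a measured value).
data_per_node = 2 * TIB

# Candidate region sizes: the 0.90.x default, a middle value, and the
# upper bound quoted from the HBase book.
for label, size in [("256 MiB", 256 * MIB), ("1 GiB", 1 * GIB), ("4 GiB", 4 * GIB)]:
    count = regions_per_server(data_per_node, size)
    print(f"region size {label}: about {count} regions per regionserver")
```

This only counts regions; it says nothing about compaction time or index size, which are the parts I am actually unsure about.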
