If you increase the region size to 2GB, then all regions (current and new) will avoid a split until their aggregate StoreFile size reaches that limit. Existing regions are not capped at 256 MB; they simply keep whatever data they already hold and grow only if new writes land in their key range. Reorganizing the regions for a uniform growth pattern is really a schema design problem. There is also the capability to merge two adjacent regions, which can help if you know that your data growth pattern is non-uniform. StumbleUpon & other companies have more experience with those merge utilities than I do.
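To make the growth pattern concrete, here is a toy Python model of the behavior discussed in this thread. It is only a sketch, not HBase code: the function name `write_sequential` is hypothetical, region sizes are counted in abstract units rather than bytes, and a split is assumed to cut the last region exactly in half.

```python
def write_sequential(regions, new_rows, max_region_size):
    """Toy model of sequential-key writes (an illustration, not HBase code).

    regions: list of region sizes in key order; the last region receives
    every write because the keys only ever increase.
    Returns a new list of region sizes after new_rows writes.
    """
    regions = list(regions) or [0]  # start with one empty region if none exist
    for _ in range(new_rows):
        regions[-1] += 1  # sequential keys always land in the last region
        if regions[-1] >= max_region_size:
            # Assumed split rule: cut the full region at its midpoint.
            lower = regions[-1] // 2
            upper = regions[-1] - lower
            regions[-1:] = [lower, upper]
    return regions


# Phase 1: small limit. Every closed region is stuck at half the limit.
before = write_sequential([], 400, 100)
print(before)  # [50, 50, 50, 50, 50, 50, 50, 50]

# Phase 2: raise the limit. Old regions keep their size; only the tail grows.
after = write_sequential(before, 800, 800)
print(after)   # [50, 50, 50, 50, 50, 50, 50, 400, 450]
```

In this model, raising the limit changes nothing for the older regions, since no writes reach them. That is why evening out region sizes with sequential keys comes down to schema design or merging rather than the size setting alone.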
Note: With the introduction of HFileV2 in 0.92, you'll definitely want to lean towards increasing the region size. HFile scalability code is more mature/stable than the region splitting code. Plus, automatic region splitting is harder to optimize & debug when failures occur.

On 11/22/11 12:20 PM, "Srikanth P. Shreenivas" <[email protected]> wrote:

>Thanks Nicolas for the clarification. I had a follow-up query.
>
>What will happen if we increased the region size, say from the current
>value of 256 MB to a new value of 2GB?
>Will existing regions continue to use only 256 MB space?
>
>Is there a way to reorganize the regions so that each region grows to
>2GB size?
>
>Thanks,
>Srikanth
>
>-----Original Message-----
>From: Nicolas Spiegelberg [mailto:[email protected]]
>Sent: Tuesday, November 22, 2011 10:59 PM
>To: [email protected]
>Subject: Re: Region Splits
>
>No. The purpose of major compactions is to merge & dedupe within a region
>boundary. Compactions will not alter region boundaries, except in the
>case of splits where a compaction is necessary to filter out any rows from
>the parent region that are no longer applicable to the daughter region.
>
>On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
><[email protected]> wrote:
>
>>Will major compactions take care of merging "older" regions or adding
>>more key/values to them as the number of regions grows?
>>
>>Regards,
>>Srikanth
>>
>>-----Original Message-----
>>From: Amandeep Khurana [mailto:[email protected]]
>>Sent: Monday, November 21, 2011 7:25 AM
>>To: [email protected]
>>Subject: Re: Region Splits
>>
>>Mark,
>>
>>Yes, your understanding is correct. If your keys are sequential
>>(timestamps etc), you will always be writing to the end of the table and
>>"older" regions will not get any writes. This is one of the arguments
>>against using sequential keys.
>>
>>-ak
>>
>>On Sun, Nov 20, 2011 at 11:33 AM, Mark <[email protected]> wrote:
>>
>>> Say we have a use case that has sequential row keys and we have rows
>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>> split it will split at the halfway mark so there will be two regions as
>>> follows:
>>>
>>> Region1 [START-49]
>>> Region2 [50-END]
>>>
>>> So now at this point all inserts will be writing to Region2 only,
>>> correct? Now at some point Region2 will need to split and it will look
>>> like the following before the split:
>>>
>>> Region1 [START-49]
>>> Region2 [50-150]
>>>
>>> After the split it will look like:
>>>
>>> Region1 [START-49]
>>> Region2 [50-100]
>>> Region3 [101-END]
>>>
>>> And this pattern will continue, correct? My question is: when there is
>>> a use case that has sequential keys, how would any of the older regions
>>> ever receive any more writes? It seems like they would always be stuck
>>> at MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>
>>> Thanks
