Re: Region Splits

Nicolas Spiegelberg Tue, 22 Nov 2011 12:47:10 -0800

If you increase the region size to 2GB, then all regions (current and new)
will avoid a split until their aggregate StoreFile size reaches that
limit.  Reorganizing the regions for a uniform growth pattern is really a
schema design problem.  There is the capability to merge two adjacent
regions if you know that your data growth pattern is non-uniform.
StumbleUpon & other companies have more experience with those utilities
than I do.
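
To make the effect of the larger limit concrete, here is a rough toy model
of the sequential-key case discussed in this thread. It is only a sketch,
not HBase code: the function name `simulate_sequential_writes` is made up
for illustration, sizes are counted in abstract units (think MB), and it
assumes every write lands in the last region and that a split cuts a region
exactly in half.

```python
def simulate_sequential_writes(total_units, max_region_units):
    """Toy model of region growth under strictly increasing row keys.

    Assumptions (illustrative only, not HBase internals):
      * every write goes to the last region, because keys only increase;
      * a region splits exactly in half once it reaches max_region_units.

    Returns the per-region sizes, oldest region first.
    """
    regions = [0]  # only the last entry ever receives writes
    for _ in range(total_units):
        regions[-1] += 1
        if regions[-1] >= max_region_units:
            lower = regions[-1] // 2
            upper = regions[-1] - lower
            # The lower half stays behind and never grows again;
            # the upper half becomes the new last region.
            regions[-1:] = [lower, upper]
    return regions


if __name__ == "__main__":
    for limit in (256, 2048):
        sizes = simulate_sequential_writes(10_000, limit)
        print(f"limit={limit}: {len(sizes)} regions, "
              f"older regions hold {sizes[0]} units, last holds {sizes[-1]}")
```

Under these assumptions, raising the limit only changes how many regions
you end up with and how large each one is; every region except the last
still stops growing at half the limit.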


Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
lean towards increasing the region size.  HFile scalability code is more
mature/stable than the region splitting code.  Plus, automatic region
splitting is harder to optimize & debug when failures occur.

On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
<[email protected]> wrote:

>Thanks Nicolas for the clarification.  I had a follow-up query.
>
>What will happen if we increase the region size, say from the current
>value of 256 MB to a new value of 2GB?
>Will existing regions continue to use only 256 MB of space?
>
>Is there a way to reorganize the regions so that each region grows to
>2GB in size?
>
>Thanks,
>Srikanth
>
>-----Original Message-----
>From: Nicolas Spiegelberg [mailto:[email protected]]
>Sent: Tuesday, November 22, 2011 10:59 PM
>To: [email protected]
>Subject: Re: Region Splits
>
>No.  The purpose of major compactions is to merge & dedupe within a region
>boundary.  Compactions will not alter region boundaries, except in the
>case of splits where a compaction is necessary to filter out any Rows from
>the parent region that are no longer applicable to the daughter region.
>
>On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
><[email protected]> wrote:
>
>>Will major compactions take care of merging "older" regions or adding
>>more key/values to them as the number of regions grows?
>>
>>Regards,
>>Srikanth
>>
>>-----Original Message-----
>>From: Amandeep Khurana [mailto:[email protected]]
>>Sent: Monday, November 21, 2011 7:25 AM
>>To: [email protected]
>>Subject: Re: Region Splits
>>
>>Mark,
>>
>>Yes, your understanding is correct. If your keys are sequential
>>(timestamps, etc.), you will always be writing to the end of the table,
>>and "older" regions will not get any writes. This is one of the
>>arguments against using sequential keys.
>>
>>-ak
>>
>>On Sun, Nov 20, 2011 at 11:33 AM, Mark <[email protected]> wrote:
>>
>>> Say we have a use case that has sequential row keys and we have rows
>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>> split it will split at the halfway mark so there will be two regions as
>>> follows:
>>>
>>> Region1 [START-49]
>>> Region2 [50-END]
>>>
>>> So now at this point all inserts will be writing to Region2 only,
>>> correct?
>>> Now at some point Region2 will need to split and it will look like the
>>> following before the split:
>>>
>>> Region1 [START-49]
>>> Region2 [50-150]
>>>
>>> After the split it will look like:
>>>
>>> Region1 [START-49]
>>> Region2 [50-99]
>>> Region3 [100-END]
>>>
>>> And this pattern will continue, correct? My question is: when a use
>>> case has sequential keys, how would any of the older regions ever
>>> receive any more writes? It seems like they would always be stuck at
>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>
>
