I just thought of something.

In cases where the id is sequential, couldn't one simply reverse the id to get a more uniform distribution?

510911 => 119015
510912 => 219015
510913 => 319015
510914 => 419015

That seems like a reasonable alternative that doesn't require prefixing each row key with an extra 16 bytes. Am I wrong in thinking this could work?
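A minimal Python sketch of the digit-reversal idea. It assumes the ids are stored as decimal strings in the row key, and the function names (reversed_key, original_id, leading_digit_counts) are hypothetical, not HBase API. The last digit of a sequential id cycles through 0-9, so reversing moves that fast-changing digit to the front and scatters consecutive ids across ten leading characters:

```python
from collections import Counter

# Sketch: derive a row key from a sequential numeric id by reversing its
# decimal digits, so the fastest-changing digit becomes the first character.

def reversed_key(row_id: int) -> str:
    """Return the decimal digits of row_id in reverse order, as a string."""
    return str(row_id)[::-1]

def original_id(key: str) -> int:
    """Recover the numeric id from a reversed key."""
    return int(key[::-1])

def leading_digit_counts(start: int, count: int) -> Counter:
    """Count how many of `count` sequential ids map to each leading key character."""
    return Counter(reversed_key(i)[0] for i in range(start, start + count))

if __name__ == "__main__":
    for row_id in range(510911, 510915):
        print(row_id, "=>", reversed_key(row_id))  # 510911 => 119015, ...
    print(sorted(leading_digit_counts(510000, 1000).items()))
```

Keep the keys as strings: parsing them back into integers would collapse ids such as 1 and 10 ("1" and "01"). The trade-off is the same as with a hash prefix: consecutive ids are no longer adjacent in the table, so scanning a contiguous id range becomes a set of scattered reads.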


On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
If you increase the region size to 2GB, then all regions (current and new)
will avoid a split until their aggregate StoreFile size reaches that
limit.  Reorganizing the regions for a uniform growth pattern is really a
schema design problem.  There is the capability to merge two adjacent
regions if you know that your data growth pattern is non-uniform.
StumbleUpon & other companies have more experience with those utilities
than I do.

Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
lean towards increasing the region size.  HFile scalability code is more
mature/stable than the region splitting code.  Plus, automatic region
splitting is harder to optimize & debug when failures occur.

On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
<srikanth_shreeni...@mindtree.com>  wrote:

Thanks, Nicolas, for the clarification. I have a follow-up question.

What will happen if we increase the region size, say from the current value
of 256 MB to a new value of 2GB?
Will existing regions continue to use only 256 MB of space?

Is there a way to reorganize the regions so that each region grows to
2GB in size?

Thanks,
Srikanth

-----Original Message-----
From: Nicolas Spiegelberg [mailto:nspiegelb...@fb.com]
Sent: Tuesday, November 22, 2011 10:59 PM
To: user@hbase.apache.org
Subject: Re: Region Splits

No.  The purpose of major compactions is to merge & dedupe within a region
boundary.  Compactions will not alter region boundaries, except in the
case of splits where a compaction is necessary to filter out any Rows from
the parent region that are no longer applicable to the daughter region.
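To illustrate the distinction, here is a toy Python model (not HBase code: store files are plain dicts holding one version per key, and the function names are hypothetical). A major compaction merges store files and keeps the newest value per key without narrowing the region's key range; the compaction after a split does the same merge but also drops rows outside the daughter's range. It assumes HBase's convention of an inclusive start key, an exclusive end key, and an empty end key meaning "no upper bound":

```python
from typing import Dict, List

Store = Dict[str, str]  # row key -> value (toy model: one version per key)

def major_compact(store_files: List[Store]) -> Store:
    """Merge store files, oldest first; a newer file's value wins for a duplicate key.
    No key is filtered out, so the region's boundaries are unchanged."""
    merged: Store = {}
    for store_file in store_files:
        merged.update(store_file)
    return dict(sorted(merged.items()))

def compact_daughter(store_files: List[Store], start_key: str, end_key: str) -> Store:
    """Merge like a major compaction, then keep only rows in [start_key, end_key).
    An empty end_key means the daughter has no upper bound."""
    merged = major_compact(store_files)
    return {
        key: value
        for key, value in merged.items()
        if key >= start_key and (end_key == "" or key < end_key)
    }

if __name__ == "__main__":
    parent_files = [{"a": "v1", "m": "v1"}, {"a": "v2", "z": "v1"}]
    print(major_compact(parent_files))              # {'a': 'v2', 'm': 'v1', 'z': 'v1'}
    print(compact_daughter(parent_files, "", "m"))  # {'a': 'v2'}
    print(compact_daughter(parent_files, "m", ""))  # {'m': 'v1', 'z': 'v1'}
```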

On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
<srikanth_shreeni...@mindtree.com>  wrote:

Will major compactions take care of merging "older" regions or adding
more key/values to them as the number of regions grows?

Regards,
Srikanth

-----Original Message-----
From: Amandeep Khurana [mailto:ama...@gmail.com]
Sent: Monday, November 21, 2011 7:25 AM
To: user@hbase.apache.org
Subject: Re: Region Splits

Mark,

Yes, your understanding is correct. If your keys are sequential (timestamps
etc.), you will always be writing to the end of the table and "older"
regions will not get any writes. This is one of the arguments against using
sequential keys.

-ak
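A toy Python model of how a row key is routed to a region, assuming regions are identified by their sorted start keys, with the first region starting at the empty key (region_for and row_key are hypothetical names). HBase compares row keys byte-wise, so the numeric ids are zero-padded here so that string order matches numeric order. Every new sequential key is larger than all existing start keys, so it always lands in the last region:

```python
import bisect
from typing import List

def row_key(row_id: int) -> str:
    """Zero-pad numeric ids so lexicographic (byte-wise) order matches numeric order."""
    return f"{row_id:010d}"

def region_for(key: str, start_keys: List[str]) -> int:
    """Index of the region holding key; start_keys is sorted and begins with ""."""
    # bisect_right finds the first start key greater than `key`; the region
    # just before it is the one whose [start, next_start) range contains `key`.
    return bisect.bisect_right(start_keys, key) - 1

if __name__ == "__main__":
    start_keys = ["", row_key(50), row_key(100)]
    writes = [region_for(row_key(i), start_keys) for i in range(150, 200)]
    print(set(writes))  # {2}: every new sequential key hits the last region
```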

On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void....@gmail.com> wrote:

Say we have a use case that has sequential row keys and we have rows
0-100. Let's assume that 100 rows = the split size. Now when there is a
split it will split at the halfway mark so there will be two regions as
follows:

Region1 [START-49]
Region2 [50-END]

So at this point all inserts will be written to Region2 only,
correct?
Now at some point Region2 will need to split and it will look like the
following before the split:

Region1 [START-49]
Region2 [50-150]

After the split it will look like:

Region1 [START-49]
Region2 [50-99]
Region3 [100-END]

And this pattern will continue, correct? My question is: when a use case
has sequential keys, how would any of the older regions ever receive any
more writes? It seems like they would always be stuck at
MaxRegionSize/2. Can someone please confirm or clarify this issue?

Thanks
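A small Python simulation of the scenario described in this message, under simplifying assumptions: the split threshold is a row count rather than a byte size, a full region splits exactly at its midpoint, and simulate is a hypothetical helper. Because each new id sorts after every existing key, only the last region ever receives writes:

```python
from typing import List

def simulate(num_rows: int, max_rows: int) -> List[List[int]]:
    """Insert ids 0..num_rows-1 in increasing order and split any region that
    reaches max_rows at its midpoint. Returns the ids held by each region."""
    regions: List[List[int]] = [[]]
    for row_id in range(num_rows):
        # Each new id sorts after every existing key, so it always goes to the last region.
        last = regions[-1]
        last.append(row_id)
        if len(last) >= max_rows:
            mid = len(last) // 2
            regions[-1:] = [last[:mid], last[mid:]]
    return regions

if __name__ == "__main__":
    for region in simulate(150, 100):
        print(f"[{region[0]}-{region[-1]}] {len(region)} rows")
```

With a threshold of 100 rows, 150 sequential inserts leave regions holding 0-49, 50-99 and 100-149, and every region except the newest stays frozen at half the threshold, which matches the MaxRegionSize/2 observation.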
