Mark: you are correct about the old_key suffix.  I'm assuming you're
asking because you're worried about keyspace size, correct?  The default
pre-splitting algorithm assumes a 32-bit (4-byte) hash prefix, which
should scale comfortably for any use case in the near future.  Really,
you could get away with an 8-bit hash prefix if your cluster is small and
you plan to let regions auto-split past a certain size.  That option is
available if you use UniformSplit, but it will require some power-user
investigation.  I don't think anybody deviates from the default, mainly
because current use cases aren't that sensitive to the extra overhead.
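
To make the prefix discussion concrete, here is a minimal sketch of the
new_key = md5(old_key) + old_key transform with a truncated 4-byte prefix,
plus evenly spaced split points over that prefix space.  The function
names (salted_key, uniform_split_points, region_for) are hypothetical
helpers for illustration only; this is not HBase's UniformSplit code,
just the same general idea under those assumptions.

```python
import bisect
import hashlib

PREFIX_BYTES = 4  # assumed 32-bit hash prefix, matching the default discussed above


def salted_key(old_key: bytes, prefix_bytes: int = PREFIX_BYTES) -> bytes:
    """Return hash prefix + old_key; the old_key suffix keeps distinct keys distinct."""
    return hashlib.md5(old_key).digest()[:prefix_bytes] + old_key


def uniform_split_points(num_regions: int, prefix_bytes: int = PREFIX_BYTES) -> list:
    """Evenly spaced region boundaries over the prefix space (num_regions - 1 of them)."""
    space = 1 << (8 * prefix_bytes)
    return [
        (i * space // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


def region_for(key: bytes, splits: list) -> int:
    """Index of the region whose [start, end) range contains key."""
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    splits = uniform_split_points(4)
    # Sequential old keys now land in different regions.
    for n in range(5):
        old = f"row-{n:05d}".encode()
        print(old, "-> region", region_for(salted_key(old), splits))
```

With 4 regions the boundaries fall at 0x40000000, 0x80000000 and
0xC0000000.  Passing prefix_bytes=1 gives the 8-bit variant mentioned
above, which caps the table at 256 pre-split regions.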

For the medium term, note that HBASE-4218 will also introduce key
compression and further reduce overhead.  This won't be available until
0.94 or so, but you probably won't be worried about an extra 4 bytes until
then.  We currently use the HexStringSplit algorithm in production, which
uses an 8-byte prefix but is human-readable.  Based on preliminary
investigation, we predict an 80%+ reduction in our key size (currently
~80 bytes) with HBASE-4218.
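
As a rough illustration of the human-readable variant, the sketch below
builds an 8-character hex prefix (32 bits of hash stored as 8 ASCII bytes)
and the matching hex split points, then works out the prefix overhead.
The helper names are hypothetical, lowercase hex is an assumption, and the
overhead figure assumes the ~80 bytes is the total key length including
the prefix.  It is not the HexStringSplit implementation itself.

```python
import hashlib


def hex_salted_key(old_key: str) -> str:
    """First 8 hex chars of md5(old_key) followed by old_key itself."""
    return hashlib.md5(old_key.encode("utf-8")).hexdigest()[:8] + old_key


def hex_split_points(num_regions: int) -> list:
    """Evenly spaced 8-char hex boundaries between 00000000 and ffffffff."""
    space = 16 ** 8
    return [format(i * space // num_regions, "08x") for i in range(1, num_regions)]


def prefix_overhead(prefix_len: int, key_len: int) -> float:
    """Fraction of the stored key taken up by the hash prefix."""
    return prefix_len / key_len


if __name__ == "__main__":
    print(hex_split_points(4))
    print(hex_salted_key("row-00001"))
    print(prefix_overhead(8, 80), prefix_overhead(4, 80))
```

Under those assumptions the 8-byte hex prefix is 10% of an 80-byte key and
a 4-byte binary prefix would be 5%.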

On 11/21/11 9:55 AM, "Mark" <static.void....@gmail.com> wrote:

>Damn, I was hoping my understanding was flawed.
>
>In your example I am guessing the addition of the old_key suffix is to
>protect against any possible collision. Is that correct?
>
>On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>> Sequential writes are also an argument for pre-splitting and using hash
>> prefixing.  In other words, presplit your table into N regions instead
>> of the default of 1 & transform your keys into:
>>
>> new_key = md5(old_key) + old_key
>>
>> Using this method your sequential writes under the old_key are now
>> spread evenly across all regions.  There are some limitations to hash
>> prefixing, such as non-sequential scans across row boundaries.  However,
>> it's a tradeoff between even distribution & advanced query options.
>>
>> On 11/20/11 7:54 PM, "Amandeep Khurana" <ama...@gmail.com> wrote:
>>
>>> Mark,
>>>
>>> Yes, your understanding is correct. If your keys are sequential
>>> (timestamps etc), you will always be writing to the end of the table
>>> and "older" regions will not get any writes. This is one of the
>>> arguments against using sequential keys.
>>>
>>> -ak
>>>
>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark <static.void....@gmail.com>
>>> wrote:
>>>
>>>> Say we have a use case that has sequential row keys and we have rows
>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>> a split it will split at the halfway mark so there will be two regions
>>>> as follows:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-END]
>>>>
>>>> So now at this point all inserts will be writing to Region2 only,
>>>> correct? Now at some point Region2 will need to split and it will look
>>>> like the following before the split:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-150]
>>>>
>>>> After the split it will look like:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-99]
>>>> Region3 [100-END]
>>>>
>>>> And this pattern will continue, correct? My question is: when there is
>>>> a use case that has sequential keys, how would any of the older regions
>>>> ever receive any more writes? It seems like they would always be stuck
>>>> at MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>
>>>> Thanks