Thank you for your response!

So I guess 'salt' is a bit of a misnomer.  What I used to do is this:

1) Say that my key value is something like '1234foobar'
2) I obtain the hash of '1234foobar'.  Let's say that's '54824923'
3) I mod the hash by my number of regions.  Let's say I have 2000 regions.
 54824923 % 2000 = 923
4) I prepend that value to my original key value, so my new key is
'923_1234foobar'
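
The four steps above can be sketched roughly as follows. This is only an illustration, not HBase code: the function name `bucketed_key` is made up, md5 is used because it was suggested in the reply below, and taking the first 8 bytes of the digest as an integer is an assumption of mine (the hash value '54824923' in the example is just a placeholder). Since the prefix is derived from the key itself, a reader can recompute it and still do a single get().

```python
import hashlib

NUM_BUCKETS = 2000  # "number of regions" from step 3 (example value)


def bucketed_key(key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prepend a hash-derived bucket prefix to the original key."""
    # Step 2: hash the original key (md5 here; any stable hash would do).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumption: treat the first 8 bytes of the digest as an unsigned integer.
    hash_value = int.from_bytes(digest[:8], "big")
    # Step 3: mod the hash by the number of buckets.
    bucket = hash_value % num_buckets
    # Step 4: prepend the bucket to the original key.
    return f"{bucket}_{key}"


if __name__ == "__main__":
    # The same input always yields the same prefixed key.
    print(bucketed_key("1234foobar"))
```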

Is this the same thing you were talking about?

A couple of questions:

* Why would my regions only be 1/2 full?
* Why would I only use this for sequential keys?  I would think this would
give better performance in any situation where I don't need range scans.
For example, let's say my key value is a person's last name.  That will
naturally cluster around certain letters, giving me an uneven distribution.
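
As a rough illustration of the last-name case, here is a small sketch that counts keys per first letter and per hash bucket. The list of names and the bucket count of 4 are invented for the example, and `bucket_of` is a hypothetical helper, so this only shows how one might check the distribution, not what a real table would look like.

```python
import hashlib
from collections import Counter


def bucket_of(key: str, num_buckets: int) -> int:
    """Map a key to a bucket using the first 8 bytes of its md5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Made-up sample: half of these last names start with 'S'.
names = ["Smith", "Sanchez", "Singh", "Suzuki", "Schmidt", "Stewart",
         "Miller", "Martin", "Moore", "Jones", "Brown", "Xu"]

# Distribution if the raw last name is the leading part of the key.
by_letter = Counter(name[0] for name in names)

# Distribution if a hash bucket is prepended instead (4 buckets assumed).
by_bucket = Counter(bucket_of(name, 4) for name in names)

if __name__ == "__main__":
    print("by first letter:", dict(by_letter))
    print("by hash bucket: ", dict(by_bucket))
```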

--Jeremy



On Sun, May 3, 2015 at 11:46 AM, Michael Segel <[email protected]>
wrote:

> Yes, don’t use a salt. Salt implies that your seed is orthogonal (read
> random) to the base table row key.
> You’re better off using a truncated hash (md5 is fastest) so that at least
> you can use a single get().
>
> Common?
>
> Only if your row key is mostly sequential.
>
> Note that even with bucketing, you will still end up with regions only 1/2
> full with the only exception being the last region.
>
> > On May 1, 2015, at 11:09 AM, jeremy p <[email protected]> wrote:
> >
> > Hello all,
> >
> > I've been out of the HBase world for a while, and I'm just now jumping
> > back in.
> >
> > As of HBase .94, it was still common to take a hash of your RowKey and
> > use that to "salt" the beginning of your RowKey to obtain an even
> > distribution among your region servers.  Is this still a common practice,
> > or is there a better way to do this in HBase 1.0?
> >
> > --Jeremy
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
