James, thanks for the input. I'm not too familiar with Phoenix, although it
looks like a great contrib. Unfortunately our main client is Ruby,
using the Thrift API. Using the Thrift API also makes parallel scans
difficult, if not impossible.
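For what it's worth, the deterministic salting James describes below (prefix = hash of the key, modulo a fixed bucket count) is easy to do on the client side even over Thrift. This is just a minimal sketch, assuming a hypothetical helper name and an arbitrary 16-bucket count; as James notes, the bucket count can't change later without rewriting the data:

```ruby
require 'digest'

# Fixed bucket count - changing it means rewriting all existing rows.
NUM_BUCKETS = 16

# Derive a one-byte salt prefix from the key itself, so a point Get
# can recompute the exact salted row key without scanning.
def salted_key(key)
  salt = Digest::MD5.digest(key).bytes.first % NUM_BUCKETS
  [salt].pack('C') + key
end
```

Because the prefix is a pure function of the key, a single Get on `salted_key("user42")` still works; the trade-off is that a full scan now needs one range scan per bucket.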

On Sat, May 17, 2014 at 9:31 PM, James Taylor <[email protected]> wrote:
> No, there's nothing wrong with your thinking. That's exactly what Phoenix
> does - use the modulo of the hash of the key. It's important that you can
> calculate the prefix byte so that you can still do fast point lookups.
>
> Using a modulo that's bigger than the number of region servers can make
> sense as well (up to the overall number of cores in your cluster). You
> can't change the modulo without rewriting the data, so factoring in future
> growth makes sense.
>
> Thanks,
> James
>
>
> On Sat, May 17, 2014 at 8:50 PM, Software Dev <[email protected]>
> wrote:
>
>> Well, I kept reading on this subject and realized my second question may
>> not be appropriate, since this prefix-salting pattern assumes that the
>> prefix is random. I thought it was actually based off a hash that
>> could be predetermined, so you could always, if needed, get to the
>> exact row key with one get. Would there be anything wrong with doing
>> this, i.e., using a modulo of the hash of the key?
>>
>> On Sat, May 17, 2014 at 8:28 PM, Software Dev <[email protected]>
>> wrote:
>> > I recently came across the pattern of adding a salting prefix to the
>> > row keys to prevent hotspotting. Still trying to wrap my head around
>> > it and I have a few questions.
>> >
>> > - Is there ever a reason to salt to more buckets than there are region
>> > servers? The only reason I can think of that it may be beneficial is to
>> > anticipate future growth?
>> >
>> > - Is it beneficial to always hash against a known number of buckets
>> > (ie never change the size) that way for any individual row key you can
>> > always determine the prefix?
>> >
>> > - Are there any good use cases of this pattern out in the wild?
>> >
>> > Thanks
>>
