No, there's nothing wrong with your thinking. That's exactly what Phoenix does - use the modulo of the hash of the key. It's important that you can calculate the prefix byte so that you can still do fast point lookups.
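
Here is a minimal sketch of that idea, not Phoenix's actual implementation. The bucket count, the function names, and the use of CRC32 as the hash are all assumptions made for illustration; the only point is that the prefix is a pure function of the row key, so a reader can recompute it.

```python
import zlib

# Hypothetical bucket count, chosen for illustration. It has to fit in
# the single prefix byte, so it cannot exceed 256.
SALT_BUCKETS = 16


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Return a deterministic bucket number in [0, buckets) for row_key.

    CRC32 is a stand-in hash here, not the function Phoenix uses.
    """
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend the one-byte salt to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


# The writer and the reader compute the same stored key independently,
# so a point lookup still needs only one get.
stored_key = salted_key(b"user42")
lookup_key = salted_key(b"user42")
```

Because `salted_key` depends only on the row key and the bucket count, `stored_key` and `lookup_key` are identical bytes, with no scan across buckets needed.
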
Using a modulo that's bigger than the number of region servers can make sense as well (up to the overall number of cores in your cluster). You can't change the modulo without rewriting the data, so factoring in future growth makes sense.

Thanks,
James

On Sat, May 17, 2014 at 8:50 PM, Software Dev <[email protected]> wrote:
> Well kept reading on this subject and realized my second question may
> not be appropriate since this prefix salting pattern assumes that the
> prefix is random. I thought it was actually based off a hash that
> could be predetermined so you could always, if needed, get to the
> exact row key with one get. Would there be something wrong with doing
> this.. ie, using a modulo of the hash of the key?
>
> On Sat, May 17, 2014 at 8:28 PM, Software Dev <[email protected]>
> wrote:
> > I recently came across the pattern of adding a salting prefix to the
> > row keys to prevent hotspotting. Still trying to wrap my head around
> > it and I have a few questions.
> >
> > - Is there ever a reason to salt to more buckets than there are region
> > servers? The only reason why I think that may be beneficial is to
> > anticipate future growth???
> >
> > - Is it beneficial to always hash against a known number of buckets
> > (ie never change the size) that way for any individual row key you can
> > always determine the prefix?
> >
> > - Are there any good use cases of this pattern out in the wild?
> >
> > Thanks
