James, thanks for the input. Not too familiar with Phoenix, although it looks like a great contrib. Unfortunately our main client is Ruby, using the Thrift API. Using the Thrift API also makes parallel scans tough, if not impossible.
On Sat, May 17, 2014 at 9:31 PM, James Taylor <[email protected]> wrote:
> No, there's nothing wrong with your thinking. That's exactly what Phoenix
> does - use the modulo of the hash of the key. It's important that you can
> calculate the prefix byte so that you can still do fast point lookups.
>
> Using a modulo that's bigger than the number of region servers can make
> sense as well (up to the overall number of cores in your cluster). You
> can't change the modulo without rewriting the data, so factoring in future
> growth makes sense.
>
> Thanks,
> James
>
>
> On Sat, May 17, 2014 at 8:50 PM, Software Dev <[email protected]> wrote:
>
>> Well, I kept reading on this subject and realized my second question may
>> not be appropriate, since this prefix salting pattern assumes that the
>> prefix is random. I thought it was actually based off a hash that
>> could be predetermined, so you could always, if needed, get to the
>> exact row key with one get. Would there be something wrong with doing
>> this, i.e., using a modulo of the hash of the key?
>>
>> On Sat, May 17, 2014 at 8:28 PM, Software Dev <[email protected]> wrote:
>> > I recently came across the pattern of adding a salting prefix to the
>> > row keys to prevent hotspotting. Still trying to wrap my head around
>> > it and I have a few questions.
>> >
>> > - Is there ever a reason to salt to more buckets than there are region
>> > servers? The only reason I think that may be beneficial is to
>> > anticipate future growth?
>> >
>> > - Is it beneficial to always hash against a known number of buckets
>> > (i.e. never change the size), so that for any individual row key you can
>> > always determine the prefix?
>> >
>> > - Are there any good use cases of this pattern out in the wild?
>> >
>> > Thanks
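For anyone following along, here's a minimal Ruby sketch of the deterministic-salt idea discussed above: hash the logical key, take it modulo a fixed bucket count, and prepend the result as a single prefix byte. The bucket count, hash function (CRC32), and key names are illustrative assumptions, not what Phoenix or HBase actually use internally.

```ruby
require 'zlib'

# Number of salt buckets. Fixed at table-creation time: changing it
# would change every computed prefix, which is why the data would
# have to be rewritten (value here is illustrative).
SALT_BUCKETS = 16

# Deterministic salt: the prefix byte is a pure function of the key,
# so it can always be recomputed later.
def salted_row_key(key)
  prefix = Zlib.crc32(key) % SALT_BUCKETS
  prefix.chr + key
end

# Because the prefix is derived from the key itself, a point lookup
# can recompute it and issue a single get -- no need to fan out
# across all buckets the way purely random salting would require.
salted = salted_row_key("user:42")
```

Range scans over the logical key order still have to run one scan per bucket and merge the results, which is exactly where a single-threaded Thrift client gets painful.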
