Re: Hash keys

Eric Charles Wed, 16 Mar 2011 04:09:39 -0700

Hi Lars,
Are you talking about http://code.google.com/p/socorro/ ?
I can find Python scripts, but no JRuby ones...

Aside from the hash function I could reuse, are you saying that range queries are possible even with hashed keys (randomly distributed)? (If it is possible with the script, it will also be possible from the HBase Java client.) Even with your explanation, I can't figure out how compound keys (hashed key + key) can be range-queried.


Tks,
- Eric

On 16/03/2011 11:38, Lars George wrote:

Hi Eric,

Mozilla Socorro uses an approach where they bucket ranges using
leading hashes to distribute them across servers. When you want to do
scans you need to create N scans, where N is the number of hashes and
then do a next() on each scanner, putting all KVs into one sorted list
(use the KeyComparator for example) while stripping the prefix hash
first. You can then access the rows in sorted order where the first
element in the list is the one with the first key to read. Once you
have taken off the first element (being the lowest KV key) you call
next() on the underlying scanner and reinsert it into the list,
reordering it. You
keep taking from the top and therefore always see the entire range,
even if the same scanner would return the next logical rows to read.
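
The merge described above is essentially a k-way merge over the N bucket scanners. Below is a minimal sketch in Python, not the Socorro code and not the HBase client API: plain in-memory lists of (row key, value) pairs stand in for the scanners, a heap stands in for the sorted list, and the fixed two-character bucket prefix (PREFIX_LEN) and the names strip_prefix and merged_scan are assumptions made for illustration only.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Assumption: every stored row key starts with a fixed-width bucket prefix, e.g. "03".
PREFIX_LEN = 2


def strip_prefix(row_key: str) -> str:
    """Drop the leading bucket prefix to recover the logical row key."""
    return row_key[PREFIX_LEN:]


def merged_scan(
    bucket_scanners: List[Iterable[Tuple[str, str]]]
) -> Iterator[Tuple[str, str]]:
    """Yield (logical key, value) pairs from all buckets in global key order.

    Each element of bucket_scanners is a hypothetical stand-in for one scanner:
    an iterable of (prefixed key, value) pairs, already sorted within its bucket.
    """
    iterators = [iter(scanner) for scanner in bucket_scanners]
    heap = []
    # Prime the heap with the first row of every non-empty scanner.
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            key, value = first
            heap.append((strip_prefix(key), idx, value))
    heapq.heapify(heap)

    while heap:
        # Take the lowest logical key currently visible across all scanners.
        key, idx, value = heapq.heappop(heap)
        yield key, value
        # Advance only the scanner that row came from and reinsert its next row.
        following = next(iterators[idx], None)
        if following is not None:
            next_key, next_value = following
            heapq.heappush(heap, (strip_prefix(next_key), idx, next_value))


if __name__ == "__main__":
    bucket_00 = [("00row-b", "v2"), ("00row-d", "v4")]
    bucket_01 = [("01row-a", "v1"), ("01row-c", "v3")]
    for logical_key, value in merged_scan([bucket_00, bucket_01]):
        print(logical_key, value)
```

Because each scanner has at most one row in the heap at a time, memory use is proportional to the number of buckets rather than to the size of the range. With the real client, each list would be replaced by one scan per prefix over the same logical start and stop keys.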

The shell is written in JRuby, so any function you can use there would
make sense to use in the prefix, then you could compute it on the fly.
This will not help with merging the bucketed key ranges; you need to
do this with the above approach in code. Though since this is JRuby
you could write that code in Ruby and add it to your local shell, giving
you what you need.
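
As a rough illustration of computing the prefix on the fly, here is a small Python sketch (a shell helper would be written in Ruby instead). It is only one possible scheme, not necessarily the one Socorro uses: the bucket count NUM_BUCKETS, the two-digit prefix format, the choice of MD5, and the helper names bucket_prefix, prefixed_key and scan_ranges are all assumptions.

```python
import hashlib
from typing import List, Tuple

# Assumption: 16 buckets, written as a two-digit decimal prefix ("00".."15").
NUM_BUCKETS = 16


def bucket_prefix(row_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Derive a stable bucket prefix from the readable row key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}"


def prefixed_key(row_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Key as stored in the table: bucket prefix followed by the readable key."""
    return bucket_prefix(row_key, num_buckets) + row_key


def scan_ranges(start: str, stop: str, num_buckets: int = NUM_BUCKETS) -> List[Tuple[str, str]]:
    """One (start, stop) pair per bucket covering the same logical key range."""
    return [(f"{b:02d}{start}", f"{b:02d}{stop}") for b in range(num_buckets)]


if __name__ == "__main__":
    print(prefixed_key("r1"))          # stored key for a single get
    print(scan_ranges("a", "m", 2))    # ranges to scan, one per bucket
```

A single get only needs prefixed_key, since the prefix is recomputed from the readable key. A range query needs one scan per entry of scan_ranges, and the results then have to be merged back into key order.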

Lars

On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
<[email protected]>  wrote:

Oops, forget my first question about range queries (if keys are hashed, they
cannot be queried based on a range...)
Still curious to have info on the hash function in the shell (2.) and advice
on md5/jenkins/sha1 (3.)
Tks,
Eric

On 16/03/2011 09:52, Eric Charles wrote:

Hi,

To help avoid hotspots, I'm planning to use hashed keys in some tables.

1. I wonder if this strategy is advised for the range query (from/to key)
use case, because the rows will be randomly distributed in different
regions. Will it cause some performance loss?
2. Is it possible to query from the hbase shell with something like "get 't1',
@hash('r1')", to let the shell compute the hash for you from the readable
key?
3. There are MD5 and Jenkins classes in the hbase.util package. What would you
advise? What about SHA1?

Tks,
- Eric

PS: I searched the archive but didn't find the answers.
