Hi Lars,
Many tks for your explanations!

About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine different cases (just rephrasing what you said to be sure I get it):

- Keys are really random (GUID or whatever): you have the distribution for free, still can't do, and probably don't need, range-queries.

- If keys are monotonically increasing (timestamp, autoincremented,...), there are two cases: 1) sometimes, you don't need to do some range-queries and can store the key as a real hash (md5,...) to have distribution. 2) For timebased series for example, you may need to do some range queries, and adding a salt can be an answer to combine best-of-world.

I understand the "salt" approach as recreating on the client side "artifical" key spaces.

I was first confused reading "row 1...1000 -> prefix h1_".
To really make the distribution random, I would have seen prefix/salt attributed randomly for a key leading to for example a h1 keyspace as such: h1_key2032, h1_key0023, h1_key1014343, ...

Maybe you meant the intermediate approach where time keys of "hour 1" going to h1 keyspace, keys of "hour 2" going to h2 keyspace,... In that case, if you look for keys in "hour 1", you would only need one scanner cause you know that they reside in "h1_", and you could query with scan(h1_time1, h1_time2).

But at at time, as you describe, you may need to scan different buckets with different scanners and use an ordered list to contain the result. - What about performance in that case? for very large dataset, a range query will take much time. I can imagine async client at the rescue. Maybe also mapreduce jobs could help cause if will benefit from data locality. - Also, the client application must manage the salts: it's a bit like reinventing a "salt" layer on top of the hbase region servers, letting client carry on this layer. The client will have to store (in hbase :)) the mapping between key ranges and their salt prefixes. It's a bit like exporting some core? functionality to the client.

Strange, I fell I missed your point :)
Tks,

- Eric

Sidenote: ...and yes, it seems I will have to learn some ruby stuff (should get used to, cause I just learned another scripting language running on jvm for another project...)


On 16/03/2011 13:00, Lars George wrote:
Hi Eric,

Socorro is Java and Python, I was just mentioning it as a possible
source of inspiration :) You can learn Ruby and implement it (I hear
it is easy... *cough*) or write that same in a small Java app and use
it from the command line or so.

And yes, you can range scan using a prefix. We were discussing this
recently and there is this notion of design for reads, or design for
writes. DFR is usually sequential keys and DFW is random keys. It is
tough to find common grounds as both designs are on the far end of the
same spectrum. Finding a middle ground is the bucketed (or salted)
approach, which gives you distribution but still being able to scan...
but not without some client side support. One typical class of data is
timeseries based keys. As for scanning them, you need N client side
scanners. Imagine this example:

row       1 ... 1000 ->  Prefix "h1_"
row 1001 ... 2000 ->  Prefix "h2_"
row 2001 ... 3000 ->  Prefix "h3_"
row 3001 ... 4000 ->  Prefix "h4_"
row 4001 ... 5000 ->  Prefix "h5_"
row 5001 ... 6000 ->  Prefix "h6_"
row 6001 ... 7000 ->  Prefix "h7_"

So you have divided the entire range into 7 buckets. The prefixes
(also sometimes called salt) are used to distribute them row keys to
region servers. To scan the entire range as one large key space you
need to create 7 scanners:

1. scanner: start row: "h1_", end row "h2_"
2. scanner: start row: "h2_", end row "h3_"
3. scanner: start row: "h3_", end row "h4_"
4. scanner: start row: "h4_", end row "h5_"
5. scanner: start row: "h5_", end row "h6_"
6. scanner: start row: "h6_", end row "h7_"
7. scanner: start row: "h7_", end row ""

Now each of them gives you the first row that matches the start and
end row keys they are configure for. So you then take that first KV
they offer and add it to a list, sorted by ky.getRow() while removing
the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
then split and drop the prefix "h1_" to get "1". The list then would
hold something like:

1. row "1" ->  kv from scanner 1
2. row "1010" ->  kv from scanner 2
3. row "2001" ->  kv from scanner 3
4. row "3033" ->  kv from scanner 4
5. row "4001" ->  kv from scanner 5
6. row "5002" ->  kv from scanner 6
7. row "6000" ->  kv from scanner 7

(assuming that the keys are not contiguous but have gaps)

You then pop element #1 and do a "scanner1.next()" to get its next KV
offering. Then insert that into the list and you get

1. row "3" ->  kv from scanner 1
2. row "1010" ->  kv from scanner 2
3. row "2001" ->  kv from scanner 3
4. row "3033" ->  kv from scanner 4
5. row "4001" ->  kv from scanner 5
6. row "5002" ->  kv from scanner 6
7. row "6000" ->  kv from scanner 7

Notice how you always only have a list with N elements on the client
side, each representing the next value the scanners offer. Since the
list is sorted you always access item #1 and therefore the next in the
entire key space.

Once scanner 1 runs out you can close and remove it, the list will
then give you values from scanner 2 as the first elements in it. And
so on.

Makes more sense?

Lars

On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
<[email protected]>  wrote:
Hi Lars,
Are you talking about http://code.google.com/p/socorro/ ?
I can find python scripts, but no jruby one...

Aside the hash function I could reuse, are you saying that range queries are
possible even with hashed keys (randomly distributed)?
(If possible with the script, it will also be possible from the hbase java
client).
Even with your explanation, I can't figure out how compound keys
(hasedkey+key) can be range-queried.

Tks,
- Eric

On 16/03/2011 11:38, Lars George wrote:
Hi Eric,

Mozilla Socorro uses an approach where they bucket ranges using
leading hashes to distribute them across servers. When you want to do
scans you need to create N scans, where N is the number of hashes and
then do a next() on each scanner, putting all KVs into one sorted list
(use the KeyComparator for example) while stripping the prefix hash
first. You can then access the rows in sorted order where the first
element in the list is the one with the first key to read. Once you
took of the first element (being the lowest KV key) you next the
underlying scanner and reinsert it into the list, reordering it. You
keep taking from the top and therefore always see the entire range,
even if the same scanner would return the next logical rows to read.

The shell is written in JRuby, so any function you can use there would
make sense to use in the prefix, then you could compute it on the fly.
This will not help with merging the bucketed key ranges, you need to
do this with the above approach in code. Though since this is JRuby
you could write that code in Ruby and add it to you local shell giving
you what you need.

Lars

On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
<[email protected]>    wrote:
Oops, forget my first question about range query (if keys are hashed,
they
can not be queried based on a range...)
Still curious to have info on hash function in shell shell (2.) and
advice
on md5/jenkins/sha1 (3.)
Tks,
Eric

On 16/03/2011 09:52, Eric Charles wrote:
Hi,

To help avoid hotspots, I'm planning to use hashed keys in some tables.

1. I wonder if this strategy is adviced for range queries (from/to key)
use case, because the rows will be randomly distributed in different
regions. Will it cause some performance loose?
2. Is it possible to query from hbase shell with something like "get
't1',
@hash('r1')", to let the shell compute the hash for you from the
readable
key.
3. There are MD5 and Jenkins classes in hbase.util package. What would
you
advice? what about SHA1?

Tks,
- Eric

PS: I searched the archive but didn't find the answers.



Reply via email to