For those who are looking for the solution to this or similar issue, this
can be useful:

Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which
implements solution close to what Lars described.
Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles
<[email protected]>wrote:

> Hi Lars,
> Many tks for your reply.
>
> For now, I just rely on random or hashed keys and don't need any range
> queries.
> I will have to choose a nice solution one day for ordered keys upon which I
> will range-query.
>
> I will post the results of the different data models I will try (looking
> for other threads/articles to have a better view of the options).
>
> Tks,
> - Eric
>
>
> On 16/03/2011 23:02, Lars George wrote:
>
>> Hi Eric,
>>
>> Oops, you are right, my example was not clear and actually confusing
>> the keys with sequential ones. The hash should map every Nth row key
>> to the same bucket, so that you would for example see an interleaved
>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>> big issue. And yes, this is currently all client side driven :(
>>
>> Lars
>>
>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
>> <[email protected]>  wrote:
>>
>>> Hi Lars,
>>> Many tks for your explanations!
>>>
>>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>>> different cases (just rephrasing what you said to be sure I get it):
>>>
>>> - Keys are really random (GUID or whatever): you have the distribution
>>> for
>>> free, still can't do, and probably don't need, range-queries.
>>>
>>> - If keys are monotonically increasing (timestamp, autoincremented,...),
>>> there are two cases:
>>> 1) sometimes, you don't need to do some range-queries and can store the
>>> key
>>> as a real hash (md5,...) to have distribution.
>>> 2) For timebased series for example, you may need to do some range
>>> queries,
>>> and adding a salt can be an answer to combine best-of-world.
>>>
>>> I understand the "salt" approach as recreating on the client side
>>> "artifical" key spaces.
>>>
>>> I was first confused reading "row 1...1000 ->  prefix h1_".
>>> To really make the distribution random, I would have seen prefix/salt
>>> attributed randomly for a key leading to for example a h1 keyspace as
>>> such:
>>> h1_key2032, h1_key0023, h1_key1014343, ...
>>>
>>> Maybe you meant the intermediate approach where time keys of "hour 1"
>>> going
>>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
>>> In that case, if you look for keys in "hour 1", you would only need one
>>> scanner cause you know that they reside in "h1_", and you could query
>>> with
>>> scan(h1_time1, h1_time2).
>>>
>>> But at at time, as you describe, you may need to scan different buckets
>>> with
>>> different scanners and use an ordered list to contain the result.
>>> - What about performance in that case? for very large dataset, a range
>>> query
>>> will take much time. I can imagine async client at the rescue. Maybe also
>>> mapreduce jobs could help cause if will benefit from data locality.
>>> - Also, the client application must manage the salts: it's a bit like
>>> reinventing a "salt" layer on top of the hbase region servers, letting
>>> client carry on this layer. The client will have to store (in hbase :))
>>> the
>>> mapping between key ranges and their salt prefixes. It's a bit like
>>> exporting some core? functionality to the client.
>>>
>>> Strange, I fell I missed your point :)
>>> Tks,
>>>
>>> - Eric
>>>
>>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff
>>> (should
>>> get used to, cause I just learned another scripting language running on
>>> jvm
>>> for another project...)
>>>
>>>
>>> On 16/03/2011 13:00, Lars George wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>> it is easy... *cough*) or write that same in a small Java app and use
>>>> it from the command line or so.
>>>>
>>>> And yes, you can range scan using a prefix. We were discussing this
>>>> recently and there is this notion of design for reads, or design for
>>>> writes. DFR is usually sequential keys and DFW is random keys. It is
>>>> tough to find common grounds as both designs are on the far end of the
>>>> same spectrum. Finding a middle ground is the bucketed (or salted)
>>>> approach, which gives you distribution but still being able to scan...
>>>> but not without some client side support. One typical class of data is
>>>> timeseries based keys. As for scanning them, you need N client side
>>>> scanners. Imagine this example:
>>>>
>>>> row       1 ... 1000 ->    Prefix "h1_"
>>>> row 1001 ... 2000 ->    Prefix "h2_"
>>>> row 2001 ... 3000 ->    Prefix "h3_"
>>>> row 3001 ... 4000 ->    Prefix "h4_"
>>>> row 4001 ... 5000 ->    Prefix "h5_"
>>>> row 5001 ... 6000 ->    Prefix "h6_"
>>>> row 6001 ... 7000 ->    Prefix "h7_"
>>>>
>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>> (also sometimes called salt) are used to distribute them row keys to
>>>> region servers. To scan the entire range as one large key space you
>>>> need to create 7 scanners:
>>>>
>>>> 1. scanner: start row: "h1_", end row "h2_"
>>>> 2. scanner: start row: "h2_", end row "h3_"
>>>> 3. scanner: start row: "h3_", end row "h4_"
>>>> 4. scanner: start row: "h4_", end row "h5_"
>>>> 5. scanner: start row: "h5_", end row "h6_"
>>>> 6. scanner: start row: "h6_", end row "h7_"
>>>> 7. scanner: start row: "h7_", end row ""
>>>>
>>>> Now each of them gives you the first row that matches the start and
>>>> end row keys they are configure for. So you then take that first KV
>>>> they offer and add it to a list, sorted by ky.getRow() while removing
>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>>>> then split and drop the prefix "h1_" to get "1". The list then would
>>>> hold something like:
>>>>
>>>> 1. row "1" ->    kv from scanner 1
>>>> 2. row "1010" ->    kv from scanner 2
>>>> 3. row "2001" ->    kv from scanner 3
>>>> 4. row "3033" ->    kv from scanner 4
>>>> 5. row "4001" ->    kv from scanner 5
>>>> 6. row "5002" ->    kv from scanner 6
>>>> 7. row "6000" ->    kv from scanner 7
>>>>
>>>> (assuming that the keys are not contiguous but have gaps)
>>>>
>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>> offering. Then insert that into the list and you get
>>>>
>>>> 1. row "3" ->    kv from scanner 1
>>>> 2. row "1010" ->    kv from scanner 2
>>>> 3. row "2001" ->    kv from scanner 3
>>>> 4. row "3033" ->    kv from scanner 4
>>>> 5. row "4001" ->    kv from scanner 5
>>>> 6. row "5002" ->    kv from scanner 6
>>>> 7. row "6000" ->    kv from scanner 7
>>>>
>>>> Notice how you always only have a list with N elements on the client
>>>> side, each representing the next value the scanners offer. Since the
>>>> list is sorted you always access item #1 and therefore the next in the
>>>> entire key space.
>>>>
>>>> Once scanner 1 runs out you can close and remove it, the list will
>>>> then give you values from scanner 2 as the first elements in it. And
>>>> so on.
>>>>
>>>> Makes more sense?
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>>> <[email protected]>    wrote:
>>>>
>>>>> Hi Lars,
>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>> I can find python scripts, but no jruby one...
>>>>>
>>>>> Aside the hash function I could reuse, are you saying that range
>>>>> queries
>>>>> are
>>>>> possible even with hashed keys (randomly distributed)?
>>>>> (If possible with the script, it will also be possible from the hbase
>>>>> java
>>>>> client).
>>>>> Even with your explanation, I can't figure out how compound keys
>>>>> (hasedkey+key) can be range-queried.
>>>>>
>>>>> Tks,
>>>>> - Eric
>>>>>
>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>> leading hashes to distribute them across servers. When you want to do
>>>>>> scans you need to create N scans, where N is the number of hashes and
>>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>> first. You can then access the rows in sorted order where the first
>>>>>> element in the list is the one with the first key to read. Once you
>>>>>> took of the first element (being the lowest KV key) you next the
>>>>>> underlying scanner and reinsert it into the list, reordering it. You
>>>>>> keep taking from the top and therefore always see the entire range,
>>>>>> even if the same scanner would return the next logical rows to read.
>>>>>>
>>>>>> The shell is written in JRuby, so any function you can use there would
>>>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>>> do this with the above approach in code. Though since this is JRuby
>>>>>> you could write that code in Ruby and add it to you local shell giving
>>>>>> you what you need.
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>>> <[email protected]>      wrote:
>>>>>>
>>>>>>> Oops, forget my first question about range query (if keys are hashed,
>>>>>>> they
>>>>>>> can not be queried based on a range...)
>>>>>>> Still curious to have info on hash function in shell shell (2.) and
>>>>>>> advice
>>>>>>> on md5/jenkins/sha1 (3.)
>>>>>>> Tks,
>>>>>>> Eric
>>>>>>>
>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>> tables.
>>>>>>>>
>>>>>>>> 1. I wonder if this strategy is adviced for range queries (from/to
>>>>>>>> key)
>>>>>>>> use case, because the rows will be randomly distributed in different
>>>>>>>> regions. Will it cause some performance loose?
>>>>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>>>>> 't1',
>>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>>> readable
>>>>>>>> key.
>>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What
>>>>>>>> would
>>>>>>>> you
>>>>>>>> advice? what about SHA1?
>>>>>>>>
>>>>>>>> Tks,
>>>>>>>> - Eric
>>>>>>>>
>>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>>
>>>>>>>>
>>>
>

Reply via email to