I needed to be sure it works for pre-0.90 versions *too*, because that is what one of our clusters uses (and, I believe, what most clusters in use by others run). It works for newer versions too. There's a recent jar available for download in the project source root which one can just download and use with their own cluster.
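[Editorial note: the core idea behind a write-distribution lib like HBaseWD — prefixing each row key with a short hash-derived salt so sequential writes spread across buckets — can be sketched as below. This is a minimal illustration in Python, not HBaseWD's actual API; `NUM_BUCKETS` and `distributed_key` are hypothetical names.]

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count; real libs make this configurable

def distributed_key(original_key: str) -> str:
    """Prefix the key with a salt derived from its hash, so sequential
    keys are spread deterministically across NUM_BUCKETS key ranges."""
    digest = int(hashlib.md5(original_key.encode("utf-8")).hexdigest(), 16)
    salt = digest % NUM_BUCKETS
    return "%02x_%s" % (salt, original_key)
```

Because the salt is a pure function of the key, reads of a single known key can recompute the prefix; only range scans need the multi-scanner treatment discussed further down the thread.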
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Apr 21, 2011 at 6:04 PM, Eric Charles <eric.char...@u-mangate.com> wrote:
> Hi Alex,
>
> Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase"
> thread. Tks for this :) - I will need some more time to test it (quite busy
> atm).
>
> Why did you declare 0.89.20100924 and not 0.91 (or 92) as the HBaseWD
> dependency?
>
> Tks,
> - Eric
>
> On 21/04/2011 15:56, Alex Baranau wrote:
>> For those who are looking for the solution to this or a similar issue, this
>> can be useful:
>>
>> Take a look at the HBaseWD (https://github.com/sematext/HBaseWD) lib, which
>> implements a solution close to what Lars described.
>> Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles <eric.char...@u-mangate.com> wrote:
>>> Hi Lars,
>>> Many tks for your reply.
>>>
>>> For now, I just rely on random or hashed keys and don't need any range
>>> queries.
>>> I will have to choose a nice solution one day for ordered keys upon which
>>> I will range-query.
>>>
>>> I will post the results of the different data models I will try (looking
>>> for other threads/articles to get a better view of the options).
>>>
>>> Tks,
>>> - Eric
>>>
>>> On 16/03/2011 23:02, Lars George wrote:
>>>> Hi Eric,
>>>>
>>>> Oops, you are right, my example was not clear and actually confused the
>>>> hashed keys with sequential ones. The hash should map every Nth row key
>>>> to the same bucket, so that you would for example see an interleaved
>>>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>>>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>>>> big issue.
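[Editorial note: the interleaved distribution Lars describes (region 1 holds 1, 8, 15,...; region 2 holds 2, 9, 16,...) is a simple modulo mapping. A minimal sketch, with hypothetical names and an illustrative bucket count of 7:]

```python
NUM_REGIONS = 7  # illustrative; matches the 7-bucket example in this thread

def region_for(row_key: int) -> int:
    """Interleaved distribution: every Nth sequential key maps to the
    same bucket, so consecutive keys land on different regions."""
    return (row_key - 1) % NUM_REGIONS + 1
```

With this mapping, keys 1, 8, 15 all fall in region 1 and keys 2, 9, 16 in region 2, exactly as in the example above.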
>>>> And yes, this is currently all client side driven :(
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles <eric.char...@u-mangate.com> wrote:
>>>>> Hi Lars,
>>>>> Many tks for your explanations!
>>>>>
>>>>> About the DFR (sequential-keys) vs DFW (random-keys) distinction, I
>>>>> imagine different cases (just rephrasing what you said to be sure I get it):
>>>>>
>>>>> - Keys are really random (GUID or whatever): you have the distribution
>>>>> for free, still can't do, and probably don't need, range-queries.
>>>>>
>>>>> - If keys are monotonically increasing (timestamp, autoincremented,...),
>>>>> there are two cases:
>>>>> 1) Sometimes you don't need to do range-queries and can store the key
>>>>> as a real hash (md5,...) to get distribution.
>>>>> 2) For time-based series for example, you may need to do range queries,
>>>>> and adding a salt can be an answer to combine the best of both worlds.
>>>>>
>>>>> I understand the "salt" approach as recreating "artificial" key spaces
>>>>> on the client side.
>>>>>
>>>>> I was first confused reading "row 1...1000 -> prefix h1_".
>>>>> To really make the distribution random, I would have expected the
>>>>> prefix/salt to be attributed randomly to a key, leading for example to an
>>>>> h1 keyspace such as: h1_key2032, h1_key0023, h1_key1014343, ...
>>>>>
>>>>> Maybe you meant the intermediate approach where time keys of "hour 1" go
>>>>> to the h1 keyspace, keys of "hour 2" go to the h2 keyspace,...
>>>>> In that case, if you look for keys in "hour 1", you would only need one
>>>>> scanner because you know that they reside in "h1_", and you could query
>>>>> with scan(h1_time1, h1_time2).
>>>>>
>>>>> But at some point, as you describe, you may need to scan different
>>>>> buckets with different scanners and use an ordered list to contain the result.
>>>>> - What about performance in that case? For a very large dataset, a range
>>>>> query will take much time.
>>>>> I can imagine the async client coming to the rescue. Maybe
>>>>> mapreduce jobs could also help, because they will benefit from data locality.
>>>>> - Also, the client application must manage the salts: it's a bit like
>>>>> reinventing a "salt" layer on top of the hbase region servers, letting the
>>>>> client carry this layer. The client will have to store (in hbase :)) the
>>>>> mapping between key ranges and their salt prefixes. It's a bit like
>>>>> exporting some core? functionality to the client.
>>>>>
>>>>> Strange, I feel I missed your point :)
>>>>> Tks,
>>>>>
>>>>> - Eric
>>>>>
>>>>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff (I
>>>>> should get used to it, because I just learned another scripting language
>>>>> running on the jvm for another project...)
>>>>>
>>>>> On 16/03/2011 13:00, Lars George wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>>>> it is easy... *cough*) or write the same in a small Java app and use
>>>>>> it from the command line or so.
>>>>>>
>>>>>> And yes, you can range scan using a prefix. We were discussing this
>>>>>> recently and there is this notion of design for reads, or design for
>>>>>> writes. DFR usually means sequential keys and DFW random keys. It is
>>>>>> tough to find common ground as both designs are at the far ends of the
>>>>>> same spectrum. A middle ground is the bucketed (or salted) approach,
>>>>>> which gives you distribution while still being able to scan...
>>>>>> but not without some client side support. One typical class of data is
>>>>>> timeseries-based keys. As for scanning them, you need N client side
>>>>>> scanners. Imagine this example:
>>>>>>
>>>>>> row 1 ... 1000 -> Prefix "h1_"
>>>>>> row 1001 ... 2000 -> Prefix "h2_"
>>>>>> row 2001 ... 3000 -> Prefix "h3_"
>>>>>> row 3001 ... 4000 -> Prefix "h4_"
>>>>>> row 4001 ... 5000 -> Prefix "h5_"
>>>>>> row 5001 ... 6000 -> Prefix "h6_"
>>>>>> row 6001 ... 7000 -> Prefix "h7_"
>>>>>>
>>>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>>>> (also sometimes called salts) are used to distribute the row keys to
>>>>>> region servers. To scan the entire range as one large key space you
>>>>>> need to create 7 scanners:
>>>>>>
>>>>>> 1. scanner: start row "h1_", end row "h2_"
>>>>>> 2. scanner: start row "h2_", end row "h3_"
>>>>>> 3. scanner: start row "h3_", end row "h4_"
>>>>>> 4. scanner: start row "h4_", end row "h5_"
>>>>>> 5. scanner: start row "h5_", end row "h6_"
>>>>>> 6. scanner: start row "h6_", end row "h7_"
>>>>>> 7. scanner: start row "h7_", end row ""
>>>>>>
>>>>>> Now each of them gives you the first row that matches the start and
>>>>>> end row keys it is configured for. So you then take that first KV
>>>>>> each offers and add it to a list, sorted by kv.getRow() while removing
>>>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer;
>>>>>> you then split off and drop the prefix "h1_" to get "1". The list would
>>>>>> then hold something like:
>>>>>>
>>>>>> 1. row "1" -> kv from scanner 1
>>>>>> 2. row "1010" -> kv from scanner 2
>>>>>> 3. row "2001" -> kv from scanner 3
>>>>>> 4. row "3033" -> kv from scanner 4
>>>>>> 5. row "4001" -> kv from scanner 5
>>>>>> 6. row "5002" -> kv from scanner 6
>>>>>> 7. row "6000" -> kv from scanner 7
>>>>>>
>>>>>> (assuming that the keys are not contiguous but have gaps)
>>>>>>
>>>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>>>> offering. Then insert that into the list and you get:
>>>>>>
>>>>>> 1. row "3" -> kv from scanner 1
>>>>>> 2. row "1010" -> kv from scanner 2
>>>>>> 3. row "2001" -> kv from scanner 3
>>>>>> 4. row "3033" -> kv from scanner 4
>>>>>> 5. row "4001" -> kv from scanner 5
>>>>>> 6. row "5002" -> kv from scanner 6
>>>>>> 7. row "6000" -> kv from scanner 7
>>>>>>
>>>>>> Notice how you only ever have a list with N elements on the client
>>>>>> side, each representing the next value its scanner offers. Since the
>>>>>> list is sorted, you always access item #1 and therefore the next row
>>>>>> in the entire key space.
>>>>>>
>>>>>> Once scanner 1 runs out you can close and remove it; the list will
>>>>>> then give you values from scanner 2 as its first elements. And so on.
>>>>>>
>>>>>> Makes more sense?
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles <eric.char...@u-mangate.com> wrote:
>>>>>>> Hi Lars,
>>>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>>>> I can find python scripts, but no jruby one...
>>>>>>>
>>>>>>> Aside from the hash function I could reuse, are you saying that range
>>>>>>> queries are possible even with hashed keys (randomly distributed)?
>>>>>>> (If possible with the script, it will also be possible from the hbase
>>>>>>> java client).
>>>>>>> Even with your explanation, I can't figure out how compound keys
>>>>>>> (hashedkey+key) can be range-queried.
>>>>>>>
>>>>>>> Tks,
>>>>>>> - Eric
>>>>>>>
>>>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>>>> Hi Eric,
>>>>>>>>
>>>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>>>> leading hashes to distribute them across servers. When you want to do
>>>>>>>> scans you need to create N scans, where N is the number of hashes, and
>>>>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>>>> first. You can then access the rows in sorted order, where the first
>>>>>>>> element in the list is the one with the first key to read.
>>>>>>>> Once you take off the first element (being the lowest KV key) you
>>>>>>>> next the underlying scanner and reinsert its offering into the list,
>>>>>>>> reordering it. You keep taking from the top and therefore always see
>>>>>>>> the entire range, even if the same scanner would return the next
>>>>>>>> logical rows to read.
>>>>>>>>
>>>>>>>> The shell is written in JRuby, so any function you can use there
>>>>>>>> would make sense to use in the prefix; then you could compute it on
>>>>>>>> the fly. This will not help with merging the bucketed key ranges, you
>>>>>>>> need to do that with the above approach in code. Though since this is
>>>>>>>> JRuby you could write that code in Ruby and add it to your local
>>>>>>>> shell, giving you what you need.
>>>>>>>>
>>>>>>>> Lars
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles <eric.char...@u-mangate.com> wrote:
>>>>>>>>> Oops, forget my first question about range queries (if keys are
>>>>>>>>> hashed, they can not be queried based on a range...)
>>>>>>>>> Still curious to have info on the hash function in the shell (2.)
>>>>>>>>> and advice on md5/jenkins/sha1 (3.)
>>>>>>>>> Tks,
>>>>>>>>> Eric
>>>>>>>>>
>>>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>>>> tables.
>>>>>>>>>>
>>>>>>>>>> 1. I wonder if this strategy is advised for range query (from/to
>>>>>>>>>> key) use cases, because the rows will be randomly distributed in
>>>>>>>>>> different regions. Will it cause some performance loss?
>>>>>>>>>> 2. Is it possible to query from the hbase shell with something like
>>>>>>>>>> "get 't1', @hash('r1')", to let the shell compute the hash for you
>>>>>>>>>> from the readable key?
>>>>>>>>>> 3. There are MD5 and Jenkins classes in the hbase.util package.
>>>>>>>>>> What would you advise? What about SHA1?
>>>>>>>>>>
>>>>>>>>>> Tks,
>>>>>>>>>> - Eric
>>>>>>>>>>
>>>>>>>>>> PS: I searched the archive but didn't find the answers.
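[Editorial note: the multi-scanner technique Lars walks through above — keep the head element of each bucket scanner in a sorted structure, pop the smallest, advance that scanner, reinsert — is a k-way merge. A minimal sketch in Python (the thread's code would be Java or JRuby), using plain sorted lists as stand-ins for HBase scanners; `merged_scan` and `strip_prefix` are hypothetical names:]

```python
import heapq

def strip_prefix(salted_key: str) -> str:
    # "h1_0042" -> "0042"
    return salted_key.split("_", 1)[1]

def merged_scan(buckets):
    """Merge N per-bucket scanners into one stream ordered by the
    unsalted key. Each bucket yields salted row keys in sorted order."""
    iters = [iter(b) for b in buckets]
    heap = []
    # Seed the heap with each scanner's first offering.
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (strip_prefix(first), i))
    # Always pop the globally smallest key, then refill from that scanner.
    while heap:
        key, i = heapq.heappop(heap)
        yield key
        nxt = next(iters[i], None)
        if nxt is not None:
            heapq.heappush(heap, (strip_prefix(nxt), i))

# Usage: two buckets with zero-padded keys (string order == numeric order)
bucket1 = ["h1_0001", "h1_0003"]
bucket2 = ["h2_0002", "h2_1010"]
```

As in Lars's description, only N elements live on the client at any time, one per open scanner; closing an exhausted scanner is modeled here by simply not reinserting from a drained iterator.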