For those who are looking for the solution to this or similar issue, this can be useful:
Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which implements solution close to what Lars described. Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO Alex Baranau ---- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles <[email protected]>wrote: > Hi Lars, > Many tks for your reply. > > For now, I just rely on random or hashed keys and don't need any range > queries. > I will have to choose a nice solution one day for ordered keys upon which I > will range-query. > > I will post the results of the different data models I will try (looking > for other threads/articles to have a better view of the options). > > Tks, > - Eric > > > On 16/03/2011 23:02, Lars George wrote: > >> Hi Eric, >> >> Oops, you are right, my example was not clear and actually confusing >> the keys with sequential ones. The hash should map every Nth row key >> to the same bucket, so that you would for example see an interleaved >> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while >> region 2 holds 2, 9, 16,... and so on. I do not think performance is a >> big issue. And yes, this is currently all client side driven :( >> >> Lars >> >> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles >> <[email protected]> wrote: >> >>> Hi Lars, >>> Many tks for your explanations! >>> >>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine >>> different cases (just rephrasing what you said to be sure I get it): >>> >>> - Keys are really random (GUID or whatever): you have the distribution >>> for >>> free, still can't do, and probably don't need, range-queries. >>> >>> - If keys are monotonically increasing (timestamp, autoincremented,...), >>> there are two cases: >>> 1) sometimes, you don't need to do some range-queries and can store the >>> key >>> as a real hash (md5,...) to have distribution. >>> 2) For timebased series for example, you may need to do some range >>> queries, >>> and adding a salt can be an answer to combine best-of-world. >>> >>> I understand the "salt" approach as recreating on the client side >>> "artifical" key spaces. >>> >>> I was first confused reading "row 1...1000 -> prefix h1_". >>> To really make the distribution random, I would have seen prefix/salt >>> attributed randomly for a key leading to for example a h1 keyspace as >>> such: >>> h1_key2032, h1_key0023, h1_key1014343, ... >>> >>> Maybe you meant the intermediate approach where time keys of "hour 1" >>> going >>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,... >>> In that case, if you look for keys in "hour 1", you would only need one >>> scanner cause you know that they reside in "h1_", and you could query >>> with >>> scan(h1_time1, h1_time2). >>> >>> But at at time, as you describe, you may need to scan different buckets >>> with >>> different scanners and use an ordered list to contain the result. >>> - What about performance in that case? for very large dataset, a range >>> query >>> will take much time. I can imagine async client at the rescue. Maybe also >>> mapreduce jobs could help cause if will benefit from data locality. >>> - Also, the client application must manage the salts: it's a bit like >>> reinventing a "salt" layer on top of the hbase region servers, letting >>> client carry on this layer. The client will have to store (in hbase :)) >>> the >>> mapping between key ranges and their salt prefixes. It's a bit like >>> exporting some core? functionality to the client. >>> >>> Strange, I fell I missed your point :) >>> Tks, >>> >>> - Eric >>> >>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff >>> (should >>> get used to, cause I just learned another scripting language running on >>> jvm >>> for another project...) >>> >>> >>> On 16/03/2011 13:00, Lars George wrote: >>> >>>> Hi Eric, >>>> >>>> Socorro is Java and Python, I was just mentioning it as a possible >>>> source of inspiration :) You can learn Ruby and implement it (I hear >>>> it is easy... *cough*) or write that same in a small Java app and use >>>> it from the command line or so. >>>> >>>> And yes, you can range scan using a prefix. We were discussing this >>>> recently and there is this notion of design for reads, or design for >>>> writes. DFR is usually sequential keys and DFW is random keys. It is >>>> tough to find common grounds as both designs are on the far end of the >>>> same spectrum. Finding a middle ground is the bucketed (or salted) >>>> approach, which gives you distribution but still being able to scan... >>>> but not without some client side support. One typical class of data is >>>> timeseries based keys. As for scanning them, you need N client side >>>> scanners. Imagine this example: >>>> >>>> row 1 ... 1000 -> Prefix "h1_" >>>> row 1001 ... 2000 -> Prefix "h2_" >>>> row 2001 ... 3000 -> Prefix "h3_" >>>> row 3001 ... 4000 -> Prefix "h4_" >>>> row 4001 ... 5000 -> Prefix "h5_" >>>> row 5001 ... 6000 -> Prefix "h6_" >>>> row 6001 ... 7000 -> Prefix "h7_" >>>> >>>> So you have divided the entire range into 7 buckets. The prefixes >>>> (also sometimes called salt) are used to distribute them row keys to >>>> region servers. To scan the entire range as one large key space you >>>> need to create 7 scanners: >>>> >>>> 1. scanner: start row: "h1_", end row "h2_" >>>> 2. scanner: start row: "h2_", end row "h3_" >>>> 3. scanner: start row: "h3_", end row "h4_" >>>> 4. scanner: start row: "h4_", end row "h5_" >>>> 5. scanner: start row: "h5_", end row "h6_" >>>> 6. scanner: start row: "h6_", end row "h7_" >>>> 7. scanner: start row: "h7_", end row "" >>>> >>>> Now each of them gives you the first row that matches the start and >>>> end row keys they are configure for. So you then take that first KV >>>> they offer and add it to a list, sorted by ky.getRow() while removing >>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer, >>>> then split and drop the prefix "h1_" to get "1". The list then would >>>> hold something like: >>>> >>>> 1. row "1" -> kv from scanner 1 >>>> 2. row "1010" -> kv from scanner 2 >>>> 3. row "2001" -> kv from scanner 3 >>>> 4. row "3033" -> kv from scanner 4 >>>> 5. row "4001" -> kv from scanner 5 >>>> 6. row "5002" -> kv from scanner 6 >>>> 7. row "6000" -> kv from scanner 7 >>>> >>>> (assuming that the keys are not contiguous but have gaps) >>>> >>>> You then pop element #1 and do a "scanner1.next()" to get its next KV >>>> offering. Then insert that into the list and you get >>>> >>>> 1. row "3" -> kv from scanner 1 >>>> 2. row "1010" -> kv from scanner 2 >>>> 3. row "2001" -> kv from scanner 3 >>>> 4. row "3033" -> kv from scanner 4 >>>> 5. row "4001" -> kv from scanner 5 >>>> 6. row "5002" -> kv from scanner 6 >>>> 7. row "6000" -> kv from scanner 7 >>>> >>>> Notice how you always only have a list with N elements on the client >>>> side, each representing the next value the scanners offer. Since the >>>> list is sorted you always access item #1 and therefore the next in the >>>> entire key space. >>>> >>>> Once scanner 1 runs out you can close and remove it, the list will >>>> then give you values from scanner 2 as the first elements in it. And >>>> so on. >>>> >>>> Makes more sense? >>>> >>>> Lars >>>> >>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles >>>> <[email protected]> wrote: >>>> >>>>> Hi Lars, >>>>> Are you talking about http://code.google.com/p/socorro/ ? >>>>> I can find python scripts, but no jruby one... >>>>> >>>>> Aside the hash function I could reuse, are you saying that range >>>>> queries >>>>> are >>>>> possible even with hashed keys (randomly distributed)? >>>>> (If possible with the script, it will also be possible from the hbase >>>>> java >>>>> client). >>>>> Even with your explanation, I can't figure out how compound keys >>>>> (hasedkey+key) can be range-queried. >>>>> >>>>> Tks, >>>>> - Eric >>>>> >>>>> On 16/03/2011 11:38, Lars George wrote: >>>>> >>>>>> Hi Eric, >>>>>> >>>>>> Mozilla Socorro uses an approach where they bucket ranges using >>>>>> leading hashes to distribute them across servers. When you want to do >>>>>> scans you need to create N scans, where N is the number of hashes and >>>>>> then do a next() on each scanner, putting all KVs into one sorted list >>>>>> (use the KeyComparator for example) while stripping the prefix hash >>>>>> first. You can then access the rows in sorted order where the first >>>>>> element in the list is the one with the first key to read. Once you >>>>>> took of the first element (being the lowest KV key) you next the >>>>>> underlying scanner and reinsert it into the list, reordering it. You >>>>>> keep taking from the top and therefore always see the entire range, >>>>>> even if the same scanner would return the next logical rows to read. >>>>>> >>>>>> The shell is written in JRuby, so any function you can use there would >>>>>> make sense to use in the prefix, then you could compute it on the fly. >>>>>> This will not help with merging the bucketed key ranges, you need to >>>>>> do this with the above approach in code. Though since this is JRuby >>>>>> you could write that code in Ruby and add it to you local shell giving >>>>>> you what you need. >>>>>> >>>>>> Lars >>>>>> >>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles >>>>>> <[email protected]> wrote: >>>>>> >>>>>>> Oops, forget my first question about range query (if keys are hashed, >>>>>>> they >>>>>>> can not be queried based on a range...) >>>>>>> Still curious to have info on hash function in shell shell (2.) and >>>>>>> advice >>>>>>> on md5/jenkins/sha1 (3.) >>>>>>> Tks, >>>>>>> Eric >>>>>>> >>>>>>> On 16/03/2011 09:52, Eric Charles wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some >>>>>>>> tables. >>>>>>>> >>>>>>>> 1. I wonder if this strategy is adviced for range queries (from/to >>>>>>>> key) >>>>>>>> use case, because the rows will be randomly distributed in different >>>>>>>> regions. Will it cause some performance loose? >>>>>>>> 2. Is it possible to query from hbase shell with something like "get >>>>>>>> 't1', >>>>>>>> @hash('r1')", to let the shell compute the hash for you from the >>>>>>>> readable >>>>>>>> key. >>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What >>>>>>>> would >>>>>>>> you >>>>>>>> advice? what about SHA1? >>>>>>>> >>>>>>>> Tks, >>>>>>>> - Eric >>>>>>>> >>>>>>>> PS: I searched the archive but didn't find the answers. >>>>>>>> >>>>>>>> >>> >
