Hi All,

Can someone please explain in layman's terms what a rowkey is, and how to derive the rowkey (in the case of a hash map) so that data loads faster into HBase?

Regards,
Rams

On 12-Sep-2012, at 10:40 PM, lars hofhansl <[email protected]> wrote:

> Not insisting :)
> MD5 and SHA-1 would be reasonable and can be used to replace the key as you
> say.
>
> ----- Original Message -----
> From: Michael Segel <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Cc:
> Sent: Wednesday, September 12, 2012 9:49 AM
> Subject: Re: Regarding rowkey
>
> MD5 should work. SHA-1 may theoretically have a collision, but one hasn't
> been found.
> Then there's SHA-2...
>
> I don't disagree with your assertion, however... it causes the key to be
> longer than it should have to be.
>
> If you insist on doing this... then take the MD5 hash, truncate it to 4 bytes
> and prepend it to your key.
>
> Just saying.
>
> -Mike
>
> On Sep 12, 2012, at 10:25 AM, lars hofhansl <[email protected]> wrote:
>
>> If you use a collision-free hashing algorithm you're right. Otherwise you'd
>> get KVs suddenly grouped into rows that weren't part of the same row.
>>
>> With hash prefixing you can use a fast and simple hashing algorithm, because
>> you do not need the hash to be unique.
>>
>> Depends again on various aspects.
>>
>> ----- Original Message -----
>> From: Michael Segel <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Cc:
>> Sent: Wednesday, September 12, 2012 5:46 AM
>> Subject: Re: Regarding rowkey
>>
>> I wouldn't 'prefix' the hash to the key, but actually replace the key with a
>> hash and store the unhashed key in a column.
>>
>> But that's a different discussion.
>>
>> In a nutshell, the problem is that there are a lot of potential use cases
>> where you want to store data in a sequence-dependent fashion. So you will
>> get a continual hotspot and half-full regions.
>>
>> Assuming that the underlying data is much larger than the key, it may be
>> better to hash the row key and then, using coprocessors, create a secondary
>> sequential index of the initial key.
>>
>> The advantages are that you will have far more rows within the secondary
>> index table before a split occurs and that there may be ways of controlling
>> the writes to the index such that it may have less of an impact on the
>> overall performance. (I don't know, I haven't had time to play with this
>> idea... yet)
>>
>> There are other options, at least in theory... and these would also be use
>> case specific.
>>
>> Just remember TANSTAAFL* applies.
>>
>> -Mike
>>
>> * There Ain't No Such Thing As A Free Lunch - Larry Niven.
>>
>> On Sep 11, 2012, at 10:08 PM, lars hofhansl <[email protected]> wrote:
>>
>>> It depends. If you do not need to perform range scans along (prefixes of)
>>> your row keys, you can prefix the row key by a hash of the row key.
>>> That will give you a more or less random distribution of the keys and hence
>>> not hit the same region server over and over.
>>>
>>> You'll probably also want to presplit your table then.
>>>
>>> -- Lars
>>>
>>> ----- Original Message -----
>>> From: Ramasubramanian <[email protected]>
>>> To: [email protected]
>>> Cc:
>>> Sent: Tuesday, September 11, 2012 10:39 AM
>>> Subject: Regarding rowkey
>>>
>>> Hi,
>>>
>>> What can be used as rowkey to improve performance while loading into hbase?
>>> Currently I am using a sequence. It takes some 11 odd minutes to load 1
>>> million records with 147 columns.
>>>
>>> Regards,
>>> Rams
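
For anyone following the thread, the two key schemes discussed above (prepending a 4-byte truncated MD5 to the key, versus replacing the key with the full hash) might look roughly like the sketch below. It uses only the JDK's MessageDigest and no HBase client calls. The class name RowKeySketch, the method names, and the zero-padded sequence format in main are assumptions made for illustration; they do not come from anyone's actual code in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names are illustrative only.
class RowKeySketch {
    // Number of leading hash bytes to keep, per the "truncate it to 4 bytes" suggestion.
    static final int PREFIX_LEN = 4;

    // Full 16-byte MD5 digest of the input.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hash prefixing: first 4 bytes of MD5(key), followed by the original key.
    // The original key stays readable at the end, but rows are no longer
    // stored in sequence order, so range scans over the sequence are lost.
    static byte[] prefixedKey(byte[] key) {
        byte[] hash = md5(key);
        byte[] out = new byte[PREFIX_LEN + key.length];
        System.arraycopy(hash, 0, out, 0, PREFIX_LEN);
        System.arraycopy(key, 0, out, PREFIX_LEN, key.length);
        return out;
    }

    // Key replacement: the full hash becomes the row key. The unhashed key
    // would then be stored separately in a column, as suggested in the thread.
    static byte[] hashedKey(byte[] key) {
        return md5(key);
    }

    // Lowercase hex rendering, for printing keys.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed key format: a zero-padded sequence number.
        for (int i = 1; i <= 3; i++) {
            byte[] seq = String.format("%010d", i).getBytes(StandardCharsets.UTF_8);
            System.out.println(toHex(prefixedKey(seq)));
        }
    }
}
```

Consecutive sequence numbers should come out with unrelated 4-byte prefixes, which is what spreads the writes across regions once the table is presplit. A reader who knows the original key can recompute the prefix and still do a point lookup.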
