Hi All,

Can someone please explain in layman's terms what a rowkey is, and how to derive 
the rowkey (in the case of a hash map) so that data loads faster into HBase?

Regards,
Rams

On 12-Sep-2012, at 10:40 PM, lars hofhansl <[email protected]> wrote:

> Not insisting :)
> MD5 and SHA-1 would be reasonable and can be used to replace the key as you 
> say.
> 
> 
> 
> ----- Original Message -----
> From: Michael Segel <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Cc: 
> Sent: Wednesday, September 12, 2012 9:49 AM
> Subject: Re: Regarding rowkey
> 
> MD5 should work. SHA-1 may theoretically have a collision, but one hasn't 
> been found. 
> Then there's SHA-2...
> 
> I don't disagree with your assertion, however... it causes the key to be 
> longer than it has to be. 
> 
> If you insist on doing this... then take the MD5 hash, truncate it to 4 bytes 
> and prepend it to your key.  
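
A minimal sketch of the truncated-hash prefix described above, in Python rather than the HBase client. The function name `prefixed_rowkey` and the zero-padded sequence keys are assumptions made for illustration; only the "MD5, truncate to 4 bytes, prepend" steps come from the message.

```python
import hashlib


def prefixed_rowkey(key: bytes, prefix_len: int = 4) -> bytes:
    """Return the first prefix_len bytes of MD5(key), followed by the original key."""
    return hashlib.md5(key).digest()[:prefix_len] + key


if __name__ == "__main__":
    # Hypothetical sequential keys: consecutive values get unrelated prefixes,
    # so they no longer sort next to each other.
    for seq in range(3):
        original = f"{seq:010d}".encode()
        print(prefixed_rowkey(original).hex())
```

Because the original key is still the suffix, two keys that share a 4-byte prefix remain distinct rows; the cost is 4 extra bytes per key and the loss of range scans over the original key order.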
> 
> Just saying.
> 
> -Mike
> 
> On Sep 12, 2012, at 10:25 AM, lars hofhansl <[email protected]> wrote:
> 
>> If you use a collision-free hashing algorithm you're right. Otherwise you'd 
>> have KVs suddenly grouped into rows that they weren't originally part of.
>> 
>> 
>> With hash prefixing you can use a fast and simple hashing algorithm, because 
>> you do not need the hash to be unique.
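
To illustrate why the prefix hash does not need to be unique, here is a small sketch that uses CRC32 as the "fast and simple" hash. The choice of CRC32, the bucket count of 16, and the name `bucketed_rowkey` are all assumptions for this example, not something the thread prescribes.

```python
import zlib

NUM_BUCKETS = 16  # assumed value; in practice this would be tuned to the cluster


def bucketed_rowkey(key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prepend a one-byte bucket id derived from a cheap, non-unique hash."""
    if not 1 <= buckets <= 256:
        raise ValueError("bucket id must fit in one byte")
    bucket = zlib.crc32(key) % buckets
    return bytes([bucket]) + key


if __name__ == "__main__":
    keys = [str(i).encode() for i in range(1000)]
    rowkeys = [bucketed_rowkey(k) for k in keys]
    # Many keys share a bucket, yet every row key is still distinct,
    # because the full original key follows the prefix.
    print(len({rk[0] for rk in rowkeys}), "buckets used,",
          len(set(rowkeys)), "distinct row keys")
```

The prefix only decides where a row lands; uniqueness still comes from the original key that follows it.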
>> 
>> Depends again on various aspects.
>> 
>> 
>> 
>> ----- Original Message -----
>> From: Michael Segel <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Cc: 
>> Sent: Wednesday, September 12, 2012 5:46 AM
>> Subject: Re: Regarding rowkey
>> 
>> I wouldn't 'prefix' the hash to the key, but actually replace the key with a 
>> hash and store the unhashed key in a column. 
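
A rough sketch of this "replace the key" variant, modelling a row as a plain Python dict instead of using the HBase client. The column name `meta:orig_key` and the helper `hashed_row` are hypothetical; MD5 is used only because it comes up elsewhere in the thread.

```python
import hashlib

ORIG_KEY_COLUMN = b"meta:orig_key"  # hypothetical column for the unhashed key


def hashed_row(original_key: bytes, values: dict) -> tuple:
    """Return (rowkey, columns): the row key is the full hash of the original
    key, and the original key is kept as an ordinary column value."""
    rowkey = hashlib.md5(original_key).digest()
    columns = dict(values)  # copy so the caller's dict is not modified
    columns[ORIG_KEY_COLUMN] = original_key
    return rowkey, columns


if __name__ == "__main__":
    rowkey, columns = hashed_row(b"0000000042", {b"d:name": b"example"})
    print(rowkey.hex(), columns)
```

The row key now has a fixed length (16 bytes for MD5) regardless of how long the original key is, but two original keys that hash to the same value would be merged into one row, which is why this variant depends on the hash being effectively collision free.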
>> 
>> But that's a different discussion. 
>> 
>> In a nutshell, the problem is that there are a lot of potential use cases 
>> where you want to store data in a sequence-dependent fashion. So you will 
>> get a continual hotspot and half-full regions. 
>> 
>> Assuming that the underlying data is much larger than the key, it may be 
>> better to hash the row key and then, using coprocessors, create a secondary 
>> sequential index of the original key. 
>> 
>> The advantages are that you will have far more rows within the secondary 
>> index table before a split occurs and that there may be ways of controlling 
>> the writes to the index such that it may have less of an impact on the 
>> overall performance. (I don't know, I haven't had time to play with this 
>> idea... yet) 
>> 
>> There are other options, at least in theory... and these would also be use 
>> case specific.
>> 
>> Just remember TANSTAAFL* applies. 
>> 
>> -Mike
>> 
>> * There Ain't No Such Thing As A Free Lunch - Larry Niven.
>> 
>> On Sep 11, 2012, at 10:08 PM, lars hofhansl <[email protected]> wrote:
>> 
>>> It depends. If you do not need to perform range scans along (prefixes of) 
>>> your row keys, you can prefix the row key with a hash of the row key.
>>> That will give you a more or less random distribution of the keys and hence 
>>> not hit the same region server over and over.
>>> 
>>> You'll probably also want to presplit your table then.
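
As a sketch of what presplitting could look like for hash-prefixed keys: if the prefix bytes are roughly uniformly distributed, region boundaries can be spaced evenly across the prefix space. The function `split_points` and the 4-byte prefix length are assumptions for illustration; the resulting byte strings would be supplied as split keys when the table is created.

```python
def split_points(num_regions: int, prefix_len: int = 4) -> list:
    """Return num_regions - 1 evenly spaced boundaries over the space of
    prefix_len-byte prefixes, as big-endian byte strings."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 256 ** prefix_len
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # For 4 regions the boundaries are 40000000, 80000000 and c0000000 (hex).
    for point in split_points(4):
        print(point.hex())
```

With boundaries like these in place before the load starts, writes are spread over all regions from the first row instead of piling into a single region until it splits.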
>>> 
>>> -- Lars
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Ramasubramanian <[email protected]>
>>> To: [email protected]
>>> Cc: 
>>> Sent: Tuesday, September 11, 2012 10:39 AM
>>> Subject: Regarding rowkey
>>> 
>>> Hi,
>>> 
>>> What can be used as a rowkey to improve performance while loading into HBase? 
>>> Currently I am using a sequence. It takes around 11 minutes to load 1 
>>> million records with 147 columns.
>>> 
>>> Regards,
>>> Rams 
>>> 
>> 
