I wouldn't 'prefix' the hash to the key, but actually replace the key with a 
hash and store the unhashed key in a column. 
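
To make the difference concrete, here is a rough sketch in plain Python (no HBase client involved) of the two key layouts. The function names, the MD5 choice, the 4-byte prefix and the `d:orig_key` column name are all assumptions made up for illustration, not anything HBase prescribes.

```python
import hashlib


def prefixed_key(row_key: bytes, prefix_len: int = 4) -> bytes:
    """'Prefix' layout: a few hash bytes, followed by the original key."""
    return hashlib.md5(row_key).digest()[:prefix_len] + row_key


def hashed_row(row_key: bytes, payload: dict) -> tuple:
    """'Replace' layout: the hash is the whole row key, and the
    original key travels along as an ordinary column value."""
    new_key = hashlib.md5(row_key).digest()
    columns = dict(payload)
    columns[b"d:orig_key"] = row_key  # hypothetical column name
    return new_key, columns


key, cols = hashed_row(b"0000000001", {b"d:name": b"example"})
```

With the second layout every row key has the same fixed length (16 bytes for MD5), whatever the original key looked like.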

But that's a different discussion. 

In a nutshell, the problem is that there are a lot of potential use cases where 
you want to store data in a sequence-dependent fashion. With sequential keys 
every write lands in the last region, so you get a continual hotspot, and each 
split leaves a half-full region behind. 
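
A toy simulation shows the effect. This is only a sketch: regions are modelled as sorted Python lists, the split threshold `MAX_ROWS` is counted in rows rather than bytes, and a split always cuts a region in the middle. None of this is HBase code.

```python
import bisect
import hashlib

MAX_ROWS = 100  # hypothetical split threshold (rows, not bytes)


def load(keys):
    """Insert keys into a toy table and return the row count per region."""
    regions = [[]]  # each region is a sorted list of keys
    for k in keys:
        # Start key of every region except the first one.
        starts = [r[0] for r in regions[1:]]
        idx = bisect.bisect_right(starts, k)
        bisect.insort(regions[idx], k)
        if len(regions[idx]) > MAX_ROWS:
            # Split the overfull region at its midpoint.
            r = regions[idx]
            mid = len(r) // 2
            regions[idx:idx + 1] = [r[:mid], r[mid:]]
    return [len(r) for r in regions]


sequential = load(range(1000))
hashed = load([hashlib.md5(str(i).encode()).digest() for i in range(1000)])
print("sequential:", sequential)
print("hashed:    ", hashed)
```

With sequential keys only the last region ever receives writes, and every region behind it stays at exactly half the threshold. With hashed keys the writes are spread over all regions and they keep filling up.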

Assuming that the underlying data is much larger than the key, it may be 
better to hash the row key and then use coprocessors to create a secondary 
sequential index on the original key. 
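
Roughly, the idea looks like this. It is an in-memory sketch only: the dicts and the sorted list stand in for the main table and the index table, and the index update inside `put` stands in for what a coprocessor hook would do on the server side. All names here are made up.

```python
import bisect
import hashlib

main_table = {}  # hashed key -> row data (the large rows)
index_keys = []  # sorted original keys (the small sequential index)
index_map = {}   # original key -> hashed key


def put(orig_key: bytes, row: dict) -> None:
    """Write the row under its hashed key and maintain the index."""
    hashed = hashlib.md5(orig_key).digest()
    main_table[hashed] = row
    # Stand-in for the coprocessor: keep the sequential index current.
    if orig_key not in index_map:
        bisect.insort(index_keys, orig_key)
    index_map[orig_key] = hashed


def range_scan(start: bytes, stop: bytes) -> list:
    """Scan [start, stop) in original key order by going through the index."""
    lo = bisect.bisect_left(index_keys, start)
    hi = bisect.bisect_left(index_keys, stop)
    return [main_table[index_map[k]] for k in index_keys[lo:hi]]


for i in range(10):
    put(b"%04d" % i, {"n": i})
rows = range_scan(b"0003", b"0006")  # rows for keys 0003, 0004, 0005
```

The range scan costs an extra lookup per row, which is part of the price for spreading the writes.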

The advantages are that you will have far more rows within the secondary index 
table before a split occurs and that there may be ways of controlling the 
writes to the index such that it may have less of an impact on the overall 
performance. (I don't know, I haven't had time to play with this idea... yet) 

There are other options, at least in theory... and these would also be use case 
specific.

Just remember TANSTAAFL* applies. 

-Mike

* There Ain't No Such Thing As A Free Lunch - Robert A. Heinlein.

On Sep 11, 2012, at 10:08 PM, lars hofhansl <[email protected]> wrote:

> It depends. If you do not need to perform range scans along (prefixes of) your 
> row keys, you can prefix the row key with a hash of the row key.
> That will give you a more or less random distribution of the keys and hence 
> not hit the same region server over and over.
> 
> You'll probably also want to presplit your table then.
> 
> -- Lars
> 
> 
> 
> ----- Original Message -----
> From: Ramasubramanian <[email protected]>
> To: [email protected]
> Cc: 
> Sent: Tuesday, September 11, 2012 10:39 AM
> Subject: Regarding rowkey
> 
> Hi,
> 
> What can be used as a rowkey to improve performance while loading into HBase? 
> Currently I am using a sequence. It takes some 11 odd minutes to load 1 
> million records with 147 columns.
> 
> Regards,
> Rams 
> 
