Hi Pete,

You're right. If you use fully random keys, you can never know the start /
end keys for a scan. What you really want is to design a key that
distributes well for writes but also keeps some locality for
scans.

You probably have the ideal key already (ID|Date). If you randomize
only the ID part rather than the entire key, you should get good
distribution at write time, because writes for different IDs will be
spread across the regions. You should also get good scan
performance when you scan between certain dates for a specific ID,
because rows for that ID will be stored together, usually in one region.
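
To make that concrete, here is a rough sketch in plain Python (no HBase
client involved). It assumes that "randomizing the ID part" means putting a
short hash of the ID in front of the key, and that dates are fixed-width
yyyymmdd strings so that they sort chronologically. The names row_key,
scan_range and scan, the MD5 prefix and the 8-character prefix length are
all just my choices for illustration. A sorted list stands in for the table,
and the scan treats the start row as inclusive and the stop row as
exclusive, like a default HBase scan.

```python
import hashlib
from bisect import bisect_left


def row_key(record_id: str, date: str) -> bytes:
    """Build <hash(ID)>|<ID>|<date>. Only the ID part is randomized."""
    # Assumption: a short MD5 prefix is enough to spread IDs across regions.
    prefix = hashlib.md5(record_id.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}|{record_id}|{date}".encode("utf-8")


def scan_range(record_id: str, start_date: str, end_date: str):
    """Return (start row, stop row) for one ID; the stop row is exclusive."""
    return row_key(record_id, start_date), row_key(record_id, end_date)


def scan(rows, start: bytes, stop: bytes):
    """Toy scan over a sorted list of keys: start inclusive, stop exclusive."""
    return rows[bisect_left(rows, start):bisect_left(rows, stop)]


# Stand-in for a table: rows are kept sorted by their key bytes.
rows = sorted(
    row_key(record_id, date)
    for record_id in ("user1", "user2", "user3")
    for date in ("20110127", "20110128", "20110129")
)

# The prefix is computed from the ID, so the start / end keys are always known.
start, stop = scan_range("user2", "20110127", "20110129")
result = scan(rows, start, stop)
```

Because the prefix is derived from the ID, you can always recompute it when
you build the scan, so nothing is lost compared to a plain ID|Date key. The
different IDs land in unrelated parts of the key space, while all dates for
one ID stay next to each other.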

Thanks,
Tatsuya


2011/1/29 Peter Haidinyak <[email protected]>:
> I know they are always sorted but if they are how do you know which row key 
> belong to which data? Currently I use a row key of ID|Date so I always know 
> what the startrow and endrow should be. I know I'm missing something really 
> fundamental here. :-(
>
> Thanks
>
> -Pete
>
> -----Original Message-----
> From: tsuna [mailto:[email protected]]
> Sent: Friday, January 28, 2011 12:14 PM
> To: [email protected]
> Subject: Re: Row Keys
>
> On Fri, Jan 28, 2011 at 12:09 PM, Peter Haidinyak <[email protected]> 
> wrote:
>>        This is going to seem like a dumb question but it is recommended that 
>> you use a random key to spread the insert/read load among your region 
>> servers. My question is if I am using a scan with startrow and endrow  how 
>> does that work with random row keys?
>
> The keys are always sorted.  So if you generate random keys, you'll
> get your data back in a random order.
> What is recommended depends on the specific problem you're trying to
> solve.  But generally, one of the strengths of HBase is that the rows
> are sorted, so sequential scanning is efficient (thanks to data
> locality).
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>



-- 
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502
