Originally sent to just Stack and now sent to the list.

If I assign each row key a random value, the writes will be distributed across 
the region servers and populating HBase will be faster. On the other hand, my 
scans bring back blocks of data (vendor by date), and each block can hold tens 
of thousands of rows. Would retrieval be faster if the key wasn't random?
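
To make the trade-off concrete, here is a rough sketch in plain Python. It is not HBase code; it only models a store that keeps rows sorted by key, which is how HBase orders rows. The vendor names, the dates, and the helpers compound_key, random_key and prefix_scan are all made up for illustration, and the hash prefix is just one assumed way of making a key "random".

```python
import bisect
import hashlib


def compound_key(vendor, day, seq):
    # <vendor name>|<timestamp>|<other data>, with fixed-width fields so
    # that string order matches the intended order.
    return f"{vendor}|{day}|{seq:04d}"


def random_key(vendor, day, seq):
    # Hypothetical "random" key: a hash prefix scatters neighbouring rows.
    base = compound_key(vendor, day, seq)
    prefix = hashlib.md5(base.encode("ascii")).hexdigest()[:8]
    return f"{prefix}|{base}"


def prefix_scan(sorted_keys, prefix):
    # Range scan over a sorted key list: every key that starts with
    # `prefix` sits in one contiguous slice. Assumes ASCII keys, so
    # "\xff" sorts after any character a key can contain.
    start = bisect.bisect_left(sorted_keys, prefix)
    stop = bisect.bisect_left(sorted_keys, prefix + "\xff")
    return sorted_keys[start:stop]


vendors = ["acme", "globex", "initech"]  # made-up vendor names
days = ["20110215", "20110216"]          # made-up dates
rows = [(v, d, n) for v in vendors for d in days for n in range(5)]

seq_keys = sorted(compound_key(*r) for r in rows)
rand_keys = sorted(random_key(*r) for r in rows)

# Vendor-by-date lookup with the compound key: one contiguous range.
block = prefix_scan(seq_keys, "acme|20110216|")

# The same lookup against random keys: the prefix no longer leads the
# key, so the range scan finds nothing and every row has to be checked.
missed = prefix_scan(rand_keys, "acme|20110216|")
filtered = [k for k in rand_keys if "|acme|20110216|" in k]
```

With the compound key the five matching rows come back from a single range. With the random key the same five rows are still there, but finding them means looking at every key in the table.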

 Thanks

 -Pete



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Stack
> Sent: Wednesday, February 16, 2011 10:52 AM
> To: [email protected]
> Cc: Peter Haidinyak
> Subject: Re: Row Key Question
>
> On Wed, Feb 16, 2011 at 10:48 AM, Peter Haidinyak <[email protected]> 
> wrote:
>> I'm not using the Timestamp alone, it is part of a compound key.
>> My old key included
>> <timestamp>|<vendor name>|<other data>
>>
>> My new key will include
>> <vendor name>|<timestamp>|<other data>
>>
>
> Yes.  Got that.  Was just trying to give you a bit more background to 
> highlight what the lads were saying before me.
>
>
>> This is still not ideal since a couple of vendors make up over 50% of the 
>> logs. It would be nice to prefix the key with a server ID and force the row 
>> to that server. With my limited knowledge I don't know how to do that yet.
>>
>
> You don't want to do that (You'll learn why when you pick up more hbasics).
>
> Would suggest you not worry about the distribution.  That's the point 
> of HBase.  You don't have to worry about where the stuff is.
>
> St.Ack
>
