Good on you, Peter.
St.Ack

On Wed, Feb 16, 2011 at 1:58 PM, Peter Haidinyak <[email protected]> wrote:
> Originally sent to just Stack and now sent to the list.
>
> If I assign each row key a random value, the writes will be distributed and
> populating HBase will be faster. On the other hand, if my scans bring back
> blocks of data (vendor by date), where each block can have tens of thousands
> of rows, would retrieval be faster if the key wasn't random?
>
>  Thanks
>
>  -Pete
>
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Stack
>> Sent: Wednesday, February 16, 2011 10:52 AM
>> To: [email protected]
>> Cc: Peter Haidinyak
>> Subject: Re: Row Key Question
>>
>> On Wed, Feb 16, 2011 at 10:48 AM, Peter Haidinyak <[email protected]> 
>> wrote:
>>> I'm not using the Timestamp alone, it is part of a compound key.
>>> My old key included
>>> <timestamp>|<vendor name>|<other data>
>>>
>>> My new key will include
>>> <vendor name>|<timestamp>|<other data>
>>>
>>
>> Yes.  Got that.  Was just trying to give you a bit more background to
>> highlight what the lads were saying before me.
>>
>>
>>> This is still not ideal since a couple of vendors make up over 50% of the
>>> logs. It would be nice to prefix the key with a server ID and force the row
>>> to that server. With my limited knowledge I don't know how to do that yet.
>>>
>>
>> You don't want to do that (You'll learn why when you pick up more hbasics).
>>
>> Would suggest you not worry about the distribution.  That's the point
>> of hbase.  You don't have to worry about where the stuff is.
>>
>> St.Ack
>>
>
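
A rough sketch of the trade-off raised in this thread, not HBase client code. HBase stores rows sorted by the bytes of the row key, so with the <vendor name>|<timestamp>|<other data> layout all rows for one vendor and date sit next to each other and a scan can read them as one contiguous range. With random keys the same rows would be scattered across the table. The plain-Python model below only imitates that sort order; make_key, prefix_scan, the vendor names and the fixed-width yyyymmdd timestamps are all assumptions made for illustration.

```python
import bisect

def make_key(vendor, timestamp, other):
    # Hypothetical layout from the thread: <vendor name>|<timestamp>|<other data>.
    # Assumes the timestamp is fixed-width (yyyymmdd) so byte order matches date order.
    return f"{vendor}|{timestamp}|{other}".encode("utf-8")

def prefix_scan(sorted_keys, prefix):
    # Return the contiguous slice of keys that start with `prefix`.
    # 0xff never occurs in UTF-8 text, so prefix + b"\xff" sorts after every such key.
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + b"\xff")
    return sorted_keys[lo:hi]

rows = [
    make_key("acme", "20110216", "a"),
    make_key("zeta", "20110215", "b"),
    make_key("acme", "20110215", "c"),
    make_key("bolt", "20110216", "d"),
    make_key("acme", "20110217", "e"),
]

# Stand-in for the table: keys kept in byte-wise sorted order.
table = sorted(rows)

acme_all = prefix_scan(table, b"acme|")
acme_feb16 = prefix_scan(table, b"acme|20110216")
```

Here acme_all comes back as one unbroken run of the sorted table, which is what makes a vendor-by-date scan cheap. The cost is the one already noted above: a vendor that produces most of the logs concentrates its writes in one part of the key space.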
