Good on you Peter.
St.Ack

On Wed, Feb 16, 2011 at 1:58 PM, Peter Haidinyak <[email protected]> wrote:
> Originally sent to just Stack and now sent to the list.
>
> If I assign a row key a random value the writes will be distributed and
> populating HBase will be faster. On the other hand if my scans will bring
> back blocks of data (vendor by date) where each block of data can have tens
> of thousands of rows would the retrieval process be faster if the key wasn't
> random?
>
> Thanks
>
> -Pete
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of
>> Stack
>> Sent: Wednesday, February 16, 2011 10:52 AM
>> To: [email protected]
>> Cc: Peter Haidinyak
>> Subject: Re: Row Key Question
>>
>> On Wed, Feb 16, 2011 at 10:48 AM, Peter Haidinyak <[email protected]>
>> wrote:
>>> I'm not using the Timestamp alone, it is part of a compound key.
>>> My old key included
>>> <timestamp>|<vendor name>|<other data>
>>>
>>> My new key will include
>>> <vendor name>|<timestamp>|<other data>
>>>
>>
>> Yes. Got that. Was just trying to give you a bit more background to
>> highlight what the lads were saying before me.
>>
>>> This is still not ideal since a couple of vendor makes up over 50% of the
>>> logs. It would be nice to prefix the key with a server Id and force the row
>>> to that server. With my limited knowledge I don't know how to do that yet.
>>>
>>
>> You don't want to do that (You'll learn why when you pick up more hbasics).
>>
>> Would suggest you not worry about the distribution. Thats the point
>> of hbase. You don't have to worry about where the stuff is.
>>
>> St.Ack
>>
>
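For anyone reading this thread later, here is a rough sketch of the trade-off Pete asks about. It is plain Python, not HBase client code: a sorted list stands in for a table, since HBase keeps rows ordered by the bytes of the row key. The function names (composite_key, hashed_key, range_scan), the 13-digit zero padding, and the sample vendors are all made up for illustration. It also assumes vendor names never contain the "|" separator.

```python
import bisect
import hashlib


def composite_key(vendor: str, timestamp: int, other: str) -> bytes:
    # <vendor name>|<timestamp>|<other data>, as in Pete's new key.
    # Zero-padding (an assumption here) makes byte order match numeric order.
    return f"{vendor}|{timestamp:013d}|{other}".encode("utf-8")


def hashed_key(vendor: str, timestamp: int, other: str) -> bytes:
    # Stand-in for a "random" key: spreads writes, but throws away ordering.
    return hashlib.md5(composite_key(vendor, timestamp, other)).hexdigest().encode("ascii")


def range_scan(sorted_keys, start: bytes, stop: bytes):
    # Mimics a scan with a start row (inclusive) and stop row (exclusive):
    # two binary searches, then one contiguous slice.
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


rows = [(v, t, f"r{t}") for v in ("acme", "globex") for t in range(1000, 1010)]

# Composite keys: "acme between 1003 and 1006" is one contiguous range.
table = sorted(composite_key(*r) for r in rows)
hits = range_scan(table,
                  composite_key("acme", 1003, ""),
                  composite_key("acme", 1006, ""))
print(len(hits))  # 3 rows: timestamps 1003, 1004, 1005

# Hashed keys: the same question needs a pass over every row, filtering on
# values kept alongside the key (here, the original tuple).
hashed_table = sorted((hashed_key(*r), r) for r in rows)
found = [row for _, row in hashed_table
         if row[0] == "acme" and 1003 <= row[1] < 1006]
print(len(found))
```

With the composite key the scan touches only the rows it returns. With the hashed key the writes spread out, but a vendor-by-date read has to look at the whole table.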
