What's the performance penalty when scanning with a row prefix filter instead of with start/end keys? Can it still work (in reasonable processing time) when the table contains billions of records?
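For context on the question: since rows are stored sorted by key, a prefix can usually be turned into an explicit start/stop range, so the two approaches need not be mutually exclusive. As far as I understand, a prefix filter with no start row still has to read from the beginning of the table up to the first matching row, while a bounded scan only touches the matching range. Below is a minimal sketch of deriving the stop row from a prefix. `PrefixScanBounds` and `stopRowForPrefix` are hypothetical names for illustration, not part of the HBase API, and the sample key layout is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not an HBase class): derives scan bounds from a row prefix.
final class PrefixScanBounds {

    // Returns the smallest key that sorts after every key starting with `prefix`
    // under unsigned byte-wise ordering. An empty array means "no upper bound"
    // (the prefix was empty or consisted only of 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        // Assumed key layout: <date>_<somedata>
        byte[] start = "20110320_".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(start, StandardCharsets.UTF_8)
                + " .. " + new String(stop, StandardCharsets.UTF_8));
    }
}
```

The prefix itself would serve as the start row and the returned array as the exclusive stop row of the scan.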
On Sun, Mar 20, 2011 at 10:03 PM, Pete Haidinyak <[email protected]> wrote:
> I went through this discussion a month or so ago and came away with the
> opinion that you can either have an efficient load with a random key but then
> have an inefficient 'scan' not using start and end rows, or have an
> inefficient import with a sequential key and then scan using start and end
> rows.
>
> -Pete
>
> On Sun, 20 Mar 2011 12:52:24 -0700, Oleg Ruchovets <[email protected]>
> wrote:
>
>> Actually, the discussion started from this post:
>>
>> http://search-hadoop.com/m/XX3nW68JsY1/hbase+insertion+optimisation&subj=hbase+insertion+optimisation+
>>
>> When simply inserting data with row keys of the form <date>_<somedata>, I
>> noticed that only one node works (the region the data was being written to).
>> In case we have 10-15 nodes, I think it is inefficient to write data to only
>> one region. I want the data to be inserted into as many nodes as possible
>> simultaneously. Correct me, guys, but in this case the writing job will take
>> less time, am I right?
>>
>> Oleg.
>>
>> On Sun, Mar 20, 2011 at 8:57 PM, Chris Tarnas <[email protected]> wrote:
>>
>>> There is none - HBase uses a total order partitioner. The straight key
>>> value itself determines which region a row is put into. This allows for
>>> very rapid scans of sequential data, among other things, but does mean it
>>> is easier to hotspot regions. Key design is very important.
>>>
>>> -chris
>>>
>>> On Mar 20, 2011, at 11:41 AM, Lior Schachter wrote:
>>>
>>> > the hash function that distributes the rows between the regions.
>>> >
>>> > On Sun, Mar 20, 2011 at 8:36 PM, Stack <[email protected]> wrote:
>>> >
>>> >> Hash? Which hash are you referring to sir?
>>> >> St.Ack
>>> >>
>>> >> On Sun, Mar 20, 2011 at 10:06 AM, Lior Schachter <[email protected]>
>>> >> wrote:
>>> >>> Hi,
>>> >>> What is the API or configuration for changing the default hash
>>> >>> function for a specific htable?
>>> >>>
>>> >>> thanks,
>>> >>> Lior
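One commonly used middle ground for the trade-off discussed above is to "salt" sequential keys with a small bucket prefix: writes spread across buckets (and so, typically, across regions), while a date-prefix read becomes one bounded start/stop scan per bucket instead of a full-table filter. The following is only a rough sketch under stated assumptions. `SaltedKeys`, `bucketOf`, `saltedKey`, `scanRanges`, and the bucket count of 16 are all hypothetical choices for illustration, not HBase API or settings, and keys are assumed to be plain ASCII strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not an HBase class): salts row keys with a bucket prefix.
final class SaltedKeys {
    // Assumed bucket count; in practice it might be sized to the cluster.
    static final int BUCKETS = 16;

    // Bucket derived from the whole key, so a given key always maps to the same bucket.
    static int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    // Key actually written to the table: "<2-digit bucket>_<original key>".
    static String saltedKey(String key) {
        return String.format(Locale.ROOT, "%02d_%s", bucketOf(key), key);
    }

    // One [start, stop) range per bucket covering every salted key whose
    // original key begins with `prefix`. Assumes ASCII keys.
    static List<String[]> scanRanges(String prefix) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            String start = String.format(Locale.ROOT, "%02d_%s", b, prefix);
            int last = start.length() - 1;
            // Exclusive stop key: the start key with its last character incremented.
            String stop = start.substring(0, last) + (char) (start.charAt(last) + 1);
            ranges.add(new String[] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("20110320_abc"));
        for (String[] r : scanRanges("20110320_")) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

The cost is that every prefix read fans out into `BUCKETS` scans whose results have to be merged by the client, so the bucket count trades write parallelism against read overhead.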
