Yes. Watch out for the last byte already being the maximum value (0xFF), since incrementing it would overflow.
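To make the edge case concrete, here is a minimal Python sketch of deriving an exclusive end row from a key prefix. The helper name `stop_row_for_prefix` is hypothetical (it is not an HBase API), and the sketch assumes row keys are compared as unsigned bytes, left to right, as discussed in the thread below.

```python
def stop_row_for_prefix(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with `prefix`.

    Hypothetical helper for illustration only. An empty result means
    "no upper bound", i.e. the scan should run to the end of the table.
    """
    # Trailing 0xFF bytes cannot be incremented without overflowing,
    # so drop them and increment the byte before them instead.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # The prefix was empty or consisted only of 0xFF bytes:
        # no finite key bounds it from above.
        return b""
    # Increment the last remaining byte by one.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


if __name__ == "__main__":
    print(stop_row_for_prefix(b"1234555"))  # last byte '5' becomes '6'
    print(stop_row_for_prefix(b"++++"))     # '+' (43) becomes ',' (44)
    print(stop_row_for_prefix(b"ab\xff"))   # 0xFF is dropped, 'b' becomes 'c'
```

The start row stays the prefix itself; the value returned here would be passed as the (exclusive) end row.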

On Fri, Mar 29, 2013 at 7:31 PM, Mohit Anchlia <[email protected]> wrote:

> Thanks everyone, it's really helpful. I'll change my prefix filter to end
> row. Is it necessary to increment the last byte? So if I have hash of
> 1234555 my end key should be 1234556?
>
> On Thu, Mar 28, 2013 at 11:20 PM, ramkrishna vasudevan <[email protected]> wrote:
>
> > Mohith,
> >
> > It is always better to go with start row and end row if you know
> > what they are.
> > Just add one byte more to the actual end row (inclusive row) and form the
> > end key. This will narrow down the search.
> >
> > Remember the byte comparison is the way that HBase scans.
> > Regards
> > Ram
> >
> > On Fri, Mar 29, 2013 at 11:18 AM, Li, Min <[email protected]> wrote:
> >
> > > Hi, Mohit,
> > >
> > > Try using ENDROW. STARTROW&ENDROW is much faster than PrefixFilter.
> > >
> > > "+" ascii code is 43
> > > "," ascii code is 44
> > >
> > > scan 'SESSIONID_TIMELINE', {LIMIT => 1,STARTROW => '++++', ENDROW=>'+++,'}
> > >
> > > Min
> > >
> > > -----Original Message-----
> > > From: Mohit Anchlia [mailto:[email protected]]
> > > Sent: Friday, March 29, 2013 1:18 AM
> > > To: [email protected]
> > > Subject: Re: Understanding scan behaviour
> > >
> > > Could the prefix filter lead to full tablescan? In other words is
> > > PrefixFilter applied after fetching the rows?
> > >
> > > Another question I have is say I have row key abc and abd and I search for
> > > row "abc", is it always guaranteed to be the first key when returned from
> > > scanned results? If so I can always put a condition in the client app.
> > >
> > > On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > Take a look at the following in
> > > > hbase-server/src/main/ruby/shell/commands/scan.rb
> > > > (trunk)
> > > >
> > > > hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia <[email protected]> wrote:
> > > >
> > > > > I see then I misunderstood the behaviour. My keys are id + timestamp so
> > > > > that I can do a range type search. So what I really want is to return a row
> > > > > where id matches the prefix. Is there a way to do this without having to
> > > > > scan large amounts of data?
> > > > >
> > > > > On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > >
> > > > > > Hi Mohit,
> > > > > >
> > > > > > "+" ascii code is 43
> > > > > > "9" ascii code is 57.
> > > > > >
> > > > > > So "+9" is coming after "++". If you don't have any row with the exact
> > > > > > key "+++++", HBase will look for the first one after this one. And in
> > > > > > your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF.
> > > > > >
> > > > > > JM
> > > > > >
> > > > > > 2013/3/28 Mohit Anchlia <[email protected]>:
> > > > > > > My understanding is that the row key would start with +++++ for instance.
> > > > > > >
> > > > > > > On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > > >
> > > > > > >> Hi Mohit,
> > > > > > >>
> > > > > > >> I see nothing wrong with the results below. What would I have expected?
> > > > > > >>
> > > > > > >> JM
> > > > > > >>
> > > > > > >> 2013/3/28 Mohit Anchlia <[email protected]>:
> > > > > > >> > I am running 92.1 version and this is what happens.
> > > > > > >> >
> > > > > > >> > hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'}
> > > > > > >> > ROW                                                  COLUMN+CELL
> > > > > > >> >  s\xC1\xEAR\xDF\xEA&\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/
> > > > > > >> >  7F\xFF\xFE\xC2\xA3\x84Z\x7F
> > > > > > >> > 1 row(s) in 0.0450 seconds
> > > > > > >> >
> > > > > > >> > hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '------'}
> > > > > > >> > ROW                                                  COLUMN+CELL
> > > > > > >> >  -\xA1\xAF>r\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/
> > > > > > >> >  xFF\xFE\xC2\x8F\xF0\xC1\xBF
> > > > > > >> > row(s) in 0.0500 seconds
> > > > > > >> >
> > > > > > >> > hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '++++'}
> > > > > > >> > ROW                                                  COLUMN+CELL
> > > > > > >> >  +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x00<2, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/
> > > > > > >> >  E\xC2S-\x08\x1F
> > > > > > >> > 1 row(s) in 0.0640 seconds
> > > > > > >> > hbase(main):006:0>
> > > > > > >> >
> > > > > > >> > On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan <[email protected]> wrote:
> > > > > > >> >
> > > > > > >> >> Same question, same time :)
> > > > > > >> >>
> > > > > > >> >> Regards
> > > > > > >> >> Ram
> > > > > > >> >>
> > > > > > >> >> On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan <[email protected]> wrote:
> > > > > > >> >>
> > > > > > >> >> > Could you give us some more insights on this?
> > > > > > >> >> > So you mean when you set the row key as 'azzzaaa', though this row does
> > > > > > >> >> > not exist, the scanner returns some other row? Or it is giving you a row
> > > > > > >> >> > that does not exist?
> > > > > > >> >> >
> > > > > > >> >> > Or you mean it is doing a full table scan?
> > > > > > >> >> >
> > > > > > >> >> > Which version of HBase and what type of filters are you using?
> > > > > > >> >> > Regards
> > > > > > >> >> > Ram
> > > > > > >> >> >
> > > > > > >> >> > On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia <[email protected]> wrote:
> > > > > > >> >> >
> > > > > > >> >> >> I have key in the form of "hashedid + timestamp" but when I run scan I get
> > > > > > >> >> >> rows for almost every value. For instance if I run scan for 'azzzaaa' that
> > > > > > >> >> >> doesn't even exist even then I get the results.
> > > > > > >> >> >>
> > > > > > >> >> >> Could someone help me understand what might be going on here?
