+1 for making PrefixFilter seek instead of using a startRow explicitly.

./zahoor
On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl <[email protected]> wrote:

> Oh yeah, I meant that one should always set the startrow as a matter of
> practice - if possible - and never rely on the filter alone.
>
> ________________________________
> From: anil gupta <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Wednesday, October 17, 2012 12:25 PM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
> Hi Lars,
>
> There is a specific use case for this:
>
> Table: Suppose i have a rowkey:<customer_id><event_timestamp><uid>
>
> Use case: I would like to get all the events of customer_id=123.
> Case 1: If i only use startRow=123 then i will get events of other
> customers having customers_id > 123 since the scanner will be keep on
> fetching rows until the end of table.
> Case 2: If i use prefixFilter=123 and startRow=123 then i will get the
> correct result.
>
> IMHO, adding the feature of smartly adding the startRow in PrefixFilter
> wont hurt any existing functionality. Use of StartRow and PrefixFilter will
> still be different.
>
> Thanks,
> Anil Gupta
>
> On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl <[email protected]> wrote:
>
> >That is a good point. There is no reason why prefix filter cannot issue a
> >seek to the first KV for that prefix.
> >Although it lead to a practice where people would the prefix filter when
> >they in fact should just set the start row.
> >
> >----- Original Message -----
> >From: anil gupta <[email protected]>
> >To: [email protected]
> >Cc:
> >Sent: Wednesday, October 17, 2012 9:41 AM
> >Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >
> >Hi Zahoor,
> >
> >I heavily use prefix filter. Every time i have to explicitly define the
> >startRow. So, that's the current behavior. However, initially this behavior
> >was confusing to me also.
> >I think that when a Prefix filter is defined then internally the
> >startRow=prefix can be set. User defined StartRow takes precedence over the
> >prefixFilter startRow. If the current prefixFilter can be modified in that
> >way then it will eradicate this confusion regarding performance of prefix
> >filter.
> >
> >Thanks,
> >Anil Gupta
> >
> >On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor <[email protected]> wrote:
> >
> >> First i upgraded my cluster to 94.2.. even then the problem persisted..
> >> Then i moved to using startRow instead of prefix filter..
> >>
> >> ,/zahoor
> >>
> >> On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor <[email protected]>
> >> wrote:
> >>
> >> > Sorry for the delay.
> >> >
> >> > It looks like the problem is because of PrefixFilter...
> >> > I assumed that i does a seek...
> >> >
> >> > If i use startRow instead.. it works fine.. But is it the correct
> >> > approach?
> >> >
> >> > ./zahoor
> >> >
> >> > On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <[email protected]>
> >> > wrote:
> >> >
> >> >> I reopened HBASE-6577
> >> >>
> >> >> ----- Original Message -----
> >> >> From: lars hofhansl <[email protected]>
> >> >> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> >> >> Cc:
> >> >> Sent: Tuesday, October 16, 2012 2:39 PM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Looks like this is exactly the scenario I was trying to optimize with
> >> >> HBASE-6577. Hmm...
> >> >> ________________________________
> >> >> From: lars hofhansl <[email protected]>
> >> >> To: "[email protected]" <[email protected]>
> >> >> Sent: Tuesday, October 16, 2012 12:21 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> PrefixFilter does not do any seeking by itself, so I doubt this is
> >> >> related to HBASE-6757.
> >> >> Does this only happen with FAST_DIFF compression?
> >> >>
> >> >> If you can create an isolated test program (that sets up the scenario and
> >> >> then runs a scan with the filter such that it is very slow), I'm happy to
> >> >> take a look.
> >> >>
> >> >> -- Lars
> >> >>
> >> >> ----- Original Message -----
> >> >> From: J Mohamed Zahoor <[email protected]>
> >> >> To: "[email protected]" <[email protected]>
> >> >> Cc:
> >> >> Sent: Monday, October 15, 2012 10:27 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Is this related to HBASE-6757 ?
> >> >> I use a filter list with
> >> >> - prefix filter
> >> >> - filter list of column filters
> >> >>
> >> >> /zahoor
> >> >>
> >> >> On Monday, October 15, 2012, J Mohamed Zahoor wrote:
> >> >>
> >> >> > Hi
> >> >> >
> >> >> > My scanner performance is very slow when using a Prefix filter on a
> >> >> > **Encoded Column** ( encoded using FAST_DIFF on both memory and disk).
> >> >> > I am using 94.1 hbase.
> >> >> >
> >> >> > jstack shows that much time is spent on seeking the row.
> >> >> > Even if i give a exact row key match in the prefix filter it takes about
> >> >> > two minutes to return a single row.
> >> >> > Running this multiple times also seems to be redirecting things to disk
> >> >> > (loadBlock).
> >> >> >
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> >> >> > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> >> >> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> >> >> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> >> >> > - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
> >> >> >
> >> >> > If is set the start and end row as same row in scan ... it come in very
> >> >> > quick.
> >> >> >
> >> >> > Saw this link
> >> >> > http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> >> >> > But it looks like things are fine in 94.1.
> >> >> >
> >> >> > Any pointers on why this is slow?
> >> >> >
> >> >> > Note: the row has not many columns(5 and less than a kb) and lots of
> >> >> > versions (1500+)
> >> >> >
> >> >> > ./zahoor
> >
> >--
> >Thanks & Regards,
> >Anil Gupta
>
> --
> Thanks & Regards,
> Anil Gupta
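
For anyone reading this thread later, here is a rough sketch of the difference between the approaches discussed above (prefix filter alone, startRow alone, and startRow plus prefix filter). It is only a toy model: a TreeMap stands in for the sorted table, and the class name, method names, and sample row keys are all made up for illustration. It does not use the HBase client API and says nothing about actual HBase timings; it just counts how many rows each approach has to look at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a sorted table (NOT the HBase API); all names are hypothetical. */
class PrefixScanSketch {

    /** Matching row keys plus the number of rows the scan had to examine. */
    static final class ScanResult {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    /** Prefix filter alone: start at the first row of the table and test each row. */
    static ScanResult filterOnly(TreeMap<String, String> table, String prefix) {
        ScanResult r = new ScanResult();
        for (String key : table.keySet()) {
            r.examined++;
            if (key.startsWith(prefix)) {
                r.rows.add(key);
            } else if (key.compareTo(prefix) > 0) {
                break; // sorted keys: nothing after this point can match
            }
        }
        return r;
    }

    /** startRow alone (Case 1): jump to startRow, then read to the end of the table. */
    static ScanResult startRowOnly(TreeMap<String, String> table, String startRow) {
        ScanResult r = new ScanResult();
        for (String key : table.tailMap(startRow, true).keySet()) {
            r.examined++;
            r.rows.add(key); // no filter, so rows of later customers are returned too
        }
        return r;
    }

    /** startRow plus prefix filter (Case 2): jump to the prefix, stop once it no longer matches. */
    static ScanResult startRowAndPrefix(TreeMap<String, String> table, String prefix) {
        ScanResult r = new ScanResult();
        for (String key : table.tailMap(prefix, true).keySet()) {
            r.examined++;
            if (!key.startsWith(prefix)) {
                break;
            }
            r.rows.add(key);
        }
        return r;
    }

    /** Made-up rows shaped like <customer_id>-<event_timestamp>-<uid>. */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put("100-1-a", "event");
        t.put("122-1-b", "event");
        t.put("123-1-c", "event");
        t.put("123-2-d", "event");
        t.put("124-1-e", "event");
        t.put("200-1-f", "event");
        return t;
    }
}
```

Calling the three methods on sampleTable() with "123" shows the pattern from the thread: the filter-only scan walks every row before the prefix, the startRow-only scan returns rows of other customers, and the combined scan touches only the matching rows plus the one row that ends the range. A PrefixFilter that seeks, as proposed above, would in this model behave like startRowAndPrefix.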
