Re: Slow scanning for PrefixFilter on EncodedBlocks

J Mohamed Zahoor Thu, 18 Oct 2012 00:46:24 -0700

+1 for making PrefixFilter seek instead of using a startRow explicitly.
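
Until the filter seeks on its own, the explicit-range workaround discussed in this thread can be sketched as follows. This is only an illustration in plain Java, assuming row keys are raw byte arrays compared lexicographically as unsigned bytes; PrefixRange and its two methods are hypothetical names, not part of the HBase API. It derives the start row (the prefix itself) and an exclusive stop row (the smallest key that sorts after every key with that prefix).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: derives scan bounds from a row-key prefix. */
final class PrefixRange {
    private PrefixRange() {}

    /** The start row of a prefix scan is simply a copy of the prefix (inclusive). */
    static byte[] startRowForPrefix(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Smallest row key that sorts after every key beginning with the prefix,
     * for use as an exclusive stop row. Returns an empty array when no such
     * key exists (empty prefix, or a prefix made only of 0xFF bytes); an
     * empty stop row means "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte by one
        return stop;
    }
}
```

For the customer_id=123 example quoted further down, the two arrays would be set as the scan's start row and stop row. Because the stop row is exclusive, rows of customer 124 and above are never read.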

./zahoor


On Thu, Oct 18, 2012 at 4:05 AM, lars hofhansl <[email protected]> wrote:

> Oh yeah, I meant that one should always set the startrow as a matter of
> practice - if possible - and never rely on the filter alone.
>
>
>
> ________________________________
>  From: anil gupta <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Wednesday, October 17, 2012 12:25 PM
> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
>
>
> Hi Lars,
>
> There is a specific use case for this:
>
> Table: Suppose i have a rowkey:<customer_id><event_timestamp><uid>
>
> Use case: I would like to get all the events of customer_id=123.
> Case 1: If i only use startRow=123 then i will also get events of other
> customers having customer_id > 123, since the scanner will keep on
> fetching rows until the end of the table.
> Case 2: If i use prefixFilter=123 and startRow=123 then i will get the
> correct result.
>
> IMHO, adding the feature of smartly adding the startRow in PrefixFilter
> won't hurt any existing functionality. Use of StartRow and PrefixFilter will
> still be different.
>
> Thanks,
> Anil Gupta
>
>
>
> On Wed, Oct 17, 2012 at 1:11 PM, lars hofhansl <[email protected]>
> wrote:
>
> That is a good point. There is no reason why prefix filter cannot issue a
> seek to the first KV for that prefix.
> >Although it could lead to a practice where people would use the prefix filter
> >when they in fact should just set the start row.
> >
> >
> >
> >
> >
> >----- Original Message -----
> >From: anil gupta <[email protected]>
> >To: [email protected]
> >Cc:
> >Sent: Wednesday, October 17, 2012 9:41 AM
> >Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >
> >Hi Zahoor,
> >
> >I heavily use prefix filter. Every time i have to explicitly define the
> >startRow. So, that's the current behavior. However, initially this
> behavior
> >was confusing to me also.
> >I think that when a Prefix filter is defined then internally the
> >startRow=prefix can be set. User defined StartRow takes precedence over
> the
> >prefixFilter startRow. If the current prefixFilter can be modified in that
> >way then it will eradicate this confusion regarding performance of prefix
> >filter.
> >
> >Thanks,
> >Anil Gupta
> >
> >On Wed, Oct 17, 2012 at 3:44 AM, J Mohamed Zahoor <[email protected]>
> wrote:
> >
> >> First i upgraded my cluster to 94.2.. even then the problem persisted..
> >> Then i moved to using startRow instead of prefix filter..
> >>
> >>
> >> ./zahoor
> >>
> >> On Wed, Oct 17, 2012 at 2:12 PM, J Mohamed Zahoor <[email protected]>
> >> wrote:
> >>
> >> > Sorry for the delay.
> >> >
> >> > It looks like the problem is because of PrefixFilter...
> >> > I assumed that it does a seek...
> >> >
> >> > If i use startRow instead.. it works fine.. But is it the correct
> >> approach?
> >> >
> >> > ./zahoor
> >> >
> >> >
> >> > On Wed, Oct 17, 2012 at 3:38 AM, lars hofhansl <[email protected]
> >> >wrote:
> >> >
> >> >> I reopened HBASE-6577
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message -----
> >> >> From: lars hofhansl <[email protected]>
> >> >> To: "[email protected]" <[email protected]>; lars hofhansl <
> >> >> [email protected]>
> >> >> Cc:
> >> >> Sent: Tuesday, October 16, 2012 2:39 PM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Looks like this is exactly the scenario I was trying to optimize with
> >> >> HBASE-6577. Hmm...
> >> >> ________________________________
> >> >> From: lars hofhansl <[email protected]>
> >> >> To: "[email protected]" <[email protected]>
> >> >> Sent: Tuesday, October 16, 2012 12:21 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> PrefixFilter does not do any seeking by itself, so I doubt this is
> >> >> related to HBASE-6757.
> >> >> Does this only happen with FAST_DIFF compression?
> >> >>
> >> >>
> >> >> If you can create an isolated test program (that sets up the scenario
> >> and
> >> >> then runs a scan with the filter such that it is very slow), I'm
> happy
> >> to
> >> >> take a look.
> >> >>
> >> >> -- Lars
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message -----
> >> >> From: J Mohamed Zahoor <[email protected]>
> >> >> To: "[email protected]" <[email protected]>
> >> >> Cc:
> >> >> Sent: Monday, October 15, 2012 10:27 AM
> >> >> Subject: Re: Slow scanning for PrefixFilter on EncodedBlocks
> >> >>
> >> >> Is this related to HBASE-6757 ?
> >> >> I use a filter list with
> >> >>   - prefix filter
> >> >>   - filter list of column filters
> >> >>
> >> >> /zahoor
> >> >>
> >> >> On Monday, October 15, 2012, J Mohamed Zahoor wrote:
> >> >>
> >> >> > Hi
> >> >> >
> >> >> > My scanner performance is very slow when using a Prefix filter on a
> >> >> > **Encoded Column** ( encoded using FAST_DIFF on both memory and
> disk).
> >> >> > I am using 94.1 hbase.
> >> >> >
> >> >> > jstack shows that much time is spent on seeking the row.
> >> >> > Even if i give an exact row key match in the prefix filter it takes about
> >> >> > two minutes to return a single row.
> >> >> > Running this multiple times also seems to be redirecting things to
> >> disk
> >> >> > (loadBlock).
> >> >> >
> >> >> >
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
> >> >> > at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> >> >> > at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> >> >> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >> > at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> >> >> > - locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
> >> >> > - locked <0x000000059589bb30> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> >> >> > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
> >> >> >
> >> >> > If I set the start and end row to the same row in the scan ... it comes
> >> >> > back very quickly.
> >> >> >
> >> >> > Saw this link
> >> >> > http://search-hadoop.com/m/9f0JH1Kz24U1&subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug
> >> >> > But it looks like things are fine in 94.1.
> >> >> >
> >> >> > Any pointers on why this is slow?
> >> >> >
> >> >> >
> >> >> > Note: the row has few columns (5, and less than a KB) and lots of
> >> >> > versions (1500+)
> >> >> >
> >> >> > ./zahoor
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >>
> >
> >
> >
> >--
> >Thanks & Regards,
> >Anil Gupta
> >
> >
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
