Hi Lars,

Thanks for the detailed tip - we will go down that path. Looking at the
javadoc for InternalScanner.next(), it says "grab the next row's values".
Are these rows in the HBase sense, or rows in the HFile? I suspect it is
the latter.
Thanks!

On Mon, Dec 10, 2012 at 11:19 PM, lars hofhansl <[email protected]> wrote:

> Filters do not work for compactions. We only support them for user scans.
> (Some of them might incidentally work, but that is entirely untested and
> unsupported.)
>
> Your best bet is to use the preCompact hook and return a wrapper scanner,
> like so:
>
>     public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
>         Store store, final InternalScanner scanner) {
>       return new InternalScanner() {
>         public boolean next(List<KeyValue> results) throws IOException {
>           return next(results, -1);
>         }
>         public boolean next(List<KeyValue> results, String metric)
>             throws IOException {
>           return next(results, -1, metric);
>         }
>         public boolean next(List<KeyValue> results, int limit)
>             throws IOException {
>           return next(results, limit, null);
>         }
>         public boolean next(List<KeyValue> results, int limit, String metric)
>             throws IOException {
>           // call next on the passed scanner
>           // do your filtering here
>         }
>         public void close() throws IOException {
>           scanner.close();
>         }
>       };
>     }
>
> -- Lars
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Monday, December 10, 2012 11:04 PM
> Subject: Re: Filtering/Collection columns during Major Compaction
>
> Hi Lars,
>
> In my case, I just want to use ColumnPaginationFilter() rather than
> implement my own filtering logic. Is there an easy way to apply this
> filter on top of an existing scanner? Do I do something like
>
>     RegionScannerImpl scanner = new RegionScannerImpl(scan_with_my_filter,
>         original_compaction_scanner)
>
> Thanks
> Varun
>
> On Mon, Dec 10, 2012 at 9:09 PM, lars hofhansl <[email protected]> wrote:
>
> > In your case you probably just want to filter on top of the provided
> > scanner with preCompact (rather than actually replacing the scanner,
> > which preCompactScannerOpen does).
> > (And sorry, I only saw this reply after I sent my own reply to your
> > initial question.)
> >
> > ________________________________
> > From: Varun Sharma <[email protected]>
> > To: [email protected]
> > Sent: Monday, December 10, 2012 7:29 AM
> > Subject: Re: Filtering/Collection columns during Major Compaction
> >
> > Okay - I looked more thoroughly again - I should be able to extract
> > these from the region observer.
> >
> > Thanks!
> >
> > On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[email protected]> wrote:
> >
> > > Thanks! This is exactly what I need. I am looking at the code in
> > > compactStore() under Store.java, but I am trying to understand why
> > > smallestReadPoint needs to be passed for the real compaction - I
> > > thought the read point was a memstore-only thing. Also,
> > > preCompactScannerOpen does not have a way of passing this value.
> > >
> > > Varun
> > >
> > > On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> > > [email protected]> wrote:
> > >
> > >> Hi Varun
> > >>
> > >> If you are using the 0.94 version, there are coprocessor hooks that
> > >> are invoked before and after compaction selection.
> > >> preCompactScannerOpen() lets you create your own scanner, which
> > >> actually does the next() operation.
> > >> If you wrap your own scanner and implement next(), you can control
> > >> the KVs that are kept - so you can decide which columns to include
> > >> and which to exclude.
> > >> Does this help you, Varun?
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[email protected]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > My understanding of major compaction is that it rewrites one store
> > >> > file, merges the memstore and the store files on disk, cleans out
> > >> > delete tombstones and the puts prior to them, and cleans out
> > >> > excess versions.
We > > >> want > > >> > to limit the number of columns per row in hbase. Also, we want to > > limit > > >> > them in lexicographically sorted order - which means we take the > top, > > >> say > > >> > 100 smallest columns (in lexicographical sense) and only keep them > > while > > >> > discard the rest. > > >> > > > >> > One way to do this would be to clean out columns in a daily > mapreduce > > >> job. > > >> > Or another way is to clean them out during the major compaction > which > > >> can > > >> > be run daily too. I see, from the code that a major compaction > > >> essentially > > >> > invokes a Scan over the region - so if the Scan is invoked with the > > >> > appropriate filter (say ColumnCountGetFilter) - would that do the > > trick > > >> ? > > >> > > > >> > Thanks > > >> > Varun > > >> > > > >> > > > > > > > > >

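For reference, here is a self-contained model of the `ColumnPaginationFilter(limit, offset)` behavior Varun asks about, as I understand it: per row, skip the first `offset` columns and keep the next `limit` columns, in lexicographic order. This is a sketch of the semantics only (`paginate` is a made-up name), not the real filter class.

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationModel {
    // Model of ColumnPaginationFilter(limit, offset) applied to a single
    // row's lexicographically sorted qualifiers: skip the first `offset`
    // columns, then keep up to `limit` columns.
    static List<String> paginate(List<String> sortedQualifiers, int limit, int offset) {
        List<String> kept = new ArrayList<>();
        int seen = 0;
        for (String q : sortedQualifiers) {
            if (seen >= offset && kept.size() < limit) {
                kept.add(q);
            }
            seen++;
        }
        return kept;
    }
}
```

With `offset = 0` and `limit = 100`, this reduces to "keep the 100 smallest columns per row", which matches the goal stated at the start of the thread.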