Hi Lars,

In my case, I just want to use ColumnPaginationFilter() rather than implementing my own filter logic. Is there an easy way to apply this filter on top of an existing scanner? Do I do something like
RegionScannerImpl scanner = new RegionScannerImpl(scan_with_my_filter, original_compaction_scanner)

Thanks
Varun

On Mon, Dec 10, 2012 at 9:09 PM, lars hofhansl <[email protected]> wrote:

> In your case you probably just want to filter on top of the provided
> scanner with preCompact (rather than actually replacing the scanner, which
> preCompactScannerOpen does).
>
> (And sorry I only saw this reply after I sent my own reply to your initial
> question.)
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: [email protected]
> Sent: Monday, December 10, 2012 7:29 AM
> Subject: Re: Filtering/Collection columns during Major Compaction
>
> Okay - I looked more thoroughly again - I should be able to extract these
> from the region observer.
>
> Thanks!
>
> On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[email protected]> wrote:
>
> > Thanks! This is exactly what I need. I am looking at the code in
> > compactStore() under Store.java, but I am trying to understand why, for
> > the real compaction, smallestReadPoint needs to be passed - I thought the
> > read point was a memstore-only thing. Also, preCompactScannerOpen does
> > not have a way of passing this value.
> >
> > Varun
> >
> > On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> >> Hi Varun,
> >>
> >> If you are using the 0.94 version, you have a coprocessor that gets
> >> invoked before and after compaction selection.
> >> preCompactScannerOpen() lets you create your own scanner, which
> >> actually does the next() operation.
> >> Now if you wrap your own scanner and implement your next(), it will
> >> help you to play with the KVs that you need. So basically you can say
> >> which columns to include and which to exclude.
> >> Does this help you, Varun?
> >>
> >> Regards
> >> Ram
> >>
> >> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[email protected]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > My understanding of major compaction is that it rewrites one store
> >> > file by merging the memstore and the store files on disk, cleaning
> >> > out delete tombstones and the puts prior to them, and cleaning out
> >> > excess versions. We want to limit the number of columns per row in
> >> > HBase. Also, we want to limit them in lexicographically sorted order
> >> > - which means we take the top, say, 100 smallest columns (in the
> >> > lexicographical sense), keep only those, and discard the rest.
> >> >
> >> > One way to do this would be to clean out columns in a daily MapReduce
> >> > job. Another way is to clean them out during the major compaction,
> >> > which can be run daily too. I see from the code that a major
> >> > compaction essentially invokes a Scan over the region - so if the
> >> > Scan is invoked with the appropriate filter (say
> >> > ColumnCountGetFilter), would that do the trick?
> >> >
> >> > Thanks
> >> > Varun
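
For concreteness, a minimal sketch of the wrapping approach Lars describes - filtering on top of the provided scanner in preCompact() rather than replacing it - might look roughly like the following. It is written against my reading of the 0.94-era RegionObserver/InternalScanner API; the class name ColumnLimitingObserver, the MAX_COLUMNS constant, and the hand-rolled per-row column limit (standing in for ColumnPaginationFilter, since an InternalScanner wrapper does not take a Filter object directly) are all illustrative assumptions, not code from this thread.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative observer: during compaction, keep only the first MAX_COLUMNS
// (lexicographically smallest) qualifiers of each row and drop the rest.
public class ColumnLimitingObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS = 100;  // illustrative limit

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, InternalScanner scanner) {
    // Wrap (rather than replace) the scanner the compaction would have used.
    return new ColumnLimitingScanner(scanner, MAX_COLUMNS);
  }

  // Delegating scanner that drops KeyValues beyond the per-row column limit.
  private static class ColumnLimitingScanner implements InternalScanner {
    private final InternalScanner delegate;
    private final int maxColumns;

    // State carried across next() calls, because one row can span several
    // of the batches handed back to the compaction loop.
    private byte[] currentRow = null;
    private byte[] currentQualifier = null;
    private int columnsSeen = 0;

    ColumnLimitingScanner(InternalScanner delegate, int maxColumns) {
      this.delegate = delegate;
      this.maxColumns = maxColumns;
    }

    // Remove every KeyValue belonging to a column past the limit. KeyValues
    // arrive sorted by row, then qualifier, then version, and a compaction
    // scanner is per column family, so only the qualifier needs tracking.
    private void trim(List<KeyValue> kvs) {
      Iterator<KeyValue> it = kvs.iterator();
      while (it.hasNext()) {
        KeyValue kv = it.next();
        if (currentRow == null || !Bytes.equals(currentRow, kv.getRow())) {
          currentRow = kv.getRow();              // new row: reset bookkeeping
          currentQualifier = null;
          columnsSeen = 0;
        }
        if (currentQualifier == null
            || !Bytes.equals(currentQualifier, kv.getQualifier())) {
          currentQualifier = kv.getQualifier();  // new column within the row
          columnsSeen++;
        }
        if (columnsSeen > maxColumns) {
          it.remove();                           // past the limit: do not rewrite
        }
      }
    }

    public boolean next(List<KeyValue> results) throws IOException {
      boolean more = delegate.next(results);
      trim(results);
      return more;
    }

    public boolean next(List<KeyValue> results, int limit) throws IOException {
      boolean more = delegate.next(results, limit);
      trim(results);
      return more;
    }

    // Some 0.94 releases declare extra next(..., String metric) overloads on
    // InternalScanner; funnel them through the plain variants above.
    public boolean next(List<KeyValue> results, String metric) throws IOException {
      return next(results);
    }

    public boolean next(List<KeyValue> results, int limit, String metric)
        throws IOException {
      return next(results, limit);
    }

    public void close() throws IOException {
      delegate.close();
    }
  }
}

Such an observer would be registered like any other RegionObserver (per table, or region-server-wide via hbase.coprocessor.region.classes). If your HBase version tells the hook whether the compaction is a major one, you would probably apply the limit only then, since a minor compaction sees only a subset of the store files.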
