Hi Lars,

Thanks for the detailed tip - we will go down that path. Looking at the
javadoc for InternalScanner.next(), it says "grab the next row's values".
Are these rows in the HBase sense, or rows in the HFile? I suspect it is
the latter.
Thanks!

On Mon, Dec 10, 2012 at 11:19 PM, lars hofhansl <[email protected]> wrote:

> Filters do not work for compactions. We only support them for user scans.
> (Some of them might incidentally work, but that is entirely untested and
> unsupported.)
>
> Your best bet is to use the preCompact hook and return a wrapper scanner,
> like so:
>
>     public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
>         Store store, final InternalScanner scanner) {
>       return new InternalScanner() {
>         public boolean next(List<KeyValue> results) throws IOException {
>           return next(results, -1);
>         }
>         public boolean next(List<KeyValue> results, String metric)
>             throws IOException {
>           return next(results, -1, metric);
>         }
>         public boolean next(List<KeyValue> results, int limit)
>             throws IOException {
>           return next(results, limit, null);
>         }
>         public boolean next(List<KeyValue> results, int limit, String metric)
>             throws IOException {
>           // call next on the passed scanner
>           // do your filtering here
>         }
>         public void close() throws IOException {
>           scanner.close();
>         }
>       };
>     }
>
> -- Lars
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Monday, December 10, 2012 11:04 PM
> Subject: Re: Filtering/Collection columns during Major Compaction
>
> Hi Lars,
>
> In my case, I just want to use ColumnPaginationFilter() rather than
> implement my own filtering logic. Is there an easy way to apply this
> filter on top of an existing scanner? Do I do something like
>
>     RegionScannerImpl scanner = new RegionScannerImpl(scan_with_my_filter,
>         original_compaction_scanner)
>
> Thanks
> Varun
>
> On Mon, Dec 10, 2012 at 9:09 PM, lars hofhansl <[email protected]> wrote:
>
> > In your case you probably just want to filter on top of the provided
> > scanner with preCompact (rather than actually replacing the scanner,
> > which preCompactScannerOpen does).
> > (And sorry, I only saw this reply after I sent my own reply to your
> > initial question.)
> >
> > ________________________________
> > From: Varun Sharma <[email protected]>
> > To: [email protected]
> > Sent: Monday, December 10, 2012 7:29 AM
> > Subject: Re: Filtering/Collection columns during Major Compaction
> >
> > Okay - I looked more thoroughly again - I should be able to extract
> > these from the region observer.
> >
> > Thanks!
> >
> > On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[email protected]> wrote:
> >
> > > Thanks! This is exactly what I need. I am looking at the code in
> > > compactStore() under Store.java, but I am trying to understand why
> > > smallestReadPoint needs to be passed for the real compaction - I
> > > thought the read point was a memstore-only thing. Also,
> > > preCompactScannerOpen does not have a way of passing this value.
> > >
> > > Varun
> > >
> > > On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> > > [email protected]> wrote:
> > >
> > >> Hi Varun
> > >>
> > >> If you are using the 0.94 version, there are coprocessor hooks that
> > >> are invoked before and after compaction selection.
> > >> preCompactScannerOpen() lets you create your own scanner, which
> > >> actually does the next() operation.
> > >> If you wrap your own scanner and implement next(), you can control
> > >> the KVs that are kept - so you can decide which columns to include
> > >> and which to exclude.
> > >> Does this help you, Varun?
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[email protected]> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > My understanding of major compaction is that it rewrites one store
> > >> > file, merges the memstore and the store files on disk, cleans out
> > >> > delete tombstones and the puts prior to them, and cleans out
> > >> > excess versions.
We > > >> want > > >> > to limit the number of columns per row in hbase. Also, we want to > > limit > > >> > them in lexicographically sorted order - which means we take the > top, > > >> say > > >> > 100 smallest columns (in lexicographical sense) and only keep them > > while > > >> > discard the rest. > > >> > > > >> > One way to do this would be to clean out columns in a daily > mapreduce > > >> job. > > >> > Or another way is to clean them out during the major compaction > which > > >> can > > >> > be run daily too. I see, from the code that a major compaction > > >> essentially > > >> > invokes a Scan over the region - so if the Scan is invoked with the > > >> > appropriate filter (say ColumnCountGetFilter) - would that do the > > trick > > >> ? > > >> > > > >> > Thanks > > >> > Varun > > >> > > > >> > > > > > > > > >

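For reference, here is a self-contained model of the `ColumnPaginationFilter(limit, offset)` behavior Varun asks about, as I understand it: per row, skip the first `offset` columns and keep the next `limit` columns, in lexicographic order. This is a sketch of the semantics only (`paginate` is a made-up name), not the real filter class.

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationModel {
    // Model of ColumnPaginationFilter(limit, offset) applied to a single
    // row's lexicographically sorted qualifiers: skip the first `offset`
    // columns, then keep up to `limit` columns.
    static List<String> paginate(List<String> sortedQualifiers, int limit, int offset) {
        List<String> kept = new ArrayList<>();
        int seen = 0;
        for (String q : sortedQualifiers) {
            if (seen >= offset && kept.size() < limit) {
                kept.add(q);
            }
            seen++;
        }
        return kept;
    }
}
```

With `offset = 0` and `limit = 100`, this reduces to "keep the 100 smallest columns per row", which matches the goal stated at the start of the thread.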