In your case you probably just want to filter on top of the provided scanner in preCompact(), rather than actually replacing the scanner, which is what preCompactScannerOpen() does.
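Something along these lines should work. Note this is just an untested sketch against the 0.94-era API, not code I've run: the class name and the MAX_COLUMNS constant are placeholders, and the exact preCompact()/InternalScanner signatures vary between HBase versions, so adjust to whatever your version declares.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

public class TopColumnsCompactionObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS = 100;  // keep the 100 smallest columns per row

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, final InternalScanner scanner) {
    // Wrap the scanner the compaction would otherwise read from; the KVs it
    // emits are already in sorted order, so within a row we just count and drop.
    return new InternalScanner() {
      private byte[] currentRow = null;
      private int kept = 0;

      private void trim(List<KeyValue> kvs) {
        Iterator<KeyValue> it = kvs.iterator();
        while (it.hasNext()) {
          KeyValue kv = it.next();
          if (currentRow == null || !Bytes.equals(currentRow, kv.getRow())) {
            currentRow = kv.getRow();   // new row: reset the counter
            kept = 0;
          }
          // Note: this counts KeyValues, not distinct qualifiers, so multiple
          // versions of the same column each count separately.
          if (++kept > MAX_COLUMNS) {
            it.remove();                // past the limit: drop from the compacted output
          }
        }
      }

      @Override
      public boolean next(List<KeyValue> results) throws IOException {
        boolean more = scanner.next(results);
        trim(results);
        return more;
      }

      @Override
      public boolean next(List<KeyValue> results, int limit) throws IOException {
        boolean more = scanner.next(results, limit);
        trim(results);
        return more;
      }

      // If your version of InternalScanner declares more next(...) overloads,
      // delegate them the same way.

      @Override
      public void close() throws IOException {
        scanner.close();
      }
    };
  }
}

You'd load it like any other region observer (per-table coprocessor attribute or hbase.coprocessor.region.classes). Since the KVs arrive sorted, keeping the first 100 per row gives you the lexicographically smallest columns.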
(And sorry, I only saw this reply after I sent my own reply to your initial question.)

________________________________
From: Varun Sharma <[email protected]>
To: [email protected]
Sent: Monday, December 10, 2012 7:29 AM
Subject: Re: Filtering/Collection columns during Major Compaction

Okay - I looked more thoroughly again - I should be able to extract these from the region observer.

Thanks!

On Mon, Dec 10, 2012 at 6:59 AM, Varun Sharma <[email protected]> wrote:
> Thanks! This is exactly what I need. I am looking at the code in
> compactStore() under Store.java, but I am trying to understand why
> smallestReadPoint needs to be passed for the real compaction - I thought
> the read point was a memstore-only thing. Also, preCompactScannerOpen does
> not have a way of passing this value.
>
> Varun
>
>
> On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <
> [email protected]> wrote:
>
>> Hi Varun
>>
>> If you are using the 0.94 version, you have a coprocessor that is
>> invoked before and after compaction selection.
>> preCompactScannerOpen() lets you create your own scanner, which actually
>> does the next() operation.
>> Now if you wrap your own scanner and implement your next(), you can
>> play with the KVs as you need. So basically you can say which columns
>> to include and which to exclude.
>> Does this help you, Varun?
>>
>> Regards
>> Ram
>>
>> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > My understanding of major compaction is that it rewrites one store
>> > file, doing a merge of the memstore and the store files on disk,
>> > cleaning out delete tombstones and the puts prior to them, and cleaning
>> > out excess versions. We want to limit the number of columns per row in
>> > HBase. Also, we want to limit them in lexicographically sorted order -
>> > which means we take the top, say, 100 smallest columns (in the
>> > lexicographical sense) and keep only those while discarding the rest.
>> >
>> > One way to do this would be to clean out columns in a daily mapreduce
>> > job. Another way is to clean them out during the major compaction,
>> > which can also be run daily. I see from the code that a major
>> > compaction essentially invokes a Scan over the region - so if the Scan
>> > is invoked with the appropriate filter (say ColumnCountGetFilter),
>> > would that do the trick?
>> >
>> > Thanks
>> > Varun
>> >
>> >
