So, I actually wrote something that uses the preCompactScannerOpen hook and initializes a StoreScanner in exactly the same way as we do for a major compaction, except that I add the filter I need (ColumnPaginationFilter) to that scanner. I guess that should accomplish the same thing. Roughly, it looks like the sketch below.
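(A minimal sketch against the 0.94-era API; the exact preCompactScannerOpen parameters and the StoreScanner constructor vary across HBase versions, and the class name and the MAX_COLUMNS constant are placeholders of mine.)

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.regionserver.StoreScanner;

public class ColumnLimitCompactionObserver extends BaseRegionObserver {

  // Keep only the 100 lexicographically smallest columns of each row.
  private static final int MAX_COLUMNS = 100;

  @Override
  public InternalScanner preCompactScannerOpen(
      ObserverContext<RegionCoprocessorEnvironment> c, Store store,
      List<? extends KeyValueScanner> scanners, ScanType scanType,
      long earliestPutTs, InternalScanner s) throws IOException {
    // Only replace the scanner for major compactions: a minor compaction
    // sees just a subset of the store files, so trimming columns there
    // would be based on incomplete data.
    if (scanType != ScanType.MAJOR_COMPACT) {
      return s; // null unless another coprocessor set it -> default scanner
    }
    // Same Scan the internal compaction code builds, plus our filter.
    Scan scan = new Scan();
    scan.setMaxVersions(store.getFamily().getMaxVersions());
    scan.setFilter(new ColumnPaginationFilter(MAX_COLUMNS, 0));
    return new StoreScanner(store, store.getScanInfo(), scan, scanners,
        scanType, store.getHRegion().getSmallestReadPoint(), earliestPutTs);
  }
}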
On Mon, Dec 10, 2012 at 9:06 PM, lars hofhansl <[email protected]> wrote:

> You can replace (or post-filter) the scanner used for the compaction
> using coprocessors.
> Take a look at RegionObserver.preCompact, which is passed a scanner that
> will iterate over all KVs that should make it into the new store file.
> You can now wrap this scanner and then do any filtering you'd like.
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: [email protected]
> Sent: Monday, December 10, 2012 5:58 AM
> Subject: Filtering/Collection columns during Major Compaction
>
> Hi,
>
> My understanding of major compaction is that it merges the memstore and
> the store files on disk into a single rewritten store file, cleaning out
> delete tombstones, the puts prior to them, and excess versions. We want
> to limit the number of columns per row in HBase, and we want to limit
> them in lexicographically sorted order: we keep, say, the 100 smallest
> columns (in the lexicographic sense) and discard the rest.
>
> One way to do this would be to clean out the columns in a daily
> mapreduce job. Another way is to clean them out during the major
> compaction, which can be run daily too. I see from the code that a major
> compaction essentially invokes a Scan over the region, so if the Scan is
> invoked with the appropriate filter (say ColumnCountGetFilter), would
> that do the trick?
>
> Thanks
> Varun
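For the archives, here is a rough sketch of the wrapping approach Lars describes above: preCompact hands you the compaction scanner, and a delegating InternalScanner can drop KVs past the first N distinct qualifiers of each row (KVs arrive sorted, so all versions of a column are adjacent and count once). This assumes the 0.94-era InternalScanner interface; the set of next(...) overloads differs between versions, and the class names and the constant are placeholders of mine.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnTrimmingObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS = 100;

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, InternalScanner scanner) throws IOException {
    return new ColumnTrimmingScanner(scanner, MAX_COLUMNS);
  }

  /**
   * Delegates to the compaction scanner and drops KVs beyond the first
   * maxColumns distinct qualifiers of each row. On a major compaction,
   * deleted cells and excess versions are already gone by the time KVs
   * reach this scanner, so only surviving columns are counted.
   */
  private static class ColumnTrimmingScanner implements InternalScanner {
    private final InternalScanner delegate;
    private final int maxColumns;
    private byte[] row, qual; // current row / last qualifier seen
    private int cols;         // distinct columns seen so far in current row

    ColumnTrimmingScanner(InternalScanner delegate, int maxColumns) {
      this.delegate = delegate;
      this.maxColumns = maxColumns;
    }

    private void trim(List<KeyValue> kvs) {
      Iterator<KeyValue> it = kvs.iterator();
      while (it.hasNext()) {
        KeyValue kv = it.next();
        if (row == null || !Bytes.equals(row, kv.getRow())) {
          row = kv.getRow(); // new row: reset the column count
          qual = null;
          cols = 0;
        }
        if (qual == null || !Bytes.equals(qual, kv.getQualifier())) {
          qual = kv.getQualifier();
          cols++;
        }
        if (cols > maxColumns) {
          it.remove(); // past the Nth distinct column: drop this KV
        }
      }
    }

    @Override
    public boolean next(List<KeyValue> results) throws IOException {
      boolean more = delegate.next(results);
      trim(results);
      return more;
    }

    // 0.94 declares several next(...) overloads; route them all through trim().
    @Override
    public boolean next(List<KeyValue> results, String metric) throws IOException {
      boolean more = delegate.next(results, metric);
      trim(results);
      return more;
    }

    @Override
    public boolean next(List<KeyValue> results, int limit) throws IOException {
      boolean more = delegate.next(results, limit);
      trim(results);
      return more;
    }

    @Override
    public boolean next(List<KeyValue> results, int limit, String metric)
        throws IOException {
      boolean more = delegate.next(results, limit, metric);
      trim(results);
      return more;
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }
}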
