Thanks! This is exactly what I need. I am looking at the code in compactStore() in Store.java, but I am trying to understand why smallestReadPoint needs to be passed for the real compaction. I thought the read point was a memstore-only thing. Also, preCompactScannerOpen() does not have a way of passing this value.

Varun

On Mon, Dec 10, 2012 at 6:08 AM, ramkrishna vasudevan <[email protected]> wrote:

> Hi Varun
>
> If you are using 0.94 version you have a coprocessor that is getting
> invoked before and after Compaction selection.
> preCompactScannerOpen() helps you to create your own scanner which actually
> does the next() operation.
> Now if you can wrap your own scanner and implement your next() it will help
> you to play with the kvs that you need. So basically you can say what cols
> to include and what to exclude.
> Does this help you Varun?
>
> Regards
> Ram
>
> On Mon, Dec 10, 2012 at 7:28 PM, Varun Sharma <[email protected]> wrote:
>
> > Hi,
> >
> > My understanding of major compaction is that it rewrites one store file and
> > does a merge of the memstore, store files on disk and cleans out delete
> > tombstones and puts prior to them and cleans out excess versions. We want
> > to limit the number of columns per row in hbase. Also, we want to limit
> > them in lexicographically sorted order - which means we take the top, say
> > 100 smallest columns (in lexicographical sense) and only keep them while
> > discard the rest.
> >
> > One way to do this would be to clean out columns in a daily mapreduce job.
> > Or another way is to clean them out during the major compaction which can
> > be run daily too. I see, from the code that a major compaction essentially
> > invokes a Scan over the region - so if the Scan is invoked with the
> > appropriate filter (say ColumnCountGetFilter) - would that do the trick ?
> >
> > Thanks
> > Varun
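
For reference, here is a rough sketch of the filtering logic that a wrapped compaction scanner's next() could apply to keep only the first N columns of each row. It is deliberately free of any HBase dependency: SketchCell and ColumnLimitSketch are hypothetical names, not HBase classes, and the sketch assumes the cells arrive sorted by row and then by qualifier, as they would from a scanner over a single store (one column family).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a KeyValue: only the row key and column qualifier. */
final class SketchCell {
    final String row;
    final String qualifier;

    SketchCell(String row, String qualifier) {
        this.row = row;
        this.qualifier = qualifier;
    }
}

/** Hypothetical helper; not part of HBase. */
final class ColumnLimitSketch {

    /**
     * Keeps the cells of at most maxColumns distinct qualifiers per row.
     * Assumes the input is sorted by row, then by qualifier, so the
     * columns kept are the lexicographically smallest ones in each row.
     * All versions of a kept column are retained.
     */
    static List<SketchCell> limitColumns(List<SketchCell> sorted, int maxColumns) {
        List<SketchCell> kept = new ArrayList<>();
        String currentRow = null;
        String lastQualifier = null;
        int columnsSeen = 0;

        for (SketchCell cell : sorted) {
            // A new row resets the per-row column counter.
            if (!cell.row.equals(currentRow)) {
                currentRow = cell.row;
                lastQualifier = null;
                columnsSeen = 0;
            }
            // A new qualifier within the row counts as one more column.
            if (!cell.qualifier.equals(lastQualifier)) {
                lastQualifier = cell.qualifier;
                columnsSeen++;
            }
            // Cells beyond the limit are dropped, i.e. not rewritten.
            if (columnsSeen <= maxColumns) {
                kept.add(cell);
            }
        }
        return kept;
    }
}
```

In a real coprocessor the same counting would sit inside the next() of the scanner returned from preCompactScannerOpen(), working on the actual key-values rather than on this simplified type.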
