So, I actually wrote something that uses the preCompactScannerOpen hook and initializes a StoreScanner in exactly the same way as we do for a major compaction, except that I add the filter I need (ColumnPaginationFilter) to that scanner. I guess that should accomplish the same thing. Roughly, it looks like the sketch below.
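(A minimal sketch against the 0.94-era API; the exact preCompactScannerOpen parameters and the StoreScanner constructor vary across HBase versions, and the class name and the MAX_COLUMNS constant are placeholders of mine.)

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.KeyValueScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.regionserver.StoreScanner;

public class ColumnLimitCompactionObserver extends BaseRegionObserver {

  // Keep only the 100 lexicographically smallest columns of each row.
  private static final int MAX_COLUMNS = 100;

  @Override
  public InternalScanner preCompactScannerOpen(
      ObserverContext<RegionCoprocessorEnvironment> c, Store store,
      List<? extends KeyValueScanner> scanners, ScanType scanType,
      long earliestPutTs, InternalScanner s) throws IOException {
    // Only replace the scanner for major compactions: a minor compaction
    // sees just a subset of the store files, so trimming columns there
    // would be based on incomplete data.
    if (scanType != ScanType.MAJOR_COMPACT) {
      return s; // null unless another coprocessor set it -> default scanner
    }
    // Same Scan the internal compaction code builds, plus our filter.
    Scan scan = new Scan();
    scan.setMaxVersions(store.getFamily().getMaxVersions());
    scan.setFilter(new ColumnPaginationFilter(MAX_COLUMNS, 0));
    return new StoreScanner(store, store.getScanInfo(), scan, scanners,
        scanType, store.getHRegion().getSmallestReadPoint(), earliestPutTs);
  }
}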
On Mon, Dec 10, 2012 at 9:06 PM, lars hofhansl <[email protected]> wrote:

> You can replace (or post-filter) the scanner used for the compaction
> using coprocessors.
> Take a look at RegionObserver.preCompact, which is passed a scanner that
> will iterate over all KVs that should make it into the new store file.
> You can now wrap this scanner and then do any filtering you'd like.
>
> ________________________________
> From: Varun Sharma <[email protected]>
> To: [email protected]
> Sent: Monday, December 10, 2012 5:58 AM
> Subject: Filtering/Collection columns during Major Compaction
>
> Hi,
>
> My understanding of major compaction is that it merges the memstore and
> the store files on disk into a single rewritten store file, cleaning out
> delete tombstones, the puts prior to them, and excess versions. We want
> to limit the number of columns per row in HBase, and we want to limit
> them in lexicographically sorted order: we keep, say, the 100 smallest
> columns (in the lexicographic sense) and discard the rest.
>
> One way to do this would be to clean out the columns in a daily
> mapreduce job. Another way is to clean them out during the major
> compaction, which can be run daily too. I see from the code that a major
> compaction essentially invokes a Scan over the region, so if the Scan is
> invoked with the appropriate filter (say ColumnCountGetFilter), would
> that do the trick?
>
> Thanks
> Varun
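For the archives, here is a rough sketch of the wrapping approach Lars describes above: preCompact hands you the compaction scanner, and a delegating InternalScanner can drop KVs past the first N distinct qualifiers of each row (KVs arrive sorted, so all versions of a column are adjacent and count once). This assumes the 0.94-era InternalScanner interface; the set of next(...) overloads differs between versions, and the class names and the constant are placeholders of mine.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnTrimmingObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS = 100;

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, InternalScanner scanner) throws IOException {
    return new ColumnTrimmingScanner(scanner, MAX_COLUMNS);
  }

  /**
   * Delegates to the compaction scanner and drops KVs beyond the first
   * maxColumns distinct qualifiers of each row. On a major compaction,
   * deleted cells and excess versions are already gone by the time KVs
   * reach this scanner, so only surviving columns are counted.
   */
  private static class ColumnTrimmingScanner implements InternalScanner {
    private final InternalScanner delegate;
    private final int maxColumns;
    private byte[] row, qual; // current row / last qualifier seen
    private int cols;         // distinct columns seen so far in current row

    ColumnTrimmingScanner(InternalScanner delegate, int maxColumns) {
      this.delegate = delegate;
      this.maxColumns = maxColumns;
    }

    private void trim(List<KeyValue> kvs) {
      Iterator<KeyValue> it = kvs.iterator();
      while (it.hasNext()) {
        KeyValue kv = it.next();
        if (row == null || !Bytes.equals(row, kv.getRow())) {
          row = kv.getRow(); // new row: reset the column count
          qual = null;
          cols = 0;
        }
        if (qual == null || !Bytes.equals(qual, kv.getQualifier())) {
          qual = kv.getQualifier();
          cols++;
        }
        if (cols > maxColumns) {
          it.remove(); // past the Nth distinct column: drop this KV
        }
      }
    }

    @Override
    public boolean next(List<KeyValue> results) throws IOException {
      boolean more = delegate.next(results);
      trim(results);
      return more;
    }

    // 0.94 declares several next(...) overloads; route them all through trim().
    @Override
    public boolean next(List<KeyValue> results, String metric) throws IOException {
      boolean more = delegate.next(results, metric);
      trim(results);
      return more;
    }

    @Override
    public boolean next(List<KeyValue> results, int limit) throws IOException {
      boolean more = delegate.next(results, limit);
      trim(results);
      return more;
    }

    @Override
    public boolean next(List<KeyValue> results, int limit, String metric)
        throws IOException {
      boolean more = delegate.next(results, limit, metric);
      trim(results);
      return more;
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }
}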
