You can replace (or post-filter) the scanner used for the compaction using coprocessors. Take a look at RegionObserver.preCompact, which is passed a scanner that will iterate over all KVs that should make it into the new store file. You can wrap this scanner and apply any filtering you'd like.
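For illustration, here is a rough sketch of such a wrapper against the 0.94-era coprocessor API (exact method signatures vary between HBase versions, so treat this as a starting point rather than a drop-in implementation). The class name, the 100-column limit, and the trim() helper are all made up for the example. The key observation is that the compaction scanner hands you KVs sorted by (row, family, qualifier), so the first 100 distinct qualifiers you see within a row are already the lexicographically smallest ones:

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example class; not part of HBase.
public class ColumnLimitObserver extends BaseRegionObserver {

  private static final int MAX_COLUMNS_PER_ROW = 100; // your limit

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, final InternalScanner scanner) {
    return new InternalScanner() {
      // State must survive across next() calls, because the
      // compaction may hand us one row in several batches.
      private byte[] currentRow = null;
      private byte[] lastQualifier = null;
      private int distinctColumns = 0;

      // Drop every KV past the first MAX_COLUMNS_PER_ROW distinct
      // qualifiers of each row; since KVs arrive sorted, the kept
      // qualifiers are the lexicographically smallest.
      private void trim(List<KeyValue> kvs) {
        Iterator<KeyValue> it = kvs.iterator();
        while (it.hasNext()) {
          KeyValue kv = it.next();
          if (currentRow == null || !Bytes.equals(currentRow, kv.getRow())) {
            currentRow = kv.getRow();      // new row: reset counters
            lastQualifier = null;
            distinctColumns = 0;
          }
          if (lastQualifier == null
              || !Bytes.equals(lastQualifier, kv.getQualifier())) {
            lastQualifier = kv.getQualifier(); // new column in this row
            distinctColumns++;
          }
          if (distinctColumns > MAX_COLUMNS_PER_ROW) {
            it.remove(); // past the limit: drop all versions of this column
          }
        }
      }

      @Override
      public boolean next(List<KeyValue> results) throws IOException {
        boolean more = scanner.next(results);
        trim(results);
        return more;
      }

      @Override
      public boolean next(List<KeyValue> results, int limit) throws IOException {
        boolean more = scanner.next(results, limit);
        trim(results);
        return more;
      }

      @Override
      public boolean next(List<KeyValue> results, String metric) throws IOException {
        return next(results);
      }

      @Override
      public boolean next(List<KeyValue> results, int limit, String metric)
          throws IOException {
        return next(results, limit);
      }

      @Override
      public void close() throws IOException {
        scanner.close();
      }
    };
  }
}

You would load this like any other region observer, e.g. via hbase.coprocessor.region.classes in hbase-site.xml, or per-table through the shell. Note the per-row state in the wrapper: filtering each batch independently would over-count columns when a row is split across batches.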
________________________________
From: Varun Sharma <[email protected]>
To: [email protected]
Sent: Monday, December 10, 2012 5:58 AM
Subject: Filtering/Collection columns during Major Compaction

Hi,

My understanding of major compaction is that it rewrites the store files: it merges the memstore and the store files on disk into a single new file, cleaning out delete tombstones, the puts they mask, and excess versions.

We want to limit the number of columns per row in HBase, and we want to limit them in lexicographically sorted order - that is, keep only the, say, 100 smallest columns (in the lexicographical sense) and discard the rest.

One way to do this would be to clean out columns in a daily mapreduce job. Another would be to clean them out during the major compaction, which can also be run daily. I see from the code that a major compaction essentially invokes a Scan over the region - so if the Scan were invoked with the appropriate filter (say ColumnCountGetFilter), would that do the trick?

Thanks
Varun
