On Wed, Jun 14, 2017 at 5:51 PM, Dave Latham <[email protected]> wrote:
> What cells, if any, are removed during minor compactions?
>
> Cells that
> (a) are beyond the TTL?
> (b) are shadowed by a delete marker? (from the files compacted)
> (c) are shadowed by newer versions? (assuming numVersions configured < num
> versions of the cell found)
>

When compacting, we use scanners to read the hfiles. The core difference between major and minor compaction is the scanType. If major (i.e. all files in the Store are in the compaction set), then ScanType.COMPACT_DROP_DELETES, else ScanType.COMPACT_RETAIN_DELETES.

What to retain or delete is decided the way it is for a Scan, by the rules in ScanQueryMatcher (actually, compactions use CompactionScanQueryMatcher, a subclass whose only purpose is enforcing the scanType delete policy).

To answer your questions, Dave:

a.) Yes (a Scan does not let you see Cells that are beyond the TTL, so on compaction they are not 'seen' and so are not written out to the new compacted file).
b.) No (see the logic in CompactionScanQueryMatcher).
c.) Yes

St.Ack
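
For illustration, the scanType choice described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the actual HBase code: the class name, the scanTypeFor helper, and the use of file-name sets are hypothetical stand-ins, and the enum here only mirrors the two ScanType values named in this mail.

```java
import java.util.Set;

public class CompactionScanTypeSketch {
    // Local stand-in mirroring the two ScanType values mentioned above;
    // this is not the real org.apache.hadoop.hbase enum.
    enum ScanType { COMPACT_DROP_DELETES, COMPACT_RETAIN_DELETES }

    // Hypothetical helper: a compaction counts as "major" here when every
    // file in the Store is part of the compaction set.
    static ScanType scanTypeFor(Set<String> storeFiles, Set<String> compactionSet) {
        boolean allFilesSelected = compactionSet.containsAll(storeFiles);
        return allFilesSelected
                ? ScanType.COMPACT_DROP_DELETES     // major: delete markers can be dropped
                : ScanType.COMPACT_RETAIN_DELETES;  // minor: delete markers are kept
    }

    public static void main(String[] args) {
        Set<String> store = Set.of("hfile-1", "hfile-2", "hfile-3");
        // Only a subset of the Store's files is selected: minor compaction.
        System.out.println(scanTypeFor(store, Set.of("hfile-1", "hfile-2")));
        // Every file in the Store is selected: major compaction.
        System.out.println(scanTypeFor(store, store));
    }
}
```

The point of the sketch is only that a minor compaction has to keep delete markers, because files outside the compaction set may still hold cells those markers cover.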
