Hello,

As I understand it, deleted records in HBase files are not removed
until a major compaction is performed.
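
To make that mental model concrete, here is a toy sketch in plain Python. It is
not HBase code: the class, its methods, and the tombstone object are all
hypothetical names made up for illustration, and it only models the idea that a
delete writes a marker while the old data stays in the files until a major
compaction rewrites them.

```python
# Toy model only: this is NOT HBase code. Every name here is hypothetical.
TOMBSTONE = object()  # stand-in for a delete marker


class ToyStore:
    def __init__(self):
        # Each "file" is an immutable list of (row, value) entries.
        self.files = []

    def flush(self, entries):
        # Writing always creates a new file; older files are never edited.
        self.files.append(list(entries))

    def delete(self, row):
        # A delete only appends a marker; the old value stays on disk.
        self.flush([(row, TOMBSTONE)])

    def get(self, row):
        # The newest entry wins; a tombstone hides any older value.
        for f in reversed(self.files):
            for r, v in reversed(f):
                if r == row:
                    return None if v is TOMBSTONE else v
        return None

    def entries_on_disk(self):
        return sum(len(f) for f in self.files)

    def major_compact(self):
        # Rewrite everything into one file, dropping markers and hidden values.
        latest = {}
        for f in self.files:
            for r, v in f:
                latest[r] = v
        self.files = [[(r, v) for r, v in latest.items() if v is not TOMBSTONE]]
```

In this sketch, a deleted row reads back as missing right away, but the number
of entries on disk only shrinks after major_compact() is called.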

I have a few questions regarding major compaction:

1.   If I set a TTL and/or a maximum number of versions, will records older
      than the TTL, or versions beyond the maximum, still remain in the
      HBase files until a major compaction is performed?
      Is my understanding correct?

2.   If a major compaction is never performed on a table, then besides the
      table size continuing to grow, we will eventually have too many HBase
      files and the cluster will slow down.
      Are there any other implications?

3.   Are there any guidelines on how often we should run major compaction?

4.   During major compaction, do we need to pause all read/write operations
      until the compaction is finished?

      I have noticed that when using S3 as the storage, after I run a major
      compaction there are inconsistencies between the S3 metadata and the
      S3 file system, and I need to run "emrfs sync" to synchronize them
      once the compaction has completed. Does this mean I need to pause all
      read/write operations during this period?

Thanks.

Antonio.
