Vladimir Rodionov created HBASE-14142:
-----------------------------------------

             Summary: HBase Backup/Restore Phase 2: Cells deduplication during 
backup
                 Key: HBASE-14142
                 URL: https://issues.apache.org/jira/browse/HBASE-14142
             Project: HBase
          Issue Type: New Feature
            Reporter: Vladimir Rodionov
            Assignee: Vladimir Rodionov


Because we do not record the last backed-up sequence ids (MVCC) and do not 
restore up to those sequence ids (which is tricky to do), there will be some 
duplicate KVs in store files after the first incremental restore that follows 
a full backup. These duplicates result from how we perform the full backup and 
the first incremental backup after it. During a full backup we perform a 
distributed log roll and record, for every RS, the last WAL timestamp; then we 
take a snapshot. The next WAL after the recorded one will make it into the 
next incremental backup set, but it will contain some edits (puts, deletes) 
that were already captured by the preceding snapshot. During restore we first 
restore the snapshot and then replay the WALs, and this operation can create 
duplicate KVs in different store files.
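
The overlap described above can be illustrated with a small, self-contained 
Java sketch. It does not use the HBase API: the Cell class, the 
restoreWithoutDedup and dedup methods, and the sample data are all 
hypothetical stand-ins, and treating two KVs as duplicates when row, family, 
qualifier, timestamp, type and value all match is an assumption made for the 
illustration, not necessarily how the feature would be implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class CellDedupSketch {

    /** Hypothetical, simplified stand-in for a KV; not the HBase Cell/KeyValue class. */
    static final class Cell {
        final String row, family, qualifier, type, value;
        final long timestamp;

        Cell(String row, String family, String qualifier,
             long timestamp, String type, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }

        // Assumption: two KVs are duplicates only if every field matches.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Cell)) return false;
            Cell c = (Cell) o;
            return timestamp == c.timestamp && row.equals(c.row)
                && family.equals(c.family) && qualifier.equals(c.qualifier)
                && type.equals(c.type) && value.equals(c.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(row, family, qualifier, timestamp, type, value);
        }
    }

    /** Models restore as: snapshot contents, then every WAL edit replayed on top. */
    static List<Cell> restoreWithoutDedup(List<Cell> snapshot, List<Cell> walEdits) {
        List<Cell> restored = new ArrayList<>(snapshot);
        restored.addAll(walEdits);
        return restored;
    }

    /** Drops exact duplicates while keeping first-seen order. */
    static List<Cell> dedup(List<Cell> cells) {
        return new ArrayList<>(new LinkedHashSet<>(cells));
    }

    public static void main(String[] args) {
        // Edits that were already flushed into the snapshot.
        List<Cell> snapshot = Arrays.asList(
            new Cell("r1", "cf", "q", 100L, "Put", "a"),
            new Cell("r2", "cf", "q", 110L, "Put", "b"));
        // The WAL after the recorded one: its first edit is also in the snapshot.
        List<Cell> wal = Arrays.asList(
            new Cell("r2", "cf", "q", 110L, "Put", "b"),
            new Cell("r3", "cf", "q", 120L, "Put", "c"));

        List<Cell> restored = restoreWithoutDedup(snapshot, wal);
        System.out.println("cells after restore: " + restored.size());      // 4
        System.out.println("cells after dedup:   " + dedup(restored).size()); // 3
    }
}
```

In this toy model the edit for row r2 appears both in the snapshot and in the 
replayed WAL, so the restored data holds it twice until duplicates are 
filtered out.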




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
