Hi Nicola,

I recommend reading the HBASE-7912 design doc (it was updated today):
https://issues.apache.org/jira/browse/HBASE-7912
-Vlad

On Thu, Jul 2, 2015 at 11:46 AM, Nicola Ferraro <[email protected]> wrote:

> HBase has many options for backing up the data stored in a table.
> The "export" tool is described by O'Reilly (HBase: The Definitive Guide),
> and also here [
> http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/comment-page-1/#comment-63294
> ]
> as a way to perform hot, incremental backups of a table.
>
> Essentially, the procedure consists of:
> - performing the backup from time 0 to time t1
> - performing the backup from time t1 to time t2
> - ... and so on
>
> Suppose we want to perform an incremental backup from t1 to t2.
> Obviously the backup will start at a time t3 greater than or equal to t2
> and finish at time t4.
> An export backup is a MapReduce job that essentially queries HBase in
> order to retrieve the data updated from time t1 to t2.
>
> Now, suppose that a client starts writing a particular cell right before
> t2 and updates it continuously with a different value every second.
>
> Fresh data is written to the WAL (not checked by the export tool) and to
> the memstore only, so every time the client writes a different cell
> value, the old data is lost (assuming we are not using data versioning).
>
> This means that, if the client overwrites the cell after t2 but before
> t3, the backup process will not export a consistent snapshot made at
> time t2; instead, the backup will contain the fresh data written after
> t2. This could also happen with data written by the client after t3 and
> before t4 (i.e. while the backup is in progress).
>
> In order to make the incremental (consistent) backup work, I see two
> options:
> - Enable (infinite) version history on all data written to HBase (to
>   avoid overwriting in the memstore)
> - Disable compaction temporarily, force a memstore flush (e.g. with a
>   "snapshot" command), perform the backup with t2 being the snapshot
>   time, then re-enable compaction.
>
> I don't know if the second option is feasible, as I did not find a way
> to disable compaction temporarily.
>
> Is there any other reliable, feasible option to execute hot +
> consistent + incremental backups with HBase?
>
> Nicola
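
For reference, the windowed export procedure described in the quoted message (0 to t1, t1 to t2, and so on) can be sketched roughly as below. This is only an illustration, not something taken from the HBASE-7912 design: it assumes the stock Export MapReduce tool is invoked as `hbase org.apache.hadoop.hbase.mapreduce.Export <table> <outputdir> [<versions> [<starttime> [<endtime>]]]` with timestamps in milliseconds. The table name, output paths, boundary values and the helper names (`export_command`, `plan_exports`) are all made up for the example. It only builds the command lines; it does not address the consistency problem raised above.

```python
from typing import List, Sequence, Tuple

# Class name of the stock HBase Export MapReduce job.
EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"


def incremental_windows(boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn [0, t1, t2, ...] into consecutive windows (0, t1), (t1, t2), ..."""
    return list(zip(boundaries, boundaries[1:]))


def export_command(table: str, out_dir: str, start_ms: int, end_ms: int,
                   versions: int = 1) -> List[str]:
    """Build the argument list for one Export run over a single time window."""
    return ["hbase", EXPORT_CLASS, table, out_dir,
            str(versions), str(start_ms), str(end_ms)]


def plan_exports(table: str, base_dir: str,
                 boundaries: Sequence[int]) -> List[List[str]]:
    """One Export command per window, each writing to its own directory."""
    return [
        export_command(table, f"{base_dir}/{start}-{end}", start, end)
        for start, end in incremental_windows(boundaries)
    ]


if __name__ == "__main__":
    # Hypothetical table name, backup directory and window boundaries.
    for cmd in plan_exports("mytable", "/backup/mytable", [0, 1000, 2000]):
        print(" ".join(cmd))
```

Each command covers one window, so the end time of one run is reused as the start time of the next. The race described above (cells overwritten between t2 and the moment the job actually scans them) is untouched by this scheme.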
