Thank you, Vlad. I read the design document and found the approach really interesting. I also checked some of the linked JIRAs and saw that there is still a lot of development and testing work ahead, though you are doing a great job. I was wondering whether you know of any solution or workaround for hot, consistent and incremental backup that is already applicable to versions up to 1.1. I know that snapshots can be taken online and are consistent, but in the application I am currently working on, some tables are expected to become huge over time, so there is a strong need for an incremental solution.
Thank you for your support,
Nicola

On Thu, 2 Jul 2015 at 22:00, Vladimir Rodionov <[email protected]> wrote:

> Hi, Nicola
>
> I recommend that you read the HBASE-7912 design doc (it has been updated today).
> https://issues.apache.org/jira/browse/HBASE-7912
>
> -Vlad
>
> On Thu, Jul 2, 2015 at 11:46 AM, Nicola Ferraro <[email protected]> wrote:
>
> > HBase has many options for backing up the data stored in a table.
> > The "export" tool is described by O'Reilly (HBase: The Definitive Guide),
> > but also here [
> > http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/comment-page-1/#comment-63294
> > ]
> > as a way to perform hot and incremental backups on a table.
> >
> > Essentially, the procedure consists of:
> > - performing the backup from time 0 to time t1
> > - performing the backup from time t1 to time t2
> > - ... and so on
> >
> > Suppose we want to perform an incremental backup from t1 to t2.
> > Obviously the backup will start at a time t3 greater than or equal to t2 and
> > finish at time t4.
> > An export-backup is a MapReduce job that essentially queries HBase in order
> > to retrieve data updated from time t1 to t2.
> >
> > Now, suppose that a client starts writing a particular cell right before t2
> > and updates it continuously with a different value every second.
> >
> > Fresh data is written to the WAL (not checked by the export tool) and memstore
> > only, so, every time the client writes a different cell value, the old data
> > is lost (assuming we are not using data versioning).
> >
> > This means that, if the client overwrites the cell after t2 but before t3,
> > the backup process will not export a consistent snapshot made at time t2;
> > instead, the backup will contain the fresh data written after t2. This
> > could also happen with data written by the client after t3 and before t4
> > (i.e. when the backup is in progress).
> >
> > In order to make the incremental (consistent) backup work, I see two
> > options:
> > - Enable (infinite) version history for all data written to HBase (to
> > avoid overwriting in the memstore)
> > - Disable compaction temporarily, force a memstore flush (e.g. with a
> > "snapshot" command), perform the backup with t2 being the snapshot time,
> > then re-enable compaction.
> >
> > I don't know if the second option is feasible, as I did not find a way to
> > disable compaction temporarily.
> >
> > Is there any other reliable, feasible option to execute hot +
> > consistent + incremental backups with HBase?
> >
> > Nicola
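
To make the overwrite scenario from the quoted message concrete, here is a small toy model in Python. It is only a sketch under simplifying assumptions and does not use the HBase API: the `Cell` class, `export_increment` and `run_scenario` are hypothetical names, the timestamps are arbitrary integers, and the model assumes an older version disappears as soon as the version limit is exceeded, which may not match how HBase actually retains data in the memstore.

```python
# Toy model only; none of these names belong to HBase.

class Cell:
    """A single cell that keeps at most `max_versions` timestamped values."""

    def __init__(self, max_versions=None):
        self.max_versions = max_versions  # None means "keep every version"
        self.versions = []                # (timestamp, value), newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)
        if self.max_versions is not None:
            # Assumption: versions beyond the limit are dropped immediately.
            del self.versions[self.max_versions:]

    def read_range(self, start, end):
        """Return the newest value with start <= timestamp < end, or None."""
        for timestamp, value in self.versions:
            if start <= timestamp < end:
                return value
        return None


def export_increment(cell, start, end):
    """Hypothetical stand-in for a time-range export of one cell."""
    return cell.read_range(start, end)


def run_scenario(max_versions):
    t1, t2 = 100, 200
    cell = Cell(max_versions)
    cell.put(190, "written-before-t2")  # inside the window [t1, t2)
    cell.put(205, "written-after-t2")   # overwrite before the export starts
    # The export for [t1, t2) runs only now, i.e. at some t3 >= t2.
    return export_increment(cell, t1, t2)


single_version_result = run_scenario(max_versions=1)
all_versions_result = run_scenario(max_versions=None)
print(single_version_result)
print(all_versions_result)
```

In this model, the single-version run no longer finds the value that was current at t2, while the run that keeps every version still does. That corresponds to the first option listed above (keeping version history), within the limits of the stated assumptions.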
