Thank you, Vlad,
I looked at the design document and found the approach really interesting.
I also checked some of the linked JIRA issues and saw that there is still
a lot of work ahead to develop and test the solution, though you are doing
a great job.
I was wondering whether you know of any solution or workaround for hot,
consistent and incremental backup that is already applicable to versions
up to 1.1. I know that snapshots can be taken online and are consistent,
but, in the application I am currently working on, some tables are
expected to become huge after some time, so there is a strong need for an
incremental solution.
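
The overwrite problem described in the original message quoted below can be
sketched with a toy model. This is plain Python, not HBase code: ToyStore,
put, export and run are hypothetical names, timestamps are small integers,
and the store simply keeps the newest max_versions versions of each cell.
It only illustrates why a time-range export taken after t2 cannot recover
the value a single-version cell held at t2, under those assumptions.

```python
class ToyStore:
    """Toy in-memory cell store (an assumption, not the HBase API)."""

    def __init__(self, max_versions=1):
        # max_versions=None means "keep every version".
        self.max_versions = max_versions
        self.cells = {}  # key -> list of (timestamp, value), oldest first

    def put(self, key, ts, value):
        versions = self.cells.setdefault(key, [])
        versions.append((ts, value))
        versions.sort()
        if self.max_versions is not None:
            # Drop everything except the newest max_versions entries.
            del versions[:-self.max_versions]

    def export(self, start, end):
        """Return the newest stored value per key with start <= ts < end."""
        out = {}
        for key, versions in self.cells.items():
            in_range = [(ts, v) for ts, v in versions if start <= ts < end]
            if in_range:
                out[key] = in_range[-1][1]
        return out


def run(max_versions):
    t1, t2 = 0, 10
    store = ToyStore(max_versions)
    store.put("row1", 5, "a")   # written between t1 and t2
    store.put("row1", 12, "b")  # overwritten after t2, before the export starts
    return store.export(t1, t2)


single_version_backup = run(1)     # the value as of t2 is gone
all_versions_backup = run(None)    # the value as of t2 is still available
```

In this toy model the single-version increment no longer contains "a",
while the store that keeps all versions still exports it, which is the
intuition behind the "enable version history" option in the quoted message.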

Thank you for your support,
Nicola



On Thu, 2 Jul 2015 at 22:00, Vladimir Rodionov <
[email protected]> wrote:

> Hi, Nicola
>
> I recommend reading the HBASE-7912 design doc (it was updated today).
> https://issues.apache.org/jira/browse/HBASE-7912
>
> -Vlad
>
> On Thu, Jul 2, 2015 at 11:46 AM, Nicola Ferraro <[email protected]>
> wrote:
>
> > HBase has many options for backing up the data stored in a table.
> > The "export" tool is described by O'Reilly (HBase: The Definitive Guide),
> > and also here [
> > http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/comment-page-1/#comment-63294
> > ]
> > as a way to perform hot and incremental backups on a table.
> >
> > Essentially, the procedure consists of:
> > - performing the backup from time 0 to time t1
> > - performing the backup from time t1 to time t2
> > - ... and so on
> >
> > Suppose we want to perform an incremental backup from t1 to t2.
> > Obviously the backup will start at a time t3 greater than or equal to t2
> > and finish at time t4.
> > An export-backup is a MapReduce job that essentially queries HBase in
> > order to retrieve data updated from time t1 to t2.
> >
> > Now, suppose that a client starts writing a particular cell right before
> > t2 and updates it continuously with a different value every second.
> >
> > Fresh data is written to WAL (not checked by the export tool) and
> > memstore only, so, every time the client writes a different cell value,
> > the old data is lost (assuming we are not using data versioning).
> >
> > This means that, if the client overwrites the cell after t2 but before
> > t3, the backup process will not export a consistent snapshot made at
> > time t2; instead, the backup will contain the fresh data written after
> > t2. This could also happen with data written by the client after t3 and
> > before t4 (i.e. while the backup is in progress).
> >
> > In order to make the incremental (consistent) backup work, I see two
> > options:
> > - Enable (infinite) version history on all data written to HBase (to
> > avoid overwriting in memstore)
> > - Disable compaction temporarily, force a memstore flush (e.g. with a
> > "snapshot" command), perform the backup with t2 being the snapshot time,
> > then re-enable compaction.
> >
> > I don't know if the second option is feasible as I did not find a way to
> > disable compaction temporarily.
> >
> > Is there any other, reliable, feasible option to execute hot +
> > consistent + incremental backups with HBase?
> >
> > Nicola
> >
>
