If you are asking about "current" solutions, then yes, you can distcp,
but I would consider that a last resort for the reasons you described
(yes, you could end up with an inconsistent state that requires manual
fixing). Also, it completely bypasses row locks.

Another option is the Export MR job, using its start time option to do
incremental backups. You then have to distcp the output of that job.
It's also not a "point in time" snapshot, since it doesn't lock all
rows (and you don't really want that hehe).
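
To make the Export + distcp idea a bit more concrete, here is a rough
sketch (not something from our tooling) of a Python helper that only
builds the two command lines for one incremental run. It assumes the
Export job takes its arguments as
<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]] with
timestamps in milliseconds, and that the time range is half-open.
Check the usage message of Export on your version before relying on
this. The table name, paths and function names are made up for
illustration.

```python
import time

# Class name of the Export MR job shipped with HBase.
EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"


def export_command(table, output_dir, start_ms, end_ms, versions=1):
    """Build argv for one incremental Export run.

    Assumption: cells with start_ms <= timestamp < end_ms are exported.
    """
    if start_ms < 0 or end_ms <= start_ms:
        raise ValueError("need 0 <= start_ms < end_ms")
    return [
        "hbase", EXPORT_CLASS,
        table, output_dir,
        str(versions), str(start_ms), str(end_ms),
    ]


def distcp_command(src, dst):
    """Build argv for copying the Export output to another cluster."""
    return ["hadoop", "distcp", src, dst]


def next_window(last_end_ms, now_ms=None):
    """Return (start, end) for the next run, starting where the last ended."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return last_end_ms, now_ms


if __name__ == "__main__":
    # Hypothetical values: the end of the previous run would normally be
    # persisted somewhere between runs.
    start, end = next_window(last_end_ms=0)
    out = "/backup/mytable/%d" % end
    print(" ".join(export_command("mytable", out, start, end)))
    print(" ".join(distcp_command(out, "hdfs://backup-nn:8020" + out)))
```

Starting each window exactly where the previous one ended avoids gaps
between runs, but as said above each run is still not a consistent
point-in-time snapshot.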

Since you are on 0.89, you can use cluster replication. This will keep
an almost up-to-date replica on another cluster. Cons are that it
requires another cluster (may be a good thing to have in any case),
and it's still experimental so you could run into issues. See

In the future there's HBASE-50 that should also be useful.


On Tue, Sep 7, 2010 at 9:27 AM, Alexey Kovyrin <ale...@kovyrin.net> wrote:
> Hi guys,
> More and more data in our company is moving from mysql tables to hbase
> and more and more worried I am about the "no backups" situation with
> that data. I've started looking for possible solutions to backup the
> data and found two major options:
> 1) distcp of /hbase directory somewhere
> 2) HBASE-1684
> So, I have a few questions for hbase "users":
> 1) How do you backup your small (up to a hundred gb) tables?
> 2) How do you backup your huge (terabytes in size) tables?
> And a question for hbase developers: what kind of problems could cause
> a distcp from a non-locked hbase table (there is no way to lock table
> writes while backing it up AFAIU)? I understand I could lose writes
> made after I begin the backup, but if my distcp takes an hour to
> complete, I imagine lots of things will happen on the filesystem
> during this period of time. Will hbase be able to recover from this
> kind of mess?
> Thanks a lot for your comments.
> --
> Alexey Kovyrin
> http://kovyrin.net/
