Hi,

I've got some data in HBase that I'd hate to lose. Yeah, very original. :))

I know I can:
1) make an export/backup of one table at a time using org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy one table at a time using http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/CopyTable.html
3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without a standard backup
(rough example invocations for options 1)-3) are just below, so it's clear what I mean)
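This is roughly how I understand the first three would be kicked off; table, host, and path names are just placeholders, and the exact arguments may differ between HBase versions:

  # 1) MapReduce export of a single table into an HDFS directory
  hbase org.apache.hadoop.hbase.mapreduce.Export MyTable /backups/MyTable

  # 2) CopyTable to another cluster, identified by its ZooKeeper quorum
  hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
      --peer.adr=backup-zk1:2181:/hbase MyTable

  # 3) distcp of the whole /hbase directory to a second cluster's HDFS
  hadoop distcp hdfs://src-nn:8020/hbase hdfs://backup-nn:8020/hbase-backup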
What I'm not sure about is the following:
1) Is any one of the above options "hot", meaning that it can be used while the source cluster is running and that it produces a consistent backup (a snapshot or checkpoint of the source cluster's data)? I imagine only replication of the whole cluster (option 4 above) is really "hot"?
2) If the HBase cluster lives in EC2, what's the best thing to do with the backup/snapshot? EBS may be too expensive. Are people stuffing their HBase backups into S3 somehow, despite the S3 per-object limit of 5 GB? (A rough sketch of what I have in mind follows.)
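For example, something along these lines, where the bucket name and paths are made up and the s3n:// native S3 filesystem is assumed (it writes one S3 object per file, which is exactly where the 5 GB per-object ceiling bites on large export files):

  # export a table to HDFS first, then push the export to S3 with distcp
  # (assumes fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are set in core-site.xml)
  hbase org.apache.hadoop.hbase.mapreduce.Export MyTable /backups/MyTable
  hadoop distcp /backups/MyTable s3n://my-backup-bucket/hbase/MyTable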
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/