Hi,

I've got some data in HBase that I'd hate to lose. Yeah, very original. :))

I know I can:
1) make an export/backup of one table at a time using org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
2) copy one table at a time using http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/CopyTable.html
3) use distcp to copy the whole /hbase part of HDFS
4) replicate the whole cluster - http://hbase.apache.org/replication.html
5) count on HDFS replication and live without a standard backup
(rough example invocations for options 1)-3) are just below, so it's clear what I mean)
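This is roughly how I understand the first three would be kicked off; table, host, and path names are just placeholders, and the exact arguments may differ between HBase versions:

  # 1) MapReduce export of a single table into an HDFS directory
  hbase org.apache.hadoop.hbase.mapreduce.Export MyTable /backups/MyTable

  # 2) CopyTable to another cluster, identified by its ZooKeeper quorum
  hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
      --peer.adr=backup-zk1:2181:/hbase MyTable

  # 3) distcp of the whole /hbase directory to a second cluster's HDFS
  hadoop distcp hdfs://src-nn:8020/hbase hdfs://backup-nn:8020/hbase-backup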
What I'm not sure about is the following:
1) Is any one of the above options "hot", meaning that it can be used while the source cluster is running and that it produces a consistent backup (a snapshot or checkpoint of the source cluster's data)? I imagine only replication of the whole cluster (option 4 above) is really "hot"?
2) If the HBase cluster lives in EC2, what's the best thing to do with the backup/snapshot? EBS may be too expensive. Are people stuffing their HBase backups into S3 somehow, despite the S3 per-object limit of 5 GB? (A rough sketch of what I have in mind follows.)
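For example, something along these lines, where the bucket name and paths are made up and the s3n:// native S3 filesystem is assumed (it writes one S3 object per file, which is exactly where the 5 GB per-object ceiling bites on large export files):

  # export a table to HDFS first, then push the export to S3 with distcp
  # (assumes fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are set in core-site.xml)
  hbase org.apache.hadoop.hbase.mapreduce.Export MyTable /backups/MyTable
  hadoop distcp /backups/MyTable s3n://my-backup-bucket/hbase/MyTable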
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/