Hi,
> > 1) make a export/backup of 1 table at a time using
> > org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684
>
> This is actually checked in.  See:
>
> ./bin/hadoop jar hbase-0.X.X.jar
>
> > 2) copy 1 table at a time using
> > http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/CopyTable.html
> >
> > 3) use distcp to copy the whole /hbase part of HDFS
> > 4) replicate the whole cluster - http://hbase.apache.org/replication.html
> > 5) count on HDFS replication and live without the standard backup
> >
> > What I'm not sure about is the following:
> >
> > 1) Is any one of the above options "hot", meaning that it can be used while the
> > source cluster is running and that it produces a consistent backup (a snapshot
> > or checkpoint of the source cluster's data)?
> > I imagine only replication of the whole cluster (point 4) above) is really
> > "hot"?
>
> Options 1) and 2) will give you a snapshot on a table at a particular
> instance in time.  You'll get the state of the row at the time the
> MapReduce job crosses that row.

Hm, isn't this contradictory?  That is, doesn't "snapshot of a table at a
particular instance in time" mean that I'd get a snapshot of *all* rows at a
single point in time, and not the value of each row at whatever moment the
Export or Copy MR job crosses it?

Also, it seems like all options are per-table, right?  There is nothing other
than near real-time full-cluster replication that would back up all tables at
once?

This is important when you have multiple tables storing data that depend on each
other.  Imagine tables A and B where table B depends on A.  If I first back up
A, then by the time I back up B, B may reference some data in A that my backup
of A doesn't contain.  If I flip the order and first back up B, then by the time
I back up A, A may contain some extra data that B's backup doesn't refer to.
Simply put, the backup copies of these 2 tables won't be in sync.  How do people
deal with this?

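To make the A/B case concrete, here is a toy sketch in plain Python.  It does
not touch HBase at all: the dicts stand in for tables, and every name in it
(backup, write, run, and so on) is made up for illustration.  It only shows
what happens when a write lands between two per-table backups.

```python
# Toy model only: plain dicts stand in for tables A and B.
# Each B row stores the key of the A row it depends on.

def backup(table):
    """Copy a table as it looks at the moment the backup job runs."""
    return dict(table)

def write(table_a, table_b, n):
    """One logical write: a new A row plus a B row that references it."""
    table_a[f"a{n}"] = f"payload-{n}"
    table_b[f"b{n}"] = f"a{n}"

def dangling_refs(a_copy, b_copy):
    """B rows in the backup whose referenced A row is missing from A's backup."""
    return sorted(k for k, ref in b_copy.items() if ref not in a_copy)

def unreferenced(a_copy, b_copy):
    """A rows in the backup that no B row in B's backup refers to."""
    return sorted(set(a_copy) - set(b_copy.values()))

def run(order):
    """Back up the two tables in the given order, with a write in between."""
    table_a, table_b = {}, {}
    tables = {"A": table_a, "B": table_b}
    write(table_a, table_b, 1)           # data present before any backup
    first, second = order
    copies = {first: backup(tables[first])}
    write(table_a, table_b, 2)           # arrives between the two backups
    copies[second] = backup(tables[second])
    return (dangling_refs(copies["A"], copies["B"]),
            unreferenced(copies["A"], copies["B"]))

if __name__ == "__main__":
    print("A then B:", run(("A", "B")))  # B's backup points at a missing A row
    print("B then A:", run(("B", "A")))  # A's backup has a row B never mentions
```

Either order leaves the two copies out of sync; only the direction of the
mismatch changes.
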
Would it make sense to document this sort of stuff on
http://hbase.apache.org/book/book.html ?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
