There are a couple of nits...

1) Compression. This will help a bit when moving the files around.
2) Data size. You may have bandwidth issues. Moving TBs of data over a 1 GbE network can impact your cluster's performance, even with compression.

Depending on your cluster(s) and infrastructure, there is going to be a point where the cost of trying to back up to tape exceeds the cost of replicating to a second cluster. At the same time, remember that restoring TBs of data takes time. How large a data set that is will vary by organization. Again, only you can determine the value of your data.

If you are backing up to a secondary cluster, you can use the replication feature in HBase. This would be a better fit if you are looking at backing up a large set of HBase tables.

On Jul 23, 2012, at 10:33 AM, Amlan Roy wrote:

> Hi Michael,
>
> Thanks a lot for the reply. What I want to achieve is: if my cluster goes
> down for some reason, I should be able to create a new cluster and
> import all the backed-up data. As I want to store all the tables, I
> expect the data size to be huge (on the order of terabytes) and it will
> keep growing.
>
> If I have understood correctly, you have suggested running "export" to get
> the data into HDFS and then "hadoop fs -copyToLocal" to get it onto a
> local file system. If I take a backup of those files, is it possible to
> import that data into a new HBase cluster?
>
> Thanks and regards,
> Amlan
>
> -----Original Message-----
> From: Michael Segel [mailto:[email protected]]
> Sent: Monday, July 23, 2012 8:19 PM
> To: [email protected]
> Subject: Re: Hbase bkup options
>
> Amlan,
>
> Like always, the answer to your question is... it depends.
>
> First, how much data are we talking about?
>
> What's the value of the underlying data?
>
> One possible scenario...
> You run a M/R job to copy data from the table to an HDFS file, which is
> then copied to attached storage on an edge node and then to tape.
> Depending on how much data and how much disk is in the attached storage,
> you may want to keep a warm copy there, a 'warmer/hot' copy on HDFS, and
> a cold copy on tape at some offsite storage facility.
>
> There are other options, but it all depends on what you want to achieve.
>
> With respect to the other tools...
>
> You can export (which is a M/R job) to a directory, then use distcp to
> copy it to a different cluster. hadoop fs -copyToLocal will let you copy
> off the cluster.
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools.
>
>
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
>
>> Hi,
>>
>> Is it feasible to do disk or tape backup for HBase tables?
>>
>> I have read about tools like Export, CopyTable, and distcp. It seems like
>> they will require a separate HDFS cluster to do that.
>>
>> Regards,
>> Amlan
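
To make the export/copyToLocal route above concrete, here is a sketch of the full round trip, including the compression nit from point 1. The table name "mytable" and all paths are placeholders; the Export/Import driver classes are the stock HBase MapReduce tools, and the compression -D flags assume an older (mapred.*) Hadoop property naming. Adjust for your versions.

```shell
# 1) Export the table to HDFS as SequenceFiles (runs a MapReduce job).
#    The -D flags enable compressed output to shrink what you move around.
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  mytable /backup/mytable

# 2) Pull the export off the cluster to attached storage on an edge node
#    (from here it can go to tape):
hadoop fs -copyToLocal /backup/mytable /mnt/backup/mytable

# 3) On the new cluster, push the files back into HDFS:
hadoop fs -copyFromLocal /mnt/backup/mytable /restore/mytable

# 4) Recreate the table first (Export does NOT capture the schema),
#    then import:
hbase org.apache.hadoop.hbase.mapreduce.Import mytable /restore/mytable
```

Note that between clusters you can skip the local hop entirely and use distcp, as mentioned above.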

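For the replication route, a sketch of what the setup looks like from the hbase shell on a 0.92-era source cluster. The peer id, table and column family names, and the ZooKeeper quorum string are all placeholders, and hbase.replication must already be set to true in hbase-site.xml on both clusters.

```shell
# Register the backup cluster as a replication peer
# (format: 'zk quorum:client port:znode parent'):
hbase shell <<'EOF'
add_peer '1', 'backup-zk1,backup-zk2,backup-zk3:2181:/hbase'
EOF

# Replication is opted into per column family via REPLICATION_SCOPE:
hbase shell <<'EOF'
disable 'mytable'
alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}
enable 'mytable'
EOF
```

Only edits made after the scope is set are shipped; existing data still needs a one-time Export/CopyTable to seed the peer.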