We recently backed up one cluster to another using our in-house tool BlueShift (https://github.com/flipkart-incubator/BlueShift); you could give it a try.
Thanks,
Chetna Chaudhari

On 9 September 2015 at 12:12, James Bond <[email protected]> wrote:

> One way is to create a backup cluster or a secondary cluster.
>
> 1. Ingest data into both clusters in parallel, i.e. run the ingestion jobs
> on both clusters. This gives you a backup and also ensures you can switch
> over to the backup cluster when you have trouble with the primary. This
> setup usually makes sense when you have two data centers, one primary and
> one backup.
>
> 2. Have a primary cluster and a secondary cluster that is kept in sync
> with the primary, usually via distcp-style jobs. Cloudera provides a front
> end to manage these replications, but it essentially runs distcp in the
> background.
>
> 3. If your data ingestion goes through Flume/Kafka etc., you can use it to
> write to both the primary and secondary clusters.
>
> I am not sure if anybody uses tape/archive to back up a Hadoop cluster;
> perhaps somebody who does can comment.
>
> On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <[email protected]> wrote:
>
>> Hi,
>>
>> Any idea how to back up and restore Hadoop 2.x? Use tape, form a new
>> Hadoop cluster, or any other options?
>>
>> I use Hadoop 2.6 with HBase and Hive.
>>
>> Thanks
>>
>> Regards
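For option 2 in the quoted reply, a minimal sketch of the distcp-based sync looks like the commands below. Hostnames (nn-primary, nn-backup), ports, paths, snapshot/table names, and the mapper count are all placeholders; adjust them for your clusters. Since the original poster runs HBase, an HBase snapshot export is shown as well, which is generally safer than distcp'ing live HBase data directories.

```shell
# Sync an HDFS directory from the primary cluster to the backup cluster.
# -update copies only changed files; -delete removes files on the target
# that no longer exist on the source (use with care).
hadoop distcp -update -delete \
  hdfs://nn-primary:8020/data \
  hdfs://nn-backup:8020/data

# For HBase tables, take a snapshot first, then export it to the backup
# cluster's HBase root dir. Table and snapshot names are placeholders.
echo "snapshot 'my_table', 'my_table_snap'" | hbase shell

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snap \
  -copy-to hdfs://nn-backup:8020/hbase \
  -mappers 16
```

Both distcp and ExportSnapshot run as MapReduce jobs, so they scale with the cluster; scheduling them periodically (e.g. via cron or Oozie) gives a basic continuous-sync setup without extra tooling.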
