One approach is to create a backup (secondary) cluster:

1. Ingest data into both clusters in parallel, i.e. run the same jobs on both clusters. This gives you a backup and also ensures you can switch over to the backup cluster when you have trouble with the primary. This setup usually makes sense when you have two data centers, one being the primary DC and the other the backup.

2. Have a primary cluster and a secondary cluster that is kept in sync with the primary, usually with distcp-type jobs. Cloudera provides a front end to manage these replications, but it essentially runs distcp in the background.

3. If your data ingestion is Flume/Kafka etc., you can use it to write to both the primary and secondary clusters.
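For option 2, a minimal distcp sketch looks like the following. This is only an illustration: the NameNode addresses (nn-primary, nn-backup) and the /data path are hypothetical placeholders, not from this thread, and the command needs a live pair of clusters to actually run.

```shell
# Keep the backup cluster's /data in sync with the primary.
# "nn-primary", "nn-backup", and "/data" are hypothetical placeholders.
# -update: copy only files whose size/checksum differ from the target
# -delete: remove target files that no longer exist on the source
hadoop distcp -update -delete \
  hdfs://nn-primary:8020/data \
  hdfs://nn-backup:8020/data
```

You would typically schedule something like this (cron/Oozie) at whatever interval matches your tolerance for data loss on the secondary.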
I am not sure if anybody uses a tape/archive to back up a Hadoop cluster. I guess somebody who does can comment.

On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <[email protected]> wrote:
> Hi,
>
> Any idea how to backup and restore Hadoop 2.x? Use tape or form a new
> Hadoop cluster, or any other options?
>
> I use Hadoop 2.6 with HBase and Hive
>
> Thanks
>
> Regards
>
