One way is to create a backup cluster or a secondary cluster.
1. Ingest data into both clusters in parallel, i.e. run the same jobs on
both clusters. This gives you a backup and also lets you switch over to
the backup cluster when you have trouble with the primary. This setup
usually makes sense when you have two data centers, one acting as the
primary DC and the other as the backup.
2. Have a primary cluster and a secondary cluster that is kept in sync
with the primary, usually via distcp-style jobs. Cloudera provides a
front end to manage these replications, but it essentially runs distcp
in the background.
3. If your data ingestion uses Flume/Kafka etc., you can configure it to
write to both the primary and secondary clusters.
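
As a rough sketch of option 2, a scheduled distcp run might look like
the following. The NameNode hostnames, ports, and paths here are
hypothetical placeholders; substitute your own:

```shell
# Minimal sketch: mirror /data from the primary cluster to the backup.
# -update copies only files that changed since the last run,
# -delete removes files on the target that no longer exist on the source,
# -p preserves permissions, ownership, and other file attributes.
hadoop distcp -update -delete -p \
    hdfs://primary-nn:8020/data \
    hdfs://backup-nn:8020/data
```

In practice you would run this from cron or Oozie on whatever cadence
matches your recovery-point objective.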

I am not sure whether anybody uses tape/archive to back up a Hadoop
cluster. Perhaps someone who does can comment.


On Wed, Sep 9, 2015 at 11:34 AM, Arthur Chan <[email protected]>
wrote:

> Hi,
>
> Any idea how to backup and restore Hadoop 2.x?   Use tape or form a new
> Hadoop cluster, or any other options?
>
> I use Hadoop 2.6 with HBase and Hive
>
> Thanks
>
> Regards
>
>
