I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I am backing it up to a remote data center that has far fewer machines, each with a higher per-disk density.

The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.

When I run distcp on the primary cluster, specifying the remote as the destination, and I DO NOT pass the preserve replication factor argument (-pr), I still get 3 replicas on the remote.

All of the HBase snapshots that are copied from the primary to the backup also end up with HFiles that have a replication factor of 3.

As a test, I ran distcp on the backup cluster, pulling from the primary, and this did result in a replication factor of 2. However, the backup has far fewer resources, so I expect the large copy would be faster if it ran on the primary, which has more machines.
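For reference, here is a rough sketch of the invocations involved. The NameNode addresses and the path are placeholders, not my real ones, and the function only prints the command instead of running it. The last variant is an untested guess on my part: it assumes that passing the standard dfs.replication property through the generic -D option on the submitting client would change the replication of the files distcp creates.

```shell
#!/bin/sh
# Placeholder NameNode addresses and path (hypothetical values).
PRIMARY="hdfs://primary-nn:8020"
BACKUP="hdfs://backup-nn:8020"
DIR="/data/example"

# Print (do not run) a distcp command copying from primary to backup.
# $1 is an optional replication factor, passed as -Ddfs.replication=N.
# Whether this override takes effect on the target is an assumption.
distcp_push_cmd() {
    if [ -n "${1:-}" ]; then
        echo "hadoop distcp -Ddfs.replication=$1 $PRIMARY$DIR $BACKUP$DIR"
    else
        echo "hadoop distcp $PRIMARY$DIR $BACKUP$DIR"
    fi
}

# The copy as described: the same source and target whether it is
# submitted from the primary (push) or from the backup (pull).
distcp_push_cmd

# Untested variant: push from the primary with an explicit override.
distcp_push_cmd 2
```

The push and pull runs use the same source and target URIs; the only difference is which cluster's client configuration submits the job, which may be why the resulting replication factor differs.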

In addition, I cannot pull HBase snapshots from the backup cluster, because the ExportSnapshot utility does not support this.

Does anyone know whether it is possible to distcp to another cluster that has a smaller default replication factor and have that smaller factor take effect on the copied files?

Thanks!
