I have a primary Hadoop cluster (2.6.0) running MapReduce and HBase. I
am backing up to a remote data center that has far fewer machines, each
with a higher per-disk density.
The default HDFS replication factor on the primary is 3.
The default HDFS replication factor on the backup is 2.
When I run distcp on the primary cluster, specifying the remote as the
destination, and I DO NOT specify preserve replication factor as an
argument, I still get 3 replicas on the remote.
All my HBase snapshots that are copied from the primary to the backup
also end up with HFiles that have a replication factor of 3.
As a test I ran distcp from the backup, pulling from the primary, and
this did result in a replication factor of 2. However, I have far fewer
resources on the backup and think that it would be faster to perform the
large copy with a larger number of machines.
Also, I cannot pull HBase snapshots from the backup cluster. The
ExportSnapshot utility does not support this.
Does anyone know if it is possible to distcp to another cluster that has
a smaller replication factor and have that take effect?
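
For concreteness, here is a rough sketch of the kind of invocation in
question, together with one possible workaround. It assumes that the
replication factor of newly written files comes from the client-side
dfs.replication setting of the cluster running the job, and that it can
be overridden with the generic -D option. The NameNode addresses and the
path are hypothetical placeholders, and the script only prints the
commands rather than running them.

```shell
#!/bin/sh
# Hypothetical NameNode addresses and path; substitute real values.
SRC="hdfs://primary-nn:8020/data"
DST="hdfs://backup-nn:8020/data"
TARGET_REPL=2   # desired replication factor on the backup cluster

# Push from the primary, overriding the client-side dfs.replication
# (assumption: distcp honours the generic -D option for this property).
distcp_cmd() {
  echo "hadoop distcp -Ddfs.replication=${TARGET_REPL} ${SRC} ${DST}"
}

# Fallback: lower the replication factor after the copy has finished.
setrep_cmd() {
  echo "hdfs dfs -setrep -w ${TARGET_REPL} ${DST}"
}

# Dry run: print the commands so they can be reviewed before use.
distcp_cmd
setrep_cmd
```

The setrep fallback changes the replication factor only after the files
have landed, so the backup would temporarily need room for 3 replicas.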
Thanks!