It may be easier to copy the data to S3 and then from S3 to the new cluster.
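A rough sketch of that two-hop copy, assuming the Hadoop 1.x s3n:// filesystem (bucket name, paths, namenode hostnames, and credentials below are all placeholders):

  # Run on the old cluster: push the HDFS data up to S3
  hadoop distcp \
    -D fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
    -D fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
    hdfs://old-namenode:8020/data \
    s3n://migration-bucket/data

  # Run on the new cluster inside the VPC: pull the data back down
  hadoop distcp \
    -D fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
    -D fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
    s3n://migration-bucket/data \
    hdfs://new-namenode:8020/data

That way neither cluster ever has to reach the other's datanodes directly.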
On Fri, Sep 19, 2014 at 8:45 PM, Jameel Al-Aziz <[email protected]> wrote:
> Hi all,
>
> We’re in the process of migrating from EC2-Classic to VPC and needed to
> transfer our HDFS data. We set up a new cluster inside the VPC, and
> assigned the name node and data nodes temporary public IPs. Initially, we
> had a lot of trouble getting the name node to redirect to the public
> hostnames instead of private IPs. After some fiddling around, we finally
> got webhdfs and dfs -cp to work using public hostnames. However, distcp
> simply refuses to use the public hostnames when connecting to the data
> nodes.
>
> We’re running distcp on the old cluster, copying data into the new
> cluster.
>
> The old Hadoop cluster is running 1.0.4 and the new one is running 1.2.1.
>
> So far, on the new cluster, we’ve tried:
> - Using public DNS hostnames in the masters and slaves files (on both the
>   name node and data nodes)
> - Setting the hostname of all the boxes to their public DNS names
> - Setting “fs.default.name” to the public DNS name of the new name node.
>
> And on both clusters:
> - Setting “dfs.datanode.use.datanode.hostname” and
>   “dfs.client.use.datanode.hostname” to “true”.
>
> Even though webhdfs is finally redirecting to data nodes using the public
> hostnames, we keep seeing errors when running distcp. The errors are all
> similar to: http://pastebin.com/ZYR07Fvm
>
> What do we need to do to get distcp to use the public hostnames of the
> new machines? I haven’t tried running distcp in the other direction (I’m
> about to), but I suspect I’ll run into the same problem.
>
> Thanks!
> Jameel
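For anyone else hitting this: the hostname settings mentioned above would normally look something like the following in hdfs-site.xml on both clusters (just a sketch of the properties named in the original message, values assumed; the client running distcp also needs the dfs.client setting in its own config or passed with -D):

  <!-- hdfs-site.xml: make datanodes register and be addressed by hostname -->
  <property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <!-- make clients connect to datanodes by the hostname the namenode returns -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>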
