It may be easier to copy the data to S3 first and then from S3 into the new cluster.
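Something along these lines should work on both sides (the bucket name and
paths are just placeholders; on Hadoop 1.x you would use the s3n:// filesystem,
with your AWS keys set via fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey
in core-site.xml):

  # On the old (EC2-Classic) cluster: push the data up to S3
  hadoop distcp /user/data s3n://your-bucket/hdfs-backup/user/data

  # On the new (VPC) cluster: pull it back down from S3
  hadoop distcp s3n://your-bucket/hdfs-backup/user/data /user/data

That way neither cluster ever has to resolve the other cluster's data node
hostnames at all.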

On Fri, Sep 19, 2014 at 8:45 PM, Jameel Al-Aziz <[email protected]> wrote:

>  Hi all,
>
>  We’re in the process of migrating from EC2-Classic to VPC and needed to
> transfer our HDFS data. We set up a new cluster inside the VPC and assigned
> the name node and data nodes temporary public IPs. Initially, we had a lot
> of trouble getting the name node to redirect to the public hostnames instead
> of the private IPs. After some fiddling around, we finally got webhdfs and
> dfs -cp to work using public hostnames. However, distcp simply refuses to
> use the public hostnames when connecting to the data nodes.
>
>  We’re running distcp on the old cluster, copying data into the new
> cluster.
>
>  The old Hadoop cluster is running 1.0.4 and the new one is running 1.2.1.
>
>  So far, on the new cluster, we’ve tried:
>  - Using public DNS hostnames in the masters and slaves files (on both the
> name node and data nodes)
>  - Setting the hostname of all the boxes to their public DNS name
>  - Setting “fs.default.name” to the public DNS name of the new name node
> (see the core-site.xml snippet below).
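>
> For reference, the core-site.xml entry looks roughly like this (the hostname
> and port below are placeholders for our actual values):
>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8020</value>
>   </property>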
>
>  And on both clusters:
>  - Setting “dfs.datanode.use.datanode.hostname” and
> “dfs.client.use.datanode.hostname” to “true” (snippet below).
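>
> That is, the HDFS configuration on both clusters now has entries like these:
>
>   <property>
>     <name>dfs.datanode.use.datanode.hostname</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.client.use.datanode.hostname</name>
>     <value>true</value>
>   </property>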
>
>  Even though webhdfs is finally redirecting to data nodes using the
> public hostname, we keep seeing errors when running distcp. The errors are
> all similar to: http://pastebin.com/ZYR07Fvm
>
>  What do we need to do to get distcp to use the public hostname of the
> new machines? I haven’t tried running distcp in the other direction (I’m
> about to), but I suspect I’ll run into the same problem.
>
>  Thanks!
>  Jameel
>
