Hi everyone,

I used to launch EC2 clusters with the Spark scripts running Hadoop 1. I recently changed this and launched a new cluster with the Hadoop major version set to 2:
    spark-ec2 <args> --hadoop-major-version=2 <more-args>

On the old cluster, I would start persistent-hdfs and migrate data from S3 with distcp:

    persistent-hdfs/bin/hadoop distcp <src> <dst>

However, when I do the same thing on the new cluster, I get an error:

    /root/persistent-hdfs/sbin/start-all.sh
    /root/persistent-hdfs/bin/hadoop distcp <src> <dst>

    2013-12-06 20:38:44,808 INFO mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "ec2-54-193-48-31.us-west-1.compute.amazonaws.com:9001"
    2013-12-06 20:38:44,809 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
    java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

I'm wondering how the cluster is configured differently when Hadoop major version 2 is passed to the EC2 scripts, and why distcp no longer works.
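From the error, it looks like the client is falling back to the LocalJobRunner: in Hadoop 2, org.apache.hadoop.mapred.LocalClientProtocolProvider is only tried when mapreduce.framework.name is unset or "local", and it rejects any mapreduce.jobtracker.address other than "local". So my guess (unverified; I haven't checked what spark-ec2 actually writes) is that the new cluster's mapred-site.xml looks something like this:

    <?xml version="1.0"?>
    <!-- Hypothetical /root/persistent-hdfs/conf/mapred-site.xml on the new
         cluster; the property values below are my assumption, not verified. -->
    <configuration>
      <!-- A host:port here is invalid for the LocalJobRunner, which is what
           gets tried when mapreduce.framework.name is left unset. -->
      <property>
        <name>mapreduce.jobtracker.address</name>
        <value>ec2-54-193-48-31.us-west-1.compute.amazonaws.com:9001</value>
      </property>
      <!-- Presumably missing: a mapreduce.framework.name of "classic" or
           "yarn" that would select a non-local ClientProtocolProvider. -->
    </configuration>

If that's right, distcp fails because no ClientProtocolProvider can initialize a Cluster against this configuration, which would match the IOException above.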
Thanks!

-Matt Cheah