distcp on ec2 standalone spark cluster

Tomer Benyamini Sun, 07 Sep 2014 06:43:32 -0700

Hi,

I would like to copy log files from S3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess MapReduce is not
running on the cluster, since I'm getting the exception below.

Is there a way to activate MapReduce, or is there a Spark alternative
to distcp?

Thanks,
Tomer

mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
Invalid "mapreduce.jobtracker.address" configuration value for
LocalJobRunner : "XXX:9001"

ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

