I've been using the same method to launch my clusters and then pull my data from S3 to local HDFS:
    $SPARKHOME/ec2/spark-ec2 -k mykey -i ~/.ssh/mykey.pem -s 29 \
        --instance-type=r3.8xlarge --placement-group=pcavariants \
        --copy-aws-credentials --hadoop-major-version=2 --spot-price=2.8 \
        launch mycluster --region=us-west-2

followed by

    ephemeral-hdfs/bin/hadoop distcp s3n://agittens/CFSRArawtars CFSRArawtars

Until a few days ago this worked as I'd expect. Now I get these warnings when I run the distcp command:

2015-12-10 00:16:43,113 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,207 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,422 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,513 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,737 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,830 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:44,015 WARN httpclient.RestS3Service (RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' - Unexpected response code 404, expected 200
2015-12-10 00:16:46,141 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2015-12-10 00:16:46,141 WARN conf.Configuration (Configuration.java:warnOnceIfDeprecated(824)) - io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2015-12-10 00:16:46,630 INFO service.AbstractService (AbstractService.java:init(81)) - Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
2015-12-10 00:16:46,630 INFO service.AbstractService (AbstractService.java:start(94)) - Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
2015-12-10 00:16:47,135 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(368)) - number of splits:21

Then the job hangs and does nothing until I kill it. Any idea what the problem is and how to fix it, or a workaround for getting my data (around 4 TB) off S3 quickly?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/distcp-suddenly-broken-with-spark-ec2-script-setup-tp25658.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
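
P.S. The repeated 404s against '/CFSRArawtars' suggest the bucket listing itself is failing before any copying starts. I haven't yet verified this independently of distcp; presumably something like the following (a sketch, using the same ephemeral-hdfs tools and the jets3t credentials that distcp uses) would show whether the path is still listable at all:

    # List the source prefix through the same s3n connector distcp uses;
    # if this also fails with 404s, the problem is listing/credentials,
    # not the copy itself.
    ephemeral-hdfs/bin/hadoop fs -ls s3n://agittens/CFSRArawtars

If that listing fails too, the issue is presumably with the s3n filesystem/credential setup on the new AMI rather than with distcp proper.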