I've been using the same method to launch my clusters and then pull my data
from S3 to the local HDFS:

$SPARKHOME/ec2/spark-ec2 -k mykey -i ~/.ssh/mykey.pem -s 29 \
  --instance-type=r3.8xlarge --placement-group=pcavariants \
  --copy-aws-credentials --hadoop-major-version=2 --spot-price=2.8 \
  launch mycluster --region=us-west-2

then

ephemeral-hdfs/bin/hadoop distcp s3n://agittens/CFSRArawtars CFSRArawtars

This worked as expected until recently, but within the last several days I've
been getting these errors when I run the distcp command:
2015-12-10 00:16:43,113 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
2015-12-10 00:16:43,207 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response
'/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,422 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
2015-12-10 00:16:43,513 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response
'/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:43,737 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
2015-12-10 00:16:43,830 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response
'/CFSRArawtars_%24folder%24' - Unexpected response code 404, expected 200
2015-12-10 00:16:44,015 WARN  httpclient.RestS3Service
(RestS3Service.java:performRequest(393)) - Response '/CFSRArawtars' -
Unexpected response code 404, expected 200
2015-12-10 00:16:46,141 WARN  conf.Configuration
(Configuration.java:warnOnceIfDeprecated(824)) - io.sort.mb is deprecated.
Instead, use mapreduce.task.io.sort.mb
2015-12-10 00:16:46,141 WARN  conf.Configuration
(Configuration.java:warnOnceIfDeprecated(824)) - io.sort.factor is
deprecated. Instead, use mapreduce.task.io.sort.factor
2015-12-10 00:16:46,630 INFO  service.AbstractService
(AbstractService.java:init(81)) -
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
2015-12-10 00:16:46,630 INFO  service.AbstractService
(AbstractService.java:start(94)) -
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
2015-12-10 00:16:47,135 INFO  mapreduce.JobSubmitter
(JobSubmitter.java:submitJobInternal(368)) - number of splits:21

Then the job hangs and does nothing until I kill it. Any idea what the
problem is and how to fix it, or a workaround for getting my data off S3
quickly? The dataset is around 4 TB.
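In case it helps anyone suggest alternatives: the fallback I'm considering is
bypassing distcp entirely and staging the data through the master's local
disk with the AWS CLI. This is only a sketch, not something I've run yet; it
assumes the AWS CLI is installed and configured on the master, and that
there's enough local disk (or an attached volume at /mnt) to stage the files,
which at ~4 TB may mean doing it one subdirectory at a time:

```shell
# Sketch of a workaround: copy from S3 to local disk with the AWS CLI,
# then load the staged files into ephemeral HDFS.
# Assumes: AWS CLI configured on the master, enough space under /mnt.

# Recursively pull the bucket prefix down to local disk.
aws s3 cp --recursive s3://agittens/CFSRArawtars /mnt/CFSRArawtars

# Push the staged files into HDFS.
ephemeral-hdfs/bin/hadoop fs -put /mnt/CFSRArawtars /CFSRArawtars
```

This loses distcp's parallelism across the cluster, so it would be much
slower for 4 TB, but it avoids the jets3t/s3n path entirely.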



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/distcp-suddenly-broken-with-spark-ec2-script-setup-tp25658.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
