Re: No space left on device error when pulling data from s3

2014-05-15 Thread darkjh
tly add this in the spark-ec2 script. Writing lots of tmp files in the 8GB `/` is not a great idea.

Re: No space left on device error when pulling data from s3

2014-05-12 Thread Han JU
Setting `hadoop.tmp.dir` in `spark-env.sh` solved the problem. The Spark job no longer writes tmp files to /tmp/hadoop-root/.
    SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
    export SPARK_JAVA_OPTS
I'm wondering if we need to permanently add this in th
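A minimal sketch of how the fragment above might be written in `spark-env.sh` so that it also works when `SPARK_JAVA_OPTS` is not yet set. The variable names `LOCAL_DIRS` and `TMP_DIR` are hypothetical helpers introduced only for this illustration; the paths are the ones quoted in the message and assume the ephemeral disks are mounted at /mnt and /mnt2.

```shell
# Hypothetical helper variables (not part of Spark); paths taken from the message above.
LOCAL_DIRS="/mnt/spark,/mnt2/spark"
TMP_DIR="/mnt/ephemeral-hdfs"

# Append both JVM system properties to whatever SPARK_JAVA_OPTS already holds.
SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} -Dspark.local.dir=${LOCAL_DIRS} -Dhadoop.tmp.dir=${TMP_DIR}"
export SPARK_JAVA_OPTS
```

Keeping the two paths in separate variables makes it easier to see at a glance which directories the job is expected to fill.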

Re: No space left on device error when pulling data from s3

2014-05-06 Thread Han JU
After some investigation, I found out that there are lots of temp files under /tmp/hadoop-root/s3/. But this is strange, since in both conf files, ~/ephemeral-hdfs/conf/core-site.xml and ~/spark/conf/core-site.xml, `hadoop.tmp.dir` is set to `/mnt/ephemeral-hdfs/`. Why do Spark jobs still wr
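A check of this kind could be sketched roughly as follows. This is an illustration, not the commands actually run in the thread: the function name `usage_pct` is hypothetical, and the directory list simply reuses the paths mentioned in the messages.

```shell
# Print the used percentage of the filesystem holding the given path (default: /).
usage_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

echo "root filesystem: $(usage_pct /)% used"

# Show the size of each suspected temp directory, if it exists on this machine.
for dir in /tmp/hadoop-root /mnt/ephemeral-hdfs; do
  if [ -d "$dir" ]; then
    du -sh "$dir" 2>/dev/null || true
  fi
done
```

Running it on each worker while the job is in progress should show which directory is growing.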

Re: No space left on device error when pulling data from s3

2014-05-06 Thread Akhil Das
I wonder why your / is full. Try clearing out /tmp, and also make sure that in spark-env.sh you have put
    SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark"
Thanks
Best Regards
On Tue, May 6, 2014 at 9:35 PM, Han JU wrote:
> Hi,
>
> I've a `no space left on device` exception when pulling some 22

No space left on device error when pulling data from s3

2014-05-06 Thread Han JU
Hi, I get a `no space left on device` exception when pulling some 22GB of data from s3 block storage to the ephemeral HDFS. The cluster is on EC2, launched with the spark-ec2 script, with 4 m1.large instances. The code is basically:
    val in = sc.textFile("s3://...")
    in.saveAsTextFile("hdfs://...")
Spark creates 750 inpu