Setting `hadoop.tmp.dir` in `spark-env.sh` solved the problem. The Spark job no
longer writes tmp files in /tmp/hadoop-root/.
SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
export SPARK_JAVA_OPTS
I'm wondering if we need to permanently add this in the spark-ec2 script.
Writing lots of tmp files in the 8GB `/` is not a great idea.
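For anyone applying the same fix, here is a minimal sketch of how the two options could be laid out in `spark-env.sh`. The variable names `LOCAL_DIRS` and `HADOOP_TMP` are hypothetical helpers introduced only for readability; the paths are the ones quoted in this thread and are assumed to point at the instance's ephemeral disks.

```shell
# Sketch of a spark-env.sh fragment (assumption: /mnt and /mnt2 are the
# ephemeral disks, as in the paths quoted in this thread).
LOCAL_DIRS="/mnt/spark,/mnt2/spark"   # hypothetical helper variable
HADOOP_TMP="/mnt/ephemeral-hdfs"      # hypothetical helper variable

# Keep any options that were already set, then append the two settings.
SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} -Dspark.local.dir=${LOCAL_DIRS}"
SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS} -Dhadoop.tmp.dir=${HADOOP_TMP}"
export SPARK_JAVA_OPTS
```

Writing the value on a single line avoids embedding a newline inside the quoted string, and the plain assignment form also works in shells that do not support `+=`.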
After some investigation, I found out that there are lots of temp files under
/tmp/hadoop-root/s3/
But this is strange since in both conf files,
~/ephemeral-hdfs/conf/core-site.xml and ~/spark/conf/core-site.xml, the
setting `hadoop.tmp.dir` is set to `/mnt/ephemeral-hdfs/`. Why spark jobs
still wr
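A quick way to confirm where the space is going is to compare the root filesystem with the suspect directories. This is only a sketch: `report_usage` is a hypothetical helper, and the two paths are the ones mentioned in this thread, which may not exist on other setups.

```shell
# Hypothetical helper: print the total size of a directory, or note its absence.
report_usage() {
    dir="$1"
    if [ -d "$dir" ]; then
        du -sh "$dir" 2>/dev/null
    else
        echo "$dir: not present"
    fi
}

# Free space on the root filesystem (the 8GB `/` on these instances).
df -h /

# Directories mentioned in this thread.
report_usage /tmp/hadoop-root/s3
report_usage /mnt/ephemeral-hdfs
```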
I wonder why your / is full. Try clearing out /tmp, and also make sure that in
spark-env.sh you have put SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark"
Thanks
Best Regards
On Tue, May 6, 2014 at 9:35 PM, Han JU wrote:
> Hi,
>
> I've a `no space left on device` exception when pulling some 22
Hi,
I'm getting a `no space left on device` exception when pulling some 22GB of data
from s3 block storage to the ephemeral HDFS. The cluster is on EC2, using the
spark-ec2 script with 4 m1.large instances.
The code is basically:
val in = sc.textFile("s3://...")
in.saveAsTextFile("hdfs://...")
Spark creates 750 inpu