You can set the parameters below to have the worker clean up those directories automatically:

spark.worker.cleanup.enabled
spark.worker.cleanup.interval
spark.worker.cleanup.appDataTtl

They are described here:
http://spark.apache.org/docs/1.2.0/spark-standalone.html
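
One way to set them (the values below are only illustrative; tune them to your
retention needs) is through SPARK_WORKER_OPTS in conf/spark-env.sh on each
worker, followed by a worker restart:

    # enable periodic cleanup of finished applications' work directories
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"

Note that the cleanup only removes directories of applications that have
already stopped.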

Kelvin

On Sun, Feb 8, 2015 at 5:15 PM, ey-chih chow <eyc...@hotmail.com> wrote:

> I found the problem: for each application, the Spark worker node saves
> the corresponding stdout and stderr under ./spark/work/<appid>, where
> <appid> is the id of the application.  If I run several applications in a
> row, the disk runs out of space.  In my case, the disk usage under
> ./spark/work/ is as follows:
>
> 1689784 ./app-20150208203033-0002/0
> 1689788 ./app-20150208203033-0002
> 40324 ./driver-20150208180505-0001
> 1691400 ./app-20150208180509-0001/0
> 1691404 ./app-20150208180509-0001
> 40316 ./driver-20150208203030-0002
> 40320 ./driver-20150208173156-0000
> 1649876 ./app-20150208173200-0000/0
> 1649880 ./app-20150208173200-0000
> 5152036 .
>
> Any suggestion how to resolve it?  Thanks.
>
> Ey-Chih Chow
> ------------------------------
> From: eyc...@hotmail.com
> To: gen.tan...@gmail.com
> CC: user@spark.apache.org
> Subject: RE: no space left at worker node
> Date: Sun, 8 Feb 2015 15:25:43 -0800
>
>
> By the way, the input and output paths of the job are all in S3.  I did
> not use HDFS paths as input or output.
>
> Best regards,
>
> Ey-Chih Chow
>
> ------------------------------
> From: eyc...@hotmail.com
> To: gen.tan...@gmail.com
> CC: user@spark.apache.org
> Subject: RE: no space left at worker node
> Date: Sun, 8 Feb 2015 14:57:15 -0800
>
> Hi Gen,
>
> Thanks.  I save my logs in a file under /var/log; that is the only place
> where I save data.  Will the problem go away if I use a better machine?
>
> Best regards,
>
> Ey-Chih Chow
>
> ------------------------------
> Date: Sun, 8 Feb 2015 23:32:27 +0100
> Subject: Re: no space left at worker node
> From: gen.tan...@gmail.com
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> Hi,
>
> I am sorry, I made a mistake. r3.large has only one SSD, which is mounted
> at /mnt. Therefore there is no /dev/sdc.
> In fact, the problem is that there is no space left under the / directory.
> So you should check whether your application writes data under that
> directory (for instance, saving files with a file:/// path).
>
> If not, you can use watch du -sh while the job is running to figure out
> which directory is expanding. Normally, only the /mnt directory, which is
> backed by the SSD, expands significantly, because the HDFS data is saved
> there. Then you can find the directory that is causing the no-space
> problem and track down the specific reason.
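>
> For example, something like the following (the directory list is only an
> illustration; adjust it to the paths that exist on your node):
>
>     watch -n 10 'du -sh /root/spark/work /var/log /mnt 2>/dev/null'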
>
> Cheers
> Gen
>
>
>
> On Sun, Feb 8, 2015 at 10:45 PM, ey-chih chow <eyc...@hotmail.com> wrote:
>
> Thanks Gen.  How can I check if /dev/sdc is well mounted or not?  In
> general, the problem shows up when I submit the second or third job.  The
> first job I submit will most likely succeed.
>
> Ey-Chih Chow
>
> ------------------------------
> Date: Sun, 8 Feb 2015 18:18:03 +0100
>
> Subject: Re: no space left at worker node
> From: gen.tan...@gmail.com
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> Hi,
>
> In fact, /dev/sdb is /dev/xvdb. It seems there is no double-mount problem.
> However, there is no information about /mnt2. You should check whether
> /dev/sdc is properly mounted (see the commands below).
> Michael's reply is a good solution for this type of problem. You can check
> his site.
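>
> A quick way to check (exact device names may differ on your instance):
>
>     df -h /mnt /mnt2
>     mount | grep -E 'sdc|xvdc'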
>
> Cheers
> Gen
>
>
> On Sun, Feb 8, 2015 at 5:53 PM, ey-chih chow <eyc...@hotmail.com> wrote:
>
> Gen,
>
> Thanks for your information.  The content of /etc/fstab at the worker node
> (r3.large) is:
>
> #
> LABEL=/     /           ext4    defaults,noatime  1   1
> tmpfs       /dev/shm    tmpfs   defaults        0   0
> devpts      /dev/pts    devpts  gid=5,mode=620  0   0
> sysfs       /sys        sysfs   defaults        0   0
> proc        /proc       proc    defaults        0   0
> /dev/sdb    /mnt    auto    defaults,noatime,nodiratime,comment=cloudconfig 0 0
> /dev/sdc    /mnt2   auto    defaults,noatime,nodiratime,comment=cloudconfig 0 0
>
> There is no entry for /dev/xvdb.
>
>  Ey-Chih Chow
>
> ------------------------------
> Date: Sun, 8 Feb 2015 12:09:37 +0100
> Subject: Re: no space left at worker node
> From: gen.tan...@gmail.com
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
>
> Hi,
>
> In fact, I have met this problem before; it is a bug in AWS. Which type of
> machine do you use?
>
> If my guess is right, you can check the file /etc/fstab. There may be a
> double mount of /dev/xvdb.
> If yes, you should (example commands are sketched below):
> 1. stop HDFS
> 2. unmount /dev/xvdb from /
> 3. restart HDFS
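>
> A rough sketch of those steps (the HDFS script path below assumes a
> spark-ec2 style layout and is only a guess; use the scripts from your own
> HDFS installation):
>
>     ~/ephemeral-hdfs/bin/stop-dfs.sh     # 1. stop HDFS
>     sudo umount /dev/xvdb                # 2. remove the duplicate mount
>     ~/ephemeral-hdfs/bin/start-dfs.sh    # 3. restart HDFS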
>
> Hope this helps.
> Cheers
> Gen
>
>
>
> On Sun, Feb 8, 2015 at 8:16 AM, ey-chih chow <eyc...@hotmail.com> wrote:
>
> Hi,
>
> I submitted a Spark job to an EC2 cluster using spark-submit.  At a worker
> node, there is a 'no space left on device' exception, as follows.
>
> ==========================================
> 15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file
> /root/spark/work/app-20150208014557-0003/0/stdout
> java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:345)
>         at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
>         at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:72)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>         at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
> ===========================================
>
> The command df showed the following information at the worker node:
>
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/xvda1             8256920   8256456         0 100% /
> tmpfs                  7752012         0   7752012   0% /dev/shm
> /dev/xvdb             30963708   1729652  27661192   6% /mnt
>
> Does anybody know how to fix this?  Thanks.
>
>
> Ey-Chih Chow
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/no-space-left-at-worker-node-tp21545.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>
>
>
>
