Thanks for bringing this up. Hitting 100% inode utilization is an issue I
haven't seen raised before, and it points to a gap that is not on our
current roadmap for state cleanup: removing data left behind by a crashed
process that never got the chance to clean up after itself.


On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski <
og...@plainvanillagames.com> wrote:

>  Bleh, strike that, one of my slaves was at 100% inode utilization on the
> file system. It was /tmp/spark* leftovers that apparently did not get
> cleaned up properly after failed or interrupted jobs.
> Mental note - run a cron job on all slaves and master to clean up
> /tmp/spark* regularly.
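A minimal sketch of such a cron job follows. Assumptions (not from any Spark release): the leftovers sit directly under the scratch directory and match spark*, and anything untouched for more than a day belongs to a dead job. Tune both before deploying, since an over-eager match could delete scratch space belonging to running jobs.

```shell
#!/bin/sh
# Sketch of a daily cleanup job (e.g. dropped into /etc/cron.daily on
# every master and slave). SCRATCH_DIR defaults to /tmp; the spark*
# pattern and the +1-day age cutoff are assumptions for illustration.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"
# Remove top-level spark* entries not modified in the last day.
find "$SCRATCH_DIR" -maxdepth 1 -name 'spark*' -mtime +1 -exec rm -rf {} +
```

The -mtime +1 guard skips anything modified within the last day, which lowers (but does not eliminate) the risk of racing a live job.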
>
> Thanks (and sorry for the noise)!
> Ognen
>
>
> On 3/23/14, 9:52 PM, Ognen Duzlevski wrote:
>
> Aaron, thanks for replying. I am very much puzzled as to what is going on.
> A job that used to run on this same cluster is failing with this mysterious
> message about not having enough disk space, when in fact "watch df -h"
> shows free space hovering around 3+ GB and free inodes at 50% (this is on
> the master). I went through spark/work/app*/stderr and stdout and
> spark/logs/*out on each slave, and there is no mention of "too many open
> files" failures on any of the slaves or on the master :(
>
> Thanks
> Ognen
>
> On 3/23/14, 8:38 PM, Aaron Davidson wrote:
>
> By default, with P partitions (for both the pre-shuffle and post-shuffle
> stages), there are P^2 shuffle files created. With
> spark.shuffle.consolidateFiles turned on, we would instead create only
> P files. Disk space consumption, however, is largely unaffected by the
> number of partitions unless each partition is particularly small.
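To see why that matters for inodes, a quick back-of-the-envelope check (the P = 1000 figure is just an illustration, not a number from this thread):

```shell
# File-count arithmetic for a shuffle with P partitions on both the
# map and reduce side, following the P^2-vs-P description above.
P=1000
echo "without consolidation: $((P * P)) shuffle files"
echo "with consolidation:    $P shuffle files"
```

At P = 1000 the unconsolidated path produces a million small files per shuffle, which exhausts inodes long before it exhausts disk space.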
>
>  You might look at the actual executors' logs, as it's possible that this
> error was caused by an earlier exception, such as "too many open files".
>
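One quick way to sweep a node for that earlier exception (paths follow the spark/work/app*/ standalone-mode layout mentioned elsewhere in this thread; adjust them to your install):

```shell
# List any executor log files on this node that mention the
# "Too many open files" exception. The spark/work/app*/ layout is
# assumed from the thread, not hard-coded by Spark everywhere.
grep -l "Too many open files" spark/work/app*/stderr spark/work/app*/stdout 2>/dev/null
```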
>
> On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski <
> og...@plainvanillagames.com> wrote:
>
>>  On 3/23/14, 5:49 PM, Matei Zaharia wrote:
>>
>> You can set spark.local.dir to put this data somewhere other than /tmp if
>> /tmp is full. Actually, it's recommended to have multiple local disks and
>> set it to a comma-separated list of directories, one per disk.
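As a concrete sketch of that suggestion (the mount points below are hypothetical; substitute the local disks your nodes actually have):

```shell
# Hypothetical conf/spark-env.sh fragment: point Spark's scratch
# space at two local disks instead of /tmp. SPARK_LOCAL_DIRS is the
# environment-variable form of the spark.local.dir property; the
# /mnt paths are made up for illustration.
export SPARK_LOCAL_DIRS="/mnt/disk1/spark,/mnt/disk2/spark"
```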
>>
>>  Matei, does the number of tasks/partitions in a transformation influence
>> something in terms of disk space consumption? Or inode consumption?
>>
>> Thanks,
>> Ognen
>>
>
>
> --
> "A distributed system is one in which the failure of a computer you didn't 
> even know existed can render your own computer unusable"
> -- Leslie Lamport
>
>
