Bleh, strike that, one of my slaves was at 100% inode utilization on the
file system. It was /tmp/spark* leftovers that apparently did not get
cleaned up properly after failed or interrupted jobs.
Mental note: run a cron job on the master and all slaves to clean up
/tmp/spark* regularly.
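The mental note above might look something like the script below. This is only a sketch: the seven-day age threshold, the script path in the crontab example, and the assumption that Spark's scratch directories all live directly under /tmp as /tmp/spark* are my own guesses to adjust for the actual cluster.

```shell
#!/bin/sh
# clean-spark-tmp.sh (hypothetical name): remove leftover Spark scratch
# directories that failed or interrupted jobs did not clean up.
# SCRATCH_DIR and the 7-day threshold are assumptions; tune for your cluster.
SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"
# -maxdepth 1 keeps us at the top level; -mtime +7 matches entries
# untouched for more than 7 days; -exec rm -rf deletes them recursively.
find "$SCRATCH_DIR" -maxdepth 1 -name 'spark*' -mtime +7 -exec rm -rf {} +
# Example crontab entry (nightly at 03:00, on the master and every slave):
#   0 3 * * * /usr/local/bin/clean-spark-tmp.sh
```

Deleting by age rather than unconditionally avoids pulling scratch space out from under a job that is still running.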
Thanks (and sorry for the noise)!
Ognen
On 3/23/14, 9:52 PM, Ognen Duzlevski wrote:
Aaron, thanks for replying. I am very much puzzled as to what is going
on. A job that used to run on the same cluster is failing with this
mysterious message about not having enough disk space when in fact I
can see through "watch df -h" that the free space is always hovering
around 3+GB on the disk and the free inodes are at 50% (this is on
master). I went through the spark/work/app*/stderr, stdout, and
spark/logs/*out files on each slave, and found no mention of "too many
open files" failures on any of the slaves or on the master :(
Thanks
Ognen
On 3/23/14, 8:38 PM, Aaron Davidson wrote:
By default, with P partitions (for both the pre-shuffle stage and
post-shuffle), there are P^2 files created.
With spark.shuffle.consolidateFiles turned on, we would instead
create only P files. Disk space consumption, however, is largely
unaffected by the number of partitions unless each partition is
particularly small.
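For a sense of scale, at P = 1000 partitions the default is on the order of P^2 = 1,000,000 shuffle files, which is exactly the kind of load that exhausts inodes. A sketch of turning the setting on, assuming the SPARK_JAVA_OPTS system-property mechanism used in Spark deployments of this vintage (adjust for however your cluster passes configuration):

```shell
# Sketch, assuming configuration via SPARK_JAVA_OPTS (e.g. in
# conf/spark-env.sh on each node); the mechanism is an assumption,
# the property name is the one from the message above.
export SPARK_JAVA_OPTS="-Dspark.shuffle.consolidateFiles=true"
```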
You might look at the actual executors' logs, as it's possible that
this error was caused by an earlier exception, such as "too many open
files".
On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
On 3/23/14, 5:49 PM, Matei Zaharia wrote:
You can set spark.local.dir to put this data somewhere other
than /tmp if /tmp is full. Actually it's recommended to have
multiple local disks and set it to a comma-separated list of
directories, one per disk.
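Matei's suggestion might be sketched as below, again assuming the SPARK_JAVA_OPTS style of passing properties; the two mount points are hypothetical stand-ins for whatever local disks the nodes actually have:

```shell
# Sketch: point Spark's scratch space at two dedicated local disks
# instead of /tmp. /mnt/disk1 and /mnt/disk2 are hypothetical mounts;
# the comma-separated list gives Spark one directory per disk.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark"
```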
Matei, does the number of tasks/partitions in a transformation
influence something in terms of disk space consumption? Or inode
consumption?
Thanks,
Ognen
--
"A distributed system is one in which the failure of a computer you didn't even know
existed can render your own computer unusable"
-- Leslie Lamport