Hi Andrew, here's what I found. Maybe it will be relevant for people with
the same issue:
1) There are 3 types of local resources in YARN (public, private,
application). More about it here:
http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/
2) The Spark cache is an APPLICATION-type resource.
3) Currently it's not possible to specify a quota for application
resources (https://issues.apache.org/jira/browse/YARN-882).
4) It's only possible to specify these 2 settings (see the second sketch
near the end of this message for example values):
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
- The maximum percentage of disk space utilization allowed, after which a
disk is marked as bad. Values can range from 0.0 to 100.0. If the value
is greater than or equal to 100, the nodemanager will check for a full
disk. This applies to yarn.nodemanager.local-dirs and
yarn.nodemanager.log-dirs.
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb - The
minimum space that must be available on a disk for it to be used. This
applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
5) YARN's cache cleanup doesn't clean application resources:
https://github.com/apache/hadoop/blob/8d58512d6e6d9fe93784a9de2af0056bcc316d96/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L511
As I understand it, application resources are cleaned up when the Spark
application terminates correctly (using sc.stop(); see the first sketch
below). But in my case, when it filled all the disk space, the application
got stuck and couldn't stop correctly. And after I restarted YARN, I don't
know of an easy way to trigger the cache cleanup other than doing it
manually on all the nodes.
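
Just to illustrate what I mean by terminating correctly, here is a minimal
sketch (the names are made up, it's not my actual job): wrap the work in
try/finally so that sc.stop() runs even when a stage fails, which is what
lets YARN clean up the application's appcache directory afterwards.

    import org.apache.spark.{SparkConf, SparkContext}

    object MlWorkflowSketch {  // placeholder name, not the real job
      def main(args: Array[String]): Unit = {
        // master comes from spark-submit (e.g. --master yarn-cluster)
        val sc = new SparkContext(new SparkConf().setAppName("ml-workflow"))
        try {
          // ... build the pipeline, persist/unpersist intermediate data, fit models ...
        } finally {
          // Always stop the context; after a clean shutdown YARN can remove
          // usercache/<user>/appcache/<applicationId> for this application.
          sc.stop()
        }
      }
    }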
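
And for point 4, a second sketch with illustrative values (my own guesses,
not tuned recommendations). These properties really belong in yarn-site.xml
on every NodeManager; the Configuration object below is only there to show
the property names and value types in a runnable form.

    import org.apache.hadoop.conf.Configuration

    object DiskCheckerSettingsSketch {
      def main(args: Array[String]): Unit = {
        val nmConf = new Configuration()
        // Mark a local/log disk as bad once it is more than 90% full (illustrative value).
        nmConf.setFloat(
          "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
          90.0f)
        // Additionally require at least 10 GB free on every disk used for
        // yarn.nodemanager.local-dirs / yarn.nodemanager.log-dirs (illustrative value).
        nmConf.setLong(
          "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb",
          10L * 1024)
        println(nmConf.get("yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"))
        println(nmConf.get("yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb"))
      }
    }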
Thanks,
Peter Rudenko
On 2015-07-10 20:07, Andrew Or wrote:
Hi Peter,
AFAIK Spark assumes infinite disk space, so there isn't really a way
to limit how much space it uses. Unfortunately I'm not aware of a
simpler workaround than to simply provision your cluster with more
disk space. By the way, are you sure that it's the disk space that
exceeded the limit, rather than the number of inodes? If it's the latter,
maybe you could control the ulimit of the container.
To answer your other question: if it can't persist to disk then yes, it
will fail. It will only recompute from the data source if for some
reason someone evicted our blocks from memory, but that shouldn't
happen in your case since you're using MEMORY_AND_DISK_SER.
-Andrew
2015-07-10 3:51 GMT-07:00 Peter Rudenko <petro.rude...@gmail.com>:
Hi, I have a Spark ML workflow. It uses some persist calls (roughly like
the sketch below). When I launch it with a 1 TB dataset, it brings down
the whole cluster, because it fills all the disk space at
/yarn/nm/usercache/root/appcache:
http://i.imgur.com/qvRUrOp.png
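
For context, the persist calls look roughly like this. It's a simplified,
self-contained sketch with a placeholder dataset and a local master, not my
actual workflow:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object PersistSketch {  // placeholder, not the real job
      def main(args: Array[String]): Unit = {
        // local[2] only so the sketch runs standalone; the real job runs on YARN.
        val sc = new SparkContext(new SparkConf().setAppName("persist-sketch").setMaster("local[2]"))

        // Placeholder RDD standing in for the 1 TB input.
        val features = sc.parallelize(1 to 1000000).map(i => (i, i.toDouble))

        // Partitions that don't fit in memory are spilled, serialized, to the
        // executors' local dirs (on YARN that is usercache/<user>/appcache/<appId>).
        features.persist(StorageLevel.MEMORY_AND_DISK_SER)

        println(features.count())            // materializes the cached blocks
        features.unpersist(blocking = true)  // frees memory and deletes the spilled files

        sc.stop()
      }
    }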
I found a YARN setting:
yarn.nodemanager.localizer.cache.target-size-mb - Target size of the
localizer cache in MB, per nodemanager. It is a target retention size
that only includes resources with PUBLIC and PRIVATE visibility and
excludes resources with APPLICATION visibility.
But it excludes resources with APPLICATION visibility, and the Spark
cache, as I understand it, is of the APPLICATION type, so it doesn't help.
Is it possible to restrict the disk space available to a Spark
application? And will Spark fail if it isn't able to persist to disk
(StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data
source?
Thanks,
Peter Rudenko