[ https://issues.apache.org/jira/browse/YARN-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681525#comment-16681525 ]
Thomas Graves commented on YARN-8991:
-------------------------------------

[~teonadi] can you clarify here? Are you saying it's not getting cleaned up while the Spark application is still running, or it's not getting cleaned up after the Spark application finishes?

> nodemanager not cleaning blockmgr directories inside appcache
> --------------------------------------------------------------
>
>                 Key: YARN-8991
>                 URL: https://issues.apache.org/jira/browse/YARN-8991
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>   Affects Versions: 2.6.0
>            Reporter: Hidayat Teonadi
>            Priority: Major
>         Attachments: yarn-nm-log.txt
>
>
> Hi, I'm running Spark on YARN and have enabled the Spark Shuffle Service. I'm
> noticing that during the lifetime of my Spark Streaming application, the nm
> appcache folder is building up with blockmgr directories (filled with
> shuffle_*.data).
> Looking into the nm logs, it seems like the blockmgr directories are not part
> of the cleanup process of the application. Eventually the disk will fill up and
> the app will crash. I have both
> {{yarn.nodemanager.localizer.cache.cleanup.interval-ms}} and
> {{yarn.nodemanager.localizer.cache.target-size-mb}} set, so I don't think it's
> a configuration issue.
> What is stumping me is that the executor ID listed by Spark during the external
> shuffle block registration doesn't match the executor ID listed in YARN's nm
> log. Maybe this executor ID mismatch explains why the cleanup is not done?
> I'm assuming that blockmgr directories are supposed to be cleaned up?
>
> {noformat}
> 2018-11-05 15:01:21,349 INFO
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Registered
> executor AppExecId{appId=application_1541045942679_0193, execId=1299} with
> ExecutorShuffleInfo{localDirs=[/mnt1/yarn/nm/usercache/auction_importer/appcache/application_1541045942679_0193/blockmgr-b9703ae3-722c-47d1-a374-abf1cc954f42],
> subDirsPerLocalDir=64,
> shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
> {noformat}
>
> seems similar to https://issues.apache.org/jira/browse/YARN-7070, although
> I'm not sure if the behavior I'm seeing is spark use related.
> [https://stackoverflow.com/questions/52923386/spark-streaming-job-doesnt-delete-shuffle-files]
> has a stop gap solution of cleaning up via cron.
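
For illustration only, the cron-based stop gap mentioned in the description could be sketched roughly as below. This is not the script from the Stack Overflow answer. The function names (`find_stale_shuffle_files`, `delete_files`), the age threshold, and the file-name pattern are assumptions made for the example; the appcache root would be a path like the one in the log line above. Deleting shuffle files that a running application still needs may cause fetch failures, so the sketch defaults to a dry run.

```python
import os
import re
import time

# Assumed naming pattern: shuffle_*.data (as reported in the issue) plus
# the matching shuffle_*.index files.
SHUFFLE_FILE = re.compile(r"^shuffle_.*\.(data|index)$")


def find_stale_shuffle_files(appcache_root, max_age_seconds, now=None):
    """Return sorted paths of shuffle files under blockmgr-* directories
    whose modification time is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(appcache_root):
        # Only consider directories that sit at or below a blockmgr-* directory.
        rel = os.path.relpath(dirpath, appcache_root)
        if not any(part.startswith("blockmgr-") for part in rel.split(os.sep)):
            continue
        for name in filenames:
            if not SHUFFLE_FILE.match(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared between listing and stat
            if now - mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)


def delete_files(paths, dry_run=True):
    """Delete the given files and return the paths handled.
    With dry_run=True nothing is removed; the paths are only reported."""
    handled = []
    for path in paths:
        if not dry_run:
            try:
                os.remove(path)
            except OSError:
                continue  # skip files that could not be removed
        handled.append(path)
    return handled
```

A cron job could call `find_stale_shuffle_files` on the nm appcache directory with a threshold comfortably larger than the streaming batch and window durations, review the dry-run output, and only then call `delete_files(..., dry_run=False)`.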