[ 
https://issues.apache.org/jira/browse/YARN-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138597#comment-16138597
 ] 

Jason Lowe commented on YARN-7070:
----------------------------------

bq. At this point, I don't believe this is a YARN bug.

I disagree.  Even if the spark shuffle handler doesn't clean anything up, these 
files are underneath the application's appcache area in YARN.  The nodemanager 
is supposed to clean this up when the application completes regardless of what 
the auxiliary services are doing.

From the log we can see that it at least tried to do this:
{noformat}
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949
{noformat}

This looks a lot like the scenario that was fixed in YARN-6846, since the 
container is getting killed just as the application completes.  Note how close 
together the two deletes occur:
{noformat}
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949/container_e24_1501810184023_55949_01_000079
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1501810184023_55949
2017-08-22 05:20:01,260 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_1501810184023_55949
2017-08-22 05:20:01,260 INFO org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Application application_1501810184023_55949 removed, cleanupLocalDirs = false
2017-08-22 05:20:01,260 INFO org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache/application_1501810184023_55949
{noformat}

These two deletes are probably racing with each other.  I highly recommend 
applying the patch from YARN-6846 and seeing whether things improve.
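
To check other nodemanager logs for the same pattern, a small script along these lines may help. It is only a rough sketch: it assumes each log record sits on a single line in the actual log file (the excerpts in this mail are wrapped), that the "Deleting absolute path" message has the format shown above, and the helper names and the 100 ms window are made up for illustration.

```python
import re
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Matches records of the form shown in the excerpts above, assuming each
# record is on one line in the real nodemanager log.
DELETE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Deleting absolute path : (?P<path>\S+)"
)


def parse_deletes(lines):
    """Return (timestamp, path) pairs for every 'Deleting absolute path' record."""
    deletes = []
    for line in lines:
        m = DELETE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            deletes.append((ts, PurePosixPath(m.group("path"))))
    return deletes


def find_overlapping_deletes(deletes, window=timedelta(milliseconds=100)):
    """Return (parent, child, gap) tuples where a directory and one of its
    subdirectories were both scheduled for deletion within `window`.

    The window is an arbitrary illustrative value, not a YARN setting.
    """
    hits = []
    for i, (ts_a, path_a) in enumerate(deletes):
        for ts_b, path_b in deletes[i + 1:]:
            gap = abs(ts_b - ts_a)
            if gap > window:
                continue
            if path_a in path_b.parents:
                hits.append((path_a, path_b, gap))
            elif path_b in path_a.parents:
                hits.append((path_b, path_a, gap))
    return hits
```

A hit only shows that a container directory and its parent application directory were scheduled for deletion at nearly the same time; it does not prove that the deletes actually raced, so treat it as a hint for where to look.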


> some of local cache files for yarn can't be deleted
> ---------------------------------------------------
>
>                 Key: YARN-7070
>                 URL: https://issues.apache.org/jira/browse/YARN-7070
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>         Environment: Hadoop 2.8.1
>            Reporter: Changyao Ye
>         Attachments: application_1501810184023_55949.log
>
>
> We have found that some cache files for YARN (in 
> /tmp/hadoop-yarn/nm-local-dir/usercache/hdfs/appcache) on the 
> nodemanager are not being deleted properly. The leftover directories look 
> like the following (blockmgr-*):
> =================
> # ls -ltr application_1501810184023_55949
> total 120
> drwx--x---  2 hdfs yarn 4096 Aug 22 04:29 filecache
> drwxr-s---  2 hdfs yarn 4096 Aug 22 04:56 blockmgr-881fab2c-fba4-4bb1-8dd9-5ab35a512df7
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 04:56 blockmgr-bf8a19f5-e9ae-4269-a0ef-b27d0f9c17e7
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 04:58 blockmgr-f3437e8d-9595-4898-8bda-92ebff3ada1d
> drwxr-s--- 18 hdfs yarn 4096 Aug 22 05:01 blockmgr-930c0cd8-1d31-4cdb-a244-f6ad4bf74bff
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:13 blockmgr-83fc0702-ac40-4743-812a-7d488e92004e
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:13 blockmgr-f6cfe045-12c3-41d6-b77e-aa5200daeb6a
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:13 blockmgr-53dcb4ea-ba5d-4b8b-859b-805b9303a149
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-0c0c4bb9-ef5e-4ca1-8d23-ce5cd58d0a75
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:13 blockmgr-557d0f39-67d2-491a-9307-12fc1d724380
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-fbc87680-4df7-498e-bf6d-456a5aea4fc9
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:13 blockmgr-53ee8251-fac1-4f62-82c2-5e970f0d86ec
> drwxr-s---  9 hdfs yarn 4096 Aug 22 05:14 blockmgr-5a8bc187-abcf-482d-9da5-e8c4647d4731
> drwxr-s--- 10 hdfs yarn 4096 Aug 22 05:14 blockmgr-251c3a99-cd85-442a-8945-52c344c0d861
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:14 blockmgr-c352c1ad-15dc-456b-8b62-5b83b9950494
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:15 blockmgr-b4f01347-4b51-4b35-8146-2aa840084c2b
> drwxr-s--- 14 hdfs yarn 4096 Aug 22 05:15 blockmgr-0095d26c-c134-48b4-82a6-e8ae02f0189c
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:15 blockmgr-28a31574-61ae-459f-be3a-8608892246d7
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-c0cd0df9-b355-4209-b6aa-b549a1fa36eb
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-a2730abb-9517-461e-bedf-d9a2dcef373f
> drwxr-s--- 14 hdfs yarn 4096 Aug 22 05:15 blockmgr-91dd2e1a-6bc2-4429-8b71-2f4240987159
> drwxr-s--- 12 hdfs yarn 4096 Aug 22 05:15 blockmgr-f4e3a586-8817-45ea-a197-9fdbb3d91946
> drwxr-s--- 15 hdfs yarn 4096 Aug 22 05:15 blockmgr-ba2c605e-89d8-4f7c-b42c-6ed4ba6bf4ea
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-2ae72383-5f72-4002-84a7-e6335b8c2b6c
> drwxr-s--- 13 hdfs yarn 4096 Aug 22 05:15 blockmgr-6c5e260f-d3c7-4af6-91c1-168c73343f2d
> drwxr-s--- 16 hdfs yarn 4096 Aug 22 05:15 blockmgr-2e9923b1-281c-4a9d-8069-6c5430bd5fc3
> drwxr-s--- 18 hdfs yarn 4096 Aug 22 05:15 blockmgr-cc3f1406-d8a2-4bf5-a276-8f7aed75c513
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-975bcce0-84b2-4590-880b-bf182d76e319
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-ce82cb63-5998-4227-b85e-77f1c633db43
> drwxr-s--- 11 hdfs yarn 4096 Aug 22 05:15 blockmgr-592af4aa-3c89-4081-8746-29b99f2220b1
> =================
> We also applied the patches from YARN-4594 and YARN-4731, but nothing changed.
> YARN-4594 https://issues.apache.org/jira/browse/YARN-4594
> YARN-4731 https://issues.apache.org/jira/browse/YARN-4731
> Any advice will be greatly appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
