[ https://issues.apache.org/jira/browse/YARN-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904719#comment-17904719 ]
Chenyu Zheng commented on YARN-9968:
------------------------------------

[~snemeth] [~tarunparimi] Hi, I also encountered this problem (because the /tmp directory was deleted by mistake), which caused Spark to get stuck. I think that when the `Public Localizer` thread exits, the NM should be shut down, because there is no point in keeping an abnormal NM running. What do you think?

> Public Localizer is exiting in NodeManager due to NullPointerException
> ----------------------------------------------------------------------
>
>                 Key: YARN-9968
>                 URL: https://issues.apache.org/jira/browse/YARN-9968
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.0
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2, 3.1.4
>
>         Attachments: YARN-9968.001.patch
>
>
> The Public Localizer is encountering a NullPointerException and exiting.
> {code:java}
> ERROR localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(995)) - Error: Shutting down
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:981)
> INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(997)) - Public cache exiting
> {code}
> The NodeManager still keeps on running. Subsequent localization events for containers keep encountering the below error, resulting in failed Localization of all new containers.
> {code:java}
> ERROR localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(920)) - Failed to submit rsrc { { hdfs://namespace/raw/user/.staging/job/conf.xml 1572071824603, FILE, null },pending,[(container_e30_1571858463080_48304_01_000134)],12513553420029113,FAILED} for download. Either queue is full or threadpool is shutdown.
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@55c7fa21 rejected from org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor@46067edd[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 382286]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
>     at java.util.concurrent.ExecutorCompletionService.submit(ExecutorCompletionService.java:181)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:899)
> {code}
> When this happens, the NodeManager becomes usable only after a restart.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
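
The `RejectedExecutionException` in the second trace of the quoted issue can be reproduced outside the NodeManager with plain `java.util.concurrent` classes. The following is only a minimal sketch, assuming a standard fixed thread pool stands in for `HadoopThreadPoolExecutor`. The class and method names (`TerminatedPoolDemo`, `submitIsRejected`) are hypothetical and are not part of Hadoop. It shows why every later submission fails once the pool behind an `ExecutorCompletionService` has been shut down.

```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; it only mimics the submit path seen in the trace.
public class TerminatedPoolDemo {

    /** Returns true if the pool refuses a new task, false if it accepts it. */
    static boolean submitIsRejected(ExecutorService pool) {
        ExecutorCompletionService<String> queue = new ExecutorCompletionService<>(pool);
        try {
            // Stand-in for a resource download task.
            queue.submit(() -> "downloaded");
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws once the pool is no longer running.
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: a plain fixed pool replaces HadoopThreadPoolExecutor here.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println("rejected while running: " + submitIsRejected(pool));

        // Simulates the pool being shut down when the localizer thread exits.
        pool.shutdownNow();
        System.out.println("rejected after shutdown: " + submitIsRejected(pool));
    }
}
```

Running `main` should print `rejected while running: false` followed by `rejected after shutdown: true`. A shut-down pool cannot be restarted, which is consistent with the report that the NodeManager becomes usable again only after a restart.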