[ 
https://issues.apache.org/jira/browse/YARN-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880111#comment-17880111
 ] 

ASF GitHub Bot commented on YARN-11709:
---------------------------------------

zeekling commented on code in PR #6960:
URL: https://github.com/apache/hadoop/pull/6960#discussion_r1749152353


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java:
##########
@@ -451,8 +451,10 @@ public void startLocalizer(LocalizerStartContext ctx)
 
     } catch (PrivilegedOperationException e) {
       int exitCode = e.getExitCode();
-      LOG.warn("Exit code from container {} startLocalizer is : {}",
-          locId, exitCode, e);
+      LOG.error("Unrecoverable issue occurred. Marking the node as unhealthy to prevent "
+          + "further containers to get scheduled on the node and cause application failures. " +
+          "Exit code from the container " + locId + "startLocalizer is : " + exitCode, e);
+      nmContext.getNodeStatusUpdater().reportException(e);

Review Comment:
   ok





> NodeManager should be shut down or blacklisted when it cannot run program 
> "/var/lib/yarn-ce/bin/container-executor"
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11709
>                 URL: https://issues.apache.org/jira/browse/YARN-11709
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: container-executor
>            Reporter: Ferenc Erdelyi
>            Assignee: Ferenc Erdelyi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>
> When NodeManager encounters the "No such file or directory" error below, 
> reported against the "container-executor", it should stop participating in 
> the cluster, because it cannot run any container and will only fail the jobs.
> {code:java}
> 2023-01-18 10:08:10,600 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_e159_1673543180101_9407_02_000014 startLocalizer is : -1
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:403)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1250)
> Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=2, No such file or directory
> {code}
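
For readers skimming the thread, the pattern discussed in the PR (catch the
executor failure, log it as an error, and hand the exception to a health
reporting hook) can be sketched roughly as follows. This is a minimal,
standalone illustration and not the actual Hadoop code: HealthReporter,
ExecutorException, runExecutor and the class name are hypothetical stand-ins
for NodeStatusUpdater, PrivilegedOperationException and the privileged
operation call, and the missing binary is detected here with
Files.isExecutable instead of a real process launch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class LocalizerFailureSketch {

    /** Hypothetical stand-in for the hook that receives unrecoverable errors. */
    interface HealthReporter {
        void reportException(Exception e);
    }

    /** Hypothetical stand-in for PrivilegedOperationException. */
    static class ExecutorException extends Exception {
        private final int exitCode;

        ExecutorException(int exitCode, Throwable cause) {
            super(cause);
            this.exitCode = exitCode;
        }

        int getExitCode() {
            return exitCode;
        }
    }

    /** Simulates invoking the executor binary; fails with exit code -1 if it is not executable. */
    static void runExecutor(String path) throws ExecutorException {
        if (!Files.isExecutable(Paths.get(path))) {
            IOException cause = new IOException("Cannot run program \"" + path + "\"");
            throw new ExecutorException(-1, cause);
        }
    }

    /** Returns true on success; on failure logs an error and reports the exception. */
    static boolean startLocalizer(String executorPath, String locId, HealthReporter reporter) {
        try {
            runExecutor(executorPath);
            return true;
        } catch (ExecutorException e) {
            System.err.println("Unrecoverable issue occurred. Exit code from container "
                + locId + " startLocalizer is: " + e.getExitCode());
            // Hand the failure to the reporter instead of only logging a warning.
            reporter.reportException(e);
            return false;
        }
    }

    public static void main(String[] args) {
        List<Exception> reported = new ArrayList<>();
        // The path below is deliberately one that should not exist.
        boolean ok = startLocalizer("/nonexistent/container-executor", "container_01", reported::add);
        System.out.println("started=" + ok + ", reported=" + reported.size());
    }
}
```

What the reporter does with the exception (for example, marking the node
unhealthy) is left out of the sketch on purpose; in the PR that part is
delegated to nmContext.getNodeStatusUpdater().reportException(e).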



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
