[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910753#comment-16910753
 ] 

Eric Yang commented on YARN-9562:
---------------------------------

[~ebadger] The node manager can fail to start when the configuration points 
image-tag-to-hash-files at a location on HDFS and the file does not exist yet.  
A user may start the Hadoop cluster first and only then populate HDFS with the 
files needed to run the cluster.  The current implementation requires the user 
to perform these steps in order:

# start HDFS
# run the squashfs image scripts
# start the YARN cluster

If the sequence is not followed, the node manager crashes with an error that 
looks like this:

{code}
2019-08-19 20:26:48,704 INFO org.apache.hadoop.security.UserGroupInformation: 
Login successful for user nm/eyang-3.openstacklo...@example.com using keytab 
file /etc/security/keytabs/nm.service.keytab. Keytab auto renewal enabled : 
false
2019-08-19 20:26:48,707 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Privileged Execution Command Array: 
[/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, 
--reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 command array:
2019-08-19 20:26:48,715 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 [/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, 
--reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Privileged Execution Operation Output:
2019-08-19 20:26:48,716 DEBUG 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
2019-08-19 20:26:49,453 INFO org.apache.hadoop.service.AbstractService: Service 
ImageTagToManifestPluginService failed in state STARTED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,456 INFO org.apache.hadoop.service.AbstractService: Service 
NodeManager failed in state INITED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping NodeManager metrics system...
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
NodeManager metrics system stopped.
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
NodeManager metrics system shutdown complete.
2019-08-19 20:26:49,458 ERROR 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
NodeManager
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,465 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
{code}

It would be nice if this were a warning message instead of a hard failure.
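
As a rough illustration of the suggested behavior, here is a minimal sketch 
that logs a warning and starts with an empty mapping when nothing can be 
loaded, then picks the file up on a later refresh.  This is not the code in 
the patch: LenientTagToHashLoader, its Source interface, and the example tag 
and hash are hypothetical names made up for this sketch, and it uses 
java.util.logging rather than Hadoop's logging.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for illustration; not the actual ImageTagToManifestPlugin.
class LenientTagToHashLoader {
  /** Hypothetical source of tag-to-hash mappings, e.g. a file on HDFS. */
  interface Source {
    Map<String, String> load() throws IOException;
  }

  private static final Logger LOG =
      Logger.getLogger(LenientTagToHashLoader.class.getName());

  private final Source source;
  // Starts empty so that lookups are safe before the first successful load.
  private volatile Map<String, String> tagToHash = Collections.emptyMap();

  LenientTagToHashLoader(Source source) {
    this.source = source;
  }

  /** Startup: try to load once, but never throw if nothing is available. */
  void start() {
    refresh();
  }

  /** Returns true if mappings were loaded; on failure, warns and keeps the old ones. */
  boolean refresh() {
    try {
      Map<String, String> loaded = source.load();
      if (loaded == null || loaded.isEmpty()) {
        LOG.warning("Couldn't load any image-tag-to-hash-files; will retry on next refresh");
        return false;
      }
      tagToHash = Collections.unmodifiableMap(new HashMap<>(loaded));
      return true;
    } catch (IOException e) {
      LOG.warning("Couldn't load any image-tag-to-hash-files: " + e.getMessage());
      return false;
    }
  }

  /** Returns the hash for a tag, or null if the tag is not known (yet). */
  String getHash(String tag) {
    return tagToHash.get(tag);
  }

  public static void main(String[] args) {
    boolean[] filePresent = {false};
    LenientTagToHashLoader loader = new LenientTagToHashLoader(() -> {
      if (!filePresent[0]) {
        throw new IOException("file does not exist yet");
      }
      return Collections.singletonMap("centos:latest", "abc123");
    });
    loader.start();                       // warns, does not throw
    System.out.println(loader.getHash("centos:latest"));
    filePresent[0] = true;                // the file is populated later
    loader.refresh();
    System.out.println(loader.getHash("centos:latest"));
  }
}
```

With something along these lines, containers that reference an unknown tag 
would still fail at launch time, but the node manager itself would come up 
regardless of the order in which HDFS is populated.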

> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
>                 Key: YARN-9562
>                 URL: https://issues.apache.org/jira/browse/YARN-9562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
