[
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910753#comment-16910753
]
Eric Yang commented on YARN-9562:
---------------------------------
[~ebadger] The NodeManager can fail to start when the configuration points image-tag-to-hash-files at HDFS and the file does not exist yet. A user may start the Hadoop cluster first and only afterwards populate HDFS with the files the cluster needs. The current implementation instead requires the user to:
# start HDFS
# run the squashfs image scripts
# start the YARN cluster
If this sequence is not followed, the NodeManager crashes with an error like this:
{code}
2019-08-19 20:26:48,704 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user nm/[email protected] using keytab file /etc/security/keytabs/nm.service.keytab. Keytab auto renewal enabled : false
2019-08-19 20:26:48,707 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Privileged Execution Command Array: [/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, --reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: command array:
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: [/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, --reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Privileged Execution Operation Output:
2019-08-19 20:26:48,716 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
2019-08-19 20:26:49,453 INFO org.apache.hadoop.service.AbstractService: Service ImageTagToManifestPluginService failed in state STARTED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,456 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2019-08-19 20:26:49,458 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
{code}
It would be better if this were logged as a warning instead of causing a hard failure.
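As a rough illustration of the suggested behavior, here is a minimal sketch. It assumes the files are local paths, and it uses java.nio and java.util.logging in place of Hadoop's FileSystem and logging. The class LenientTagFileLoader and its methods are hypothetical and are not part of ImageTagToManifestPlugin. The sketch does not parse the file contents. For each candidate file it warns when the file is missing or unreadable. When no file can be loaded at all, it logs a warning and returns false instead of throwing, so the caller can retry later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not Hadoop API): loads image-tag-to-hash files
 * leniently, warning instead of throwing when none are available yet.
 */
class LenientTagFileLoader {
  private static final Logger LOG =
      Logger.getLogger(LenientTagFileLoader.class.getName());

  private final List<Path> candidates;
  // Raw lines from the last successful load; empty until one succeeds.
  private volatile List<String> entries = Collections.emptyList();

  LenientTagFileLoader(List<Path> candidates) {
    this.candidates = new ArrayList<>(candidates);
  }

  /** Returns true if at least one file was read; never throws. */
  boolean load() {
    List<String> loaded = new ArrayList<>();
    int filesRead = 0;
    for (Path p : candidates) {
      if (!Files.isReadable(p)) {
        LOG.warning("image-tag-to-hash file not available yet: " + p);
        continue;
      }
      try {
        loaded.addAll(Files.readAllLines(p));
        filesRead++;
      } catch (IOException e) {
        LOG.warning("Could not read " + p + ": " + e.getMessage());
      }
    }
    if (filesRead == 0) {
      // In the reported trace this condition raised a RuntimeException and
      // aborted NodeManager startup; here we only warn and keep old entries.
      LOG.warning("Couldn't load any image-tag-to-hash-files; "
          + "will retry on the next reload");
      return false;
    }
    entries = Collections.unmodifiableList(loaded);
    return true;
  }

  List<String> entries() {
    return entries;
  }
}
```

A caller on the NodeManager side could treat a false return as "not ready yet". It might then reload periodically, or reject runc containers until a load succeeds. Which of these fits the plugin is a design choice for the patch.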
> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch,
> YARN-9562.003.patch, YARN-9562.004.patch
>
>
> This JIRA will be used to add the Java changes for the new
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the
> existing DockerLinuxContainerRuntime code once it is moved up into an
> abstract class that can be extended.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)