[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910753#comment-16910753 ]
Eric Yang commented on YARN-9562:
---------------------------------

[~ebadger] The node manager can fail to start when the configuration points image-tag-to-hash-files at HDFS and the file does not exist yet. A user may start the Hadoop cluster first and only then populate HDFS with the files needed to run the cluster. The current implementation requires the user to perform these steps in order:
# start hdfs
# run squashfs image scripts
# start yarn cluster

If the sequence is not followed, the node manager crashes with an error that looks like this:

{code}
2019-08-19 20:26:48,704 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user nm/eyang-3.openstacklo...@example.com using keytab file /etc/security/keytabs/nm.service.keytab. Keytab auto renewal enabled : false
2019-08-19 20:26:48,707 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Privileged Execution Command Array: [/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, --reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: command array:
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: [/usr/local/hadoop-3.3.0-SNAPSHOT/bin/container-executor, --reap-runc-layer-mounts, 100]
2019-08-19 20:26:48,715 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Privileged Execution Operation Output:
2019-08-19 20:26:48,716 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
2019-08-19 20:26:49,453 INFO org.apache.hadoop.service.AbstractService: Service ImageTagToManifestPluginService failed in state STARTED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,456 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2019-08-19 20:26:49,457 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2019-08-19 20:26:49,458 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:309)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-19 20:26:49,465 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
{code}

It would be nice if this were a warning message instead of a hard failure.
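
To make the suggestion concrete, here is a minimal, self-contained sketch of the "warn instead of fail" behavior. It is not the code from the YARN-9562 patches: the class `LenientTagToHashLoader`, its methods, and the one-pair-per-line `<image-tag> <manifest-hash>` file format are all hypothetical, and it reads from the local filesystem rather than HDFS so that it runs without a cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for the plugin's startup logic; not the YARN-9562 code.
class LenientTagToHashLoader {
  private static final Logger LOG =
      Logger.getLogger(LenientTagToHashLoader.class.getName());

  // Reads every file it can; a missing or unreadable file is logged and skipped.
  // Assumed line format (illustration only): "<image-tag> <manifest-hash>".
  static Map<String, String> loadAll(List<Path> files) {
    Map<String, String> tagToHash = new HashMap<>();
    for (Path file : files) {
      try {
        for (String raw : Files.readAllLines(file)) {
          String[] parts = raw.trim().split("\\s+");
          if (parts.length == 2) {
            tagToHash.put(parts[0], parts[1]);
          }
        }
      } catch (IOException e) {
        LOG.warning("Skipping " + file + ": " + e);
      }
    }
    return tagToHash;
  }

  // Startup hook: warn instead of throwing when nothing could be loaded.
  static Map<String, String> start(List<Path> files) {
    Map<String, String> tagToHash = loadAll(files);
    if (tagToHash.isEmpty()) {
      LOG.warning("Couldn't load any image-tag-to-hash-files; "
          + "starting with an empty mapping");
    }
    return tagToHash;
  }

  public static void main(String[] args) {
    // The file does not exist yet, as in the startup order described above.
    Path missing = Paths.get("/nonexistent/image-tag-to-hash");
    Map<String, String> result = start(Collections.singletonList(missing));
    System.out.println("entries loaded: " + result.size());
  }
}
```

With this shape, a node manager started before the files are populated would come up with an empty mapping and log warnings. A real change would presumably also need a way to pick up the files once they appear, which this sketch does not attempt.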

> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch,
> YARN-9562.003.patch, YARN-9562.004.patch
>
>
> This JIRA will be used to add the Java changes for the new
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the
> existing DockerLinuxContainerRuntime code once it is moved up into an
> abstract class that can be extended.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org