[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906663#comment-16906663
 ] 

Eric Yang commented on YARN-9562:
---------------------------------

[~ebadger] 
1. The node manager crashes if the configured image-tag-to-hash-files do not 
exist. It would be nicer if this were logged as a warning instead.
{code}
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,002 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,003 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2019-08-13 21:07:03,003 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2019-08-13 21:07:03,004 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2019-08-13 21:07:03,005 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,016 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
{code}
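
A minimal sketch of the warn-instead-of-fail behavior requested in item 1. This is not the ImageTagToManifestPlugin code: the class TagToHashFileLoader and the method loadExisting are hypothetical names, and the sketch only checks local files through java.nio, whereas the real plugin may read its files from other sources.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; not part of the YARN-9562 patch.
final class TagToHashFileLoader {
  private static final Logger LOG =
      Logger.getLogger(TagToHashFileLoader.class.getName());

  /**
   * Returns the configured files that exist as regular files.
   * Missing files are logged as warnings instead of raising an exception,
   * so the caller can start up with an empty tag-to-hash mapping.
   */
  static List<Path> loadExisting(List<String> configuredFiles) {
    List<Path> found = new ArrayList<>();
    for (String name : configuredFiles) {
      Path candidate = Paths.get(name);
      if (Files.isRegularFile(candidate)) {
        found.add(candidate);
      } else {
        LOG.warning("image-tag-to-hash file not found, skipping: " + name);
      }
    }
    if (found.isEmpty()) {
      // Warn and continue rather than throwing a RuntimeException.
      LOG.warning("Couldn't load any image-tag-to-hash-files; "
          + "continuing with an empty mapping");
    }
    return found;
  }
}
```

With this shape, a missing file shows up in the log at startup, and image tags would simply fail to resolve later, rather than the whole node manager failing to start.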

2. When running a MapReduce job in a runc container, the patch still 
references an incorrect path (note the "null" component in the HDFS path):

{code}
java.io.IOException: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getResource(HdfsManifestToResourcesPlugin.java:180)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getConfigResource(HdfsManifestToResourcesPlugin.java:143)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.getLocalResources(RuncContainerRuntime.java:526)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.getLocalResources(DelegatingLinuxContainerRuntime.java:265)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.getLocalResources(LinuxContainerExecutor.java:1062)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:1218)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:1167)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2129)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:101)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1576)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1569)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:131)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
        at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:90)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:199)
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
        at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getResource(HdfsManifestToResourcesPlugin.java:172)
        ... 17 more
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1590)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.statBlob(HdfsManifestToResourcesPlugin.java:187)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin$1.load(HdfsManifestToResourcesPlugin.java:109)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin$1.load(HdfsManifestToResourcesPlugin.java:106)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
        ... 23 more
{code}
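
The "null" segment in the path above is what Java string concatenation produces when one of the operands is a null reference. The sketch below shows that mechanism and a fail-fast alternative. It is only an assumption about how such a path could arise, not a diagnosis of HdfsManifestToResourcesPlugin; the class ImagePathBuilder, its methods, and the parameter names are hypothetical.

```java
import java.util.Objects;

// Hypothetical illustration; none of these names come from the patch.
final class ImagePathBuilder {

  // Naive variant: a null directory silently becomes the text "null".
  static String naiveConfigPath(String topLevelDir, String imageDir,
      String hash) {
    return topLevelDir + "/" + imageDir + "/config/" + hash;
  }

  // Checked variant: fail with a clear message before building the path.
  static String checkedConfigPath(String topLevelDir, String imageDir,
      String hash) {
    Objects.requireNonNull(imageDir, "image directory is not configured");
    return topLevelDir + "/" + imageDir + "/config/" + hash;
  }

  public static void main(String[] args) {
    // An unset (null) directory ends up as a literal path component.
    System.out.println(naiveConfigPath("/user/yarn", null, "abc"));
    try {
      checkedConfigPath("/user/yarn", null, "abc");
    } catch (NullPointerException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating the value before concatenation turns a confusing FileNotFoundException at localization time into an explicit configuration error.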

> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
>                 Key: YARN-9562
>                 URL: https://issues.apache.org/jira/browse/YARN-9562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
