[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906663#comment-16906663 ]
Eric Yang commented on YARN-9562:
---------------------------------
[~ebadger]
1. The NodeManager crashes if the configured image-tag-to-hash-files do not exist.
It would be nice if this were a warning instead.
{code}
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,002 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,003 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2019-08-13 21:07:03,003 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2019-08-13 21:07:03,004 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2019-08-13 21:07:03,005 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: Couldn't load any image-tag-to-hash-files
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.ImageTagToManifestPlugin.serviceStart(ImageTagToManifestPlugin.java:315)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.start(RuncContainerRuntime.java:277)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.start(DelegatingLinuxContainerRuntime.java:283)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.start(LinuxContainerExecutor.java:351)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:519)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:989)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1069)
2019-08-13 21:07:03,016 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
{code}
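For point 1, here is a minimal Java sketch of the requested behavior. It assumes a hypothetical `TolerantTagToHashLoader` helper (this is not the patch's `ImageTagToManifestPlugin`) and an assumed line format of `tag1,tag2:hash` with `#` starting a comment. Each missing or unreadable file is logged as a warning and skipped, so an absent file never throws during startup. It uses `java.util.logging` only to stay dependency-free.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch, not the patch's ImageTagToManifestPlugin: load
// image-tag-to-hash files, warning about missing files instead of throwing.
final class TolerantTagToHashLoader {
  private static final Logger LOG =
      Logger.getLogger(TolerantTagToHashLoader.class.getName());

  private TolerantTagToHashLoader() {}

  // Assumed line format: "tag1,tag2:hash", where '#' starts a comment.
  // The hash is taken after the LAST ':' because tags such as "centos:7"
  // contain a colon themselves.
  static void parseLine(String rawLine, Map<String, String> tagToHash) {
    String line = rawLine;
    int comment = line.indexOf('#');
    if (comment >= 0) {
      line = line.substring(0, comment);
    }
    line = line.trim();
    int sep = line.lastIndexOf(':');
    if (line.isEmpty() || sep <= 0 || sep == line.length() - 1) {
      return; // blank, comment-only, or malformed line
    }
    String hash = line.substring(sep + 1).trim();
    for (String tag : line.substring(0, sep).split(",")) {
      String t = tag.trim();
      if (!t.isEmpty()) {
        tagToHash.put(t, hash);
      }
    }
  }

  // Loads every readable file. A missing or unreadable file is logged and
  // skipped, so an absent file does not abort NodeManager startup.
  static Map<String, String> load(List<String> files) {
    Map<String, String> tagToHash = new HashMap<>();
    for (String f : files) {
      Path p = Paths.get(f);
      if (!Files.isReadable(p)) {
        LOG.warning("Skipping missing or unreadable image-tag-to-hash file: " + f);
        continue;
      }
      try {
        for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
          parseLine(line, tagToHash);
        }
      } catch (IOException e) {
        LOG.warning("Failed to read image-tag-to-hash file " + f + ": " + e);
      }
    }
    if (tagToHash.isEmpty()) {
      LOG.warning("No image tags were loaded from image-tag-to-hash-files");
    }
    return tagToHash;
  }
}
```

With this approach, `load(...)` on a nonexistent file returns an empty map and only logs a warning. Whether an empty mapping should still be fatal is a separate policy decision.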
2. When running a MapReduce job in a runc container, the patch still references an
incorrect path (note the literal {{null}} path component):
{code}
java.io.IOException: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getResource(HdfsManifestToResourcesPlugin.java:180)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getConfigResource(HdfsManifestToResourcesPlugin.java:143)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.RuncContainerRuntime.getLocalResources(RuncContainerRuntime.java:526)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.getLocalResources(DelegatingLinuxContainerRuntime.java:265)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.getLocalResources(LinuxContainerExecutor.java:1062)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:1218)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:1167)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2129)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:101)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1576)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1569)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:131)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:90)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:199)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.getResource(HdfsManifestToResourcesPlugin.java:172)
... 17 more
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/9f38484d220fa527b1fb19747638497179500a1bed8bf0498eb788229229e6e1
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1590)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1598)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin.statBlob(HdfsManifestToResourcesPlugin.java:187)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin$1.load(HdfsManifestToResourcesPlugin.java:109)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.HdfsManifestToResourcesPlugin$1.load(HdfsManifestToResourcesPlugin.java:106)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
... 23 more
{code}
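The `null` segment in `/user/yarn/null/config/<hash>` suggests an unset value was string-concatenated into a relative path. HDFS would then resolve that path under the `yarn` user's home directory. This is a guess, not a diagnosis of the patch. The hypothetical `ManifestPathSketch` class below uses `java.net.URI` to reproduce that failure mode. It also shows a guard that rejects an unset or relative top-level directory. The `topLevelDir` parameter and the `/runc-root` value mentioned after the block are illustrative assumptions.

```java
import java.net.URI;

// Hypothetical illustration: how an unset top-level directory can become a
// literal "null" path component, plus a guard that fails fast instead.
final class ManifestPathSketch {
  private ManifestPathSketch() {}

  // Failure mode (assumed): Java string concatenation renders a null String
  // as "null", and the resulting relative path is resolved against the
  // user's home directory (e.g. /user/yarn/).
  static URI naiveConfigPath(URI homeDir, String topLevelDir, String hash) {
    return homeDir.resolve(topLevelDir + "/config/" + hash);
  }

  // Guarded version: reject an unset directory and require an absolute path,
  // so the manifest location never depends on the submitting user's home.
  static URI checkedConfigPath(URI fsUri, String topLevelDir, String hash) {
    if (topLevelDir == null || topLevelDir.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "runc image top-level directory is not configured");
    }
    if (!topLevelDir.startsWith("/")) {
      throw new IllegalArgumentException(
          "runc image top-level directory must be absolute: " + topLevelDir);
    }
    // Drop a single trailing slash to avoid "//config" in the result.
    String dir = topLevelDir.endsWith("/")
        ? topLevelDir.substring(0, topLevelDir.length() - 1)
        : topLevelDir;
    return fsUri.resolve(dir + "/config/" + hash);
  }
}
```

Here, `naiveConfigPath(URI.create("hdfs://eyang-1.openstacklocal:9000/user/yarn/"), null, hash)` produces `hdfs://eyang-1.openstacklocal:9000/user/yarn/null/config/<hash>`, the same shape as the path in the log above. By contrast, `checkedConfigPath` with a null directory throws `IllegalArgumentException`, and with `/runc-root/` it yields `hdfs://eyang-1.openstacklocal:9000/runc-root/config/<hash>`.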
> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch,
> YARN-9562.003.patch
>
>
> This JIRA will be used to add the Java changes for the new
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the
> existing DockerLinuxContainerRuntime code once it is moved up into an
> abstract class that can be extended.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)