Haibo Chen created YARN-9111:
--------------------------------
Summary: NM crashes because Fair scheduler promotes a container that has not been pulled by AM
Key: YARN-9111
URL: https://issues.apache.org/jira/browse/YARN-9111
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler, nodemanager
Affects Versions: YARN-1011
Reporter: Haibo Chen
{code:java}
2018-10-19 22:34:35,052 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:323)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:1649)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:185)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
	at java.lang.Thread.run(Thread.java:748)
2018-10-19 22:34:35,054 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2018-10-19 22:34:35,059 DEBUG org.apache.hadoop.service.AbstractService: Service: NodeManager entered state STOPPED
{code}
When the RM allocates a container to an application, the container token is
not generated until the AM pulls that container from the RM.
However, if the scheduler decides to promote the container before the AM has
pulled it, there is no container token to work with.
The current code does not update or generate a container token in this case, so
when the container promotion is sent to the NM, the NM crashes with an NPE.
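The failure mode described above can be sketched in Java as a simplified, hypothetical model. `ContainerToken`, `AllocatedContainer`, `SketchScheduler`, and `SketchNodeManager` are illustrative names, not Hadoop YARN classes. The guarded promotion path is only an assumed way of generating the missing token at promotion time; the report does not say how the issue was resolved.

```java
// Hypothetical, simplified model of the bug; none of these classes exist in Hadoop YARN.

/** Stand-in for a container token; here just an opaque identifier string. */
final class ContainerToken {
    final String identifier;

    ContainerToken(String identifier) {
        this.identifier = identifier;
    }
}

/** Stand-in for an RM-side record of an allocated container. */
final class AllocatedContainer {
    final String containerId;
    boolean opportunistic = true;
    // Stays null until the AM pulls the container, as described in the report.
    ContainerToken token;

    AllocatedContainer(String containerId) {
        this.containerId = containerId;
    }
}

/** Stand-in for the scheduler side (hypothetical). */
final class SketchScheduler {
    private int tokenSequence = 0;

    private ContainerToken newToken(AllocatedContainer c) {
        tokenSequence++;
        return new ContainerToken(c.containerId + "#" + tokenSequence);
    }

    /** The AM pulls a newly allocated container; only now is a token created. */
    void pull(AllocatedContainer c) {
        if (c.token == null) {
            c.token = newToken(c);
        }
    }

    /** Buggy path: forwards whatever token exists, which is null if the AM has not pulled yet. */
    ContainerToken promoteUnguarded(AllocatedContainer c) {
        c.opportunistic = false;
        return c.token;
    }

    /** Assumed mitigation: generate the token at promotion time if it is still missing. */
    ContainerToken promoteGuarded(AllocatedContainer c) {
        c.opportunistic = false;
        if (c.token == null) {
            c.token = newToken(c);
        }
        return c.token;
    }
}

/** Stand-in for the NM handler; it dereferences the token, as BuilderUtils does in the trace. */
final class SketchNodeManager {
    String handlePromotion(ContainerToken token) {
        // No null check here, so a null token throws NullPointerException.
        return "promoted " + token.identifier;
    }
}
```

With `promoteUnguarded`, promoting a container that the AM has not pulled hands the NM a null token, and `handlePromotion` throws `NullPointerException`, mirroring the crash. With `promoteGuarded`, a token is created on demand, and a container that was already pulled keeps its existing token.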
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)