Wilfred Spiegelenburg created YARN-7585:
-------------------------------------------
Summary: NodeManager should go unhealthy when state store throws DBException
Key: YARN-7585
URL: https://issues.apache.org/jira/browse/YARN-7585
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
If work-preserving recovery is enabled, the NM will not start up if the state
store fails to initialise. However, if the state store becomes unavailable
after startup for any reason, the NM does not go unhealthy.
Since the state store is not available, new containers cannot be started any
more, and the NM should become unhealthy:
{code}
AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
    at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
    ...
Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
    at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
{code}
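The requested behaviour could be sketched roughly as follows. This is only an illustration, not a proposed patch. StoreOp, HealthTracker and storeOrMarkUnhealthy are hypothetical names invented here and are not part of the NodeManager code. The idea is that a failed state store write both fails the container start, as it does today, and records a reason that a health report could pick up.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class StateStoreHealthSketch {

    /** Hypothetical stand-in for one state store write (not a Hadoop interface). */
    @FunctionalInterface
    interface StoreOp {
        void run() throws IOException;
    }

    /** Hypothetical health tracker; it remembers the first failure reported. */
    static final class HealthTracker {
        private final AtomicReference<String> failure = new AtomicReference<>();

        void reportFailure(String reason) {
            // Keep the first reason only; later failures do not overwrite it.
            failure.compareAndSet(null, reason);
        }

        boolean isHealthy() {
            return failure.get() == null;
        }

        String getReport() {
            String reason = failure.get();
            return reason == null ? "" : reason;
        }
    }

    /** Runs a store operation; on IOException, records the failure and rethrows. */
    static void storeOrMarkUnhealthy(StoreOp op, HealthTracker health) throws IOException {
        try {
            op.run();
        } catch (IOException e) {
            health.reportFailure("state store failure: " + e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) {
        HealthTracker health = new HealthTracker();
        try {
            // Simulate the read-only file system error from the trace above.
            storeOrMarkUnhealthy(() -> {
                throw new IOException("IO error: Read-only file system");
            }, health);
        } catch (IOException e) {
            // The container start still fails, exactly as it does now.
        }
        System.out.println("healthy=" + health.isHealthy() + " report=" + health.getReport());
    }
}
```

In the real NM the recorded reason would have to be wired into whatever feeds the node health status sent to the RM; how best to do that is left open here.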