Kanwaljeet Sachdev created YARN-8242:
----------------------------------------
Summary: YARN NM: OOM error while reading back the state store on recovery
Key: YARN-8242
URL: https://issues.apache.org/jira/browse/YARN-8242
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Affects Versions: 3.2.0
Reporter: Kanwaljeet Sachdev
On startup, the NM reads its state store and builds a list of the applications
in the state store to process. If the state store holds a large number of
applications, each with a lot of "state" attached to it, the NM can run out of
memory (OOM) and never get to the point where it can start processing the
recovery.
Because recovery never starts, the NM can never get past this point on its own.
Getting the NM started requires a change in heap size.
The stack trace follows:
{code:java}
at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
{code}
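To illustrate the memory pattern described above, here is a minimal sketch that
contrasts loading every record into a list up front with handing records to the
recovery logic one at a time. It is not the NodeManager code and not a proposed
patch: RecoverySketch, FakeStateStore, decode, loadAll and recoverOneByOne are
all hypothetical names, the in-memory map only stands in for the on-disk store,
and the UTF-8 decode only stands in for the protobuf parse seen in the trace.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Illustrative only: none of these names are NodeManager APIs.
class RecoverySketch {

    // Hypothetical stand-in for the on-disk state store: key -> serialized record.
    static final class FakeStateStore {
        private final TreeMap<String, byte[]> entries = new TreeMap<>();

        void put(String key, String value) {
            entries.put(key, value.getBytes(StandardCharsets.UTF_8));
        }

        Iterator<Map.Entry<String, byte[]>> iterator() {
            return entries.entrySet().iterator();
        }
    }

    // Stand-in for the parse step; here it only decodes UTF-8 text.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Eager pattern: every decoded record stays reachable until the list is
    // dropped, so peak heap use grows with the number of records in the store.
    static List<String> loadAll(FakeStateStore store) {
        List<String> all = new ArrayList<>();
        Iterator<Map.Entry<String, byte[]>> it = store.iterator();
        while (it.hasNext()) {
            all.add(decode(it.next().getValue()));
        }
        return all;
    }

    // Incremental pattern: decode one record, pass it to the handler, move on.
    // Unless the handler keeps a reference, each decoded record can be
    // collected before the next one is parsed.
    static int recoverOneByOne(FakeStateStore store, Consumer<String> handler) {
        int count = 0;
        Iterator<Map.Entry<String, byte[]>> it = store.iterator();
        while (it.hasNext()) {
            handler.accept(decode(it.next().getValue()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        FakeStateStore store = new FakeStateStore();
        store.put("container_1", "launch-context-1");
        store.put("container_2", "launch-context-2");
        int n = recoverOneByOne(store, r -> System.out.println("recovering " + r));
        System.out.println(n + " records processed");
    }
}
```

With the incremental variant, peak memory depends on the largest single record
plus whatever the handler retains, rather than on the total size of the store.
Whether the real recovery path can be restructured this way is an open question
for this issue.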