Kanwaljeet Sachdev created YARN-8242:
----------------------------------------

             Summary: YARN NM: OOM error while reading back the state store on recovery
                 Key: YARN-8242
                 URL: https://issues.apache.org/jira/browse/YARN-8242
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: yarn
    Affects Versions: 3.2.0
            Reporter: Kanwaljeet Sachdev


On startup the NM reads its state store and builds a list of the applications in the state store to process. If the state store holds a large number of applications, each with a lot of "state" attached, the NM can run out of memory (OOM) before it reaches the point where it can start processing the recovery. Because recovery never starts, the NM cannot get past this point on its own: the heap size has to be increased to get the NM started.

The stack trace follows:
{code:java}
at java.lang.OutOfMemoryError.<init> (OutOfMemoryError.java:48)
at com.google.protobuf.ByteString.copyFrom (ByteString.java:192)
at com.google.protobuf.CodedInputStream.readBytes (CodedInputStream.java:324)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47069)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init> (YarnProtos.java:47014)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47102)
at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom (YarnProtos.java:47097)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:41016)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init> (YarnProtos.java:40942)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41080)
at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom (YarnProtos.java:41075)
at com.google.protobuf.CodedInputStream.readMessage (CodedInputStream.java:309)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24517)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.<init> (YarnServiceProtos.java:24464)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24568)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto$1.parsePartialFrom (YarnServiceProtos.java:24563)
at com.google.protobuf.AbstractParser.parsePartialFrom (AbstractParser.java:141)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:176)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:188)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:193)
at com.google.protobuf.AbstractParser.parseFrom (AbstractParser.java:49)
at org.apache.hadoop.yarn.proto.YarnServiceProtos$StartContainerRequestProto.parseFrom (YarnServiceProtos.java:24739)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainerState (NMLeveldbStateStoreService.java:217)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.loadContainersState (NMLeveldbStateStoreService.java:170)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover (ContainerManagerImpl.java:253)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit (ContainerManagerImpl.java:237)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit (CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit (NodeManager.java:255)
at org.apache.hadoop.service.AbstractService.init (AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager (NodeManager.java:474)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main (NodeManager.java:521)
{code}
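
To illustrate the failure mode described in this report, here is a minimal, self-contained sketch. It is not the NodeManager code: `RecoverySketch`, `RecoveredRecord`, `loadAll`, and `iterate` are hypothetical names, and a `TreeMap` stands in for the state store. It contrasts building the full list of recovered records up front, where peak memory grows with the size of the store, with decoding one record at a time. The lazy variant is only one possible direction, shown here as an assumption rather than as the fix chosen for this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RecoverySketch {

    /** Hypothetical stand-in for one recovered container record. */
    static final class RecoveredRecord {
        final String key;
        final byte[] payload;

        RecoveredRecord(String key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Eager load: every record is decoded and retained before any is processed. */
    static List<RecoveredRecord> loadAll(Map<String, byte[]> store) {
        List<RecoveredRecord> all = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : store.entrySet()) {
            // clone() mimics the byte copy made while parsing a stored message
            all.add(new RecoveredRecord(e.getKey(), e.getValue().clone()));
        }
        return all;
    }

    /** Lazy load: a record is decoded only when the caller asks for it. */
    static Iterator<RecoveredRecord> iterate(Map<String, byte[]> store) {
        final Iterator<Map.Entry<String, byte[]>> it = store.entrySet().iterator();
        return new Iterator<RecoveredRecord>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public RecoveredRecord next() {
                Map.Entry<String, byte[]> e = it.next();
                return new RecoveredRecord(e.getKey(), e.getValue().clone());
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical store contents; a real state store would be far larger.
        Map<String, byte[]> store = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            store.put("container_" + i, ("state-" + i).getBytes(StandardCharsets.UTF_8));
        }

        System.out.println("eager: holds " + loadAll(store).size() + " records at once");

        int processed = 0;
        for (Iterator<RecoveredRecord> it = iterate(store); it.hasNext(); ) {
            it.next(); // the record can be processed and then dropped here
            processed++;
        }
        System.out.println("lazy: processed " + processed + " records one at a time");
    }
}
```

With the eager variant the heap must hold every decoded record simultaneously, which matches the situation in the report where raising the heap size is the only way to get past startup.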