Oleksii Dymytrov created YARN-5924:
--------------------------------------
Summary: Resource Manager fails to load state with InvalidProtocolBufferException
Key: YARN-5924
URL: https://issues.apache.org/jira/browse/YARN-5924
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.0.0-alpha1
Reporter: Oleksii Dymytrov
InvalidProtocolBufferException can be thrown while recovering an application's
state if the application's data under the
FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in HDFS has
an invalid format (or is corrupted):
{noformat}
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
    at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
    at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
    at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
{noformat}
A possible solution is to catch InvalidProtocolBufferException, log a warning,
and remove the application's folder that contains the invalid data, so that the
RM restart does not fail.
Additionally, I've added a catch for other exceptions that can occur while
recovering a specific application, to avoid RM failure even if only one
application's state can't be loaded.
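
The approach described above could look roughly like the following sketch. It
is not the attached patch: it uses java.nio.file on a local directory instead
of the HDFS FileSystem API, and TolerantAppRecovery, StateParser, loadApps,
loadOneApp and deleteRecursively are hypothetical names chosen for
illustration. StateParser stands in for a call such as
ApplicationStateDataProto.parseFrom, whose InvalidProtocolBufferException is
an IOException.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only; all names here are assumptions, not YARN code.
final class TolerantAppRecovery {

    /** Hypothetical stand-in for a protobuf parser that rejects bad bytes. */
    interface StateParser {
        void parse(byte[] data) throws IOException;
    }

    /**
     * Loads every application directory under rmAppRoot. A directory whose
     * data cannot be parsed is logged, deleted, and skipped, so one broken
     * application does not abort the whole recovery.
     * Returns the names of the applications that loaded successfully.
     */
    static List<String> loadApps(Path rmAppRoot, StateParser parser) throws IOException {
        // Snapshot the listing first so deletions do not affect iteration.
        List<Path> appDirs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(rmAppRoot)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    appDirs.add(p);
                }
            }
        }

        List<String> recovered = new ArrayList<>();
        for (Path appDir : appDirs) {
            try {
                loadOneApp(appDir, parser);
                recovered.add(appDir.getFileName().toString());
            } catch (IOException | RuntimeException e) {
                // Covers the parse failure and any other per-application error.
                System.err.println("WARN: cannot recover " + appDir
                        + ", removing it: " + e);
                deleteRecursively(appDir);
            }
        }
        return recovered;
    }

    /** Parses every regular file directly inside one application directory. */
    static void loadOneApp(Path appDir, StateParser parser) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(appDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    parser.parse(Files.readAllBytes(file));
                }
            }
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Deleting the folder discards that application's state permanently, so a
variant that moves the folder aside instead of deleting it may be worth
considering if the data is needed for later diagnosis.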
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)