[
https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702655#comment-15702655
]
Oleksii Dymytrov commented on YARN-5924:
----------------------------------------
Kindly review the PR.
> Resource Manager fails to load state with InvalidProtocolBufferException
> ------------------------------------------------------------------------
>
> Key: YARN-5924
> URL: https://issues.apache.org/jira/browse/YARN-5924
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Oleksii Dymytrov
> Assignee: Oleksii Dymytrov
>
> InvalidProtocolBufferException is thrown while recovering an application's
> state if the application's data is in an invalid format (or is corrupted)
> under the FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in
> HDFS:
> {noformat}
> com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
> at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
> at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
> at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
> at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281)
> at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232)
> {noformat}
> One possible solution is to catch InvalidProtocolBufferException, log a
> warning, and remove the application's folder that contains the invalid data,
> so that the RM restart does not fail.
> Additionally, I've added a catch for other exceptions that can occur while
> recovering a specific application, so that the RM does not fail when only
> one application's state can't be loaded.
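
The approach described above could look roughly like the following. This is a minimal, self-contained sketch and not the actual patch: it uses the local filesystem via java.nio instead of HDFS, and TolerantStateLoader, InvalidStateException, parseAppState, and the "state" file name are all hypothetical stand-ins (InvalidStateException plays the role of InvalidProtocolBufferException). Whether the directory should also be removed for the other exceptions is not stated in the description, so the sketch only skips in that case.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TolerantStateLoader {

    /** Hypothetical stand-in for com.google.protobuf.InvalidProtocolBufferException. */
    static class InvalidStateException extends IOException {
        InvalidStateException(String message) {
            super(message);
        }
    }

    /** Hypothetical parser: accepts only payloads that start with the marker "APP:". */
    static String parseAppState(byte[] data) throws InvalidStateException {
        String text = new String(data, StandardCharsets.UTF_8);
        if (!text.startsWith("APP:")) {
            throw new InvalidStateException("malformed application state");
        }
        return text.substring("APP:".length());
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    /**
     * Loads every application directory under root. A directory whose state
     * cannot be parsed is logged and removed; any other per-application
     * failure is logged and skipped, so one bad entry does not abort recovery.
     */
    static List<String> loadAll(Path root) throws IOException {
        List<Path> appDirs;
        try (Stream<Path> children = Files.list(root)) {
            appDirs = children.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        List<String> recovered = new ArrayList<>();
        for (Path appDir : appDirs) {
            try {
                byte[] data = Files.readAllBytes(appDir.resolve("state"));
                recovered.add(parseAppState(data));
            } catch (InvalidStateException e) {
                System.err.println("WARN: invalid state in " + appDir + ", removing: " + e.getMessage());
                try {
                    deleteRecursively(appDir);
                } catch (IOException deleteFailure) {
                    System.err.println("WARN: could not remove " + appDir + ": " + deleteFailure);
                }
            } catch (Exception e) {
                // Assumption: other failures are only skipped, not deleted.
                System.err.println("WARN: skipping " + appDir + ": " + e);
            }
        }
        return recovered;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rmstate");
        Path good = Files.createDirectories(root.resolve("application_0001"));
        Path bad = Files.createDirectories(root.resolve("application_0002"));
        Files.write(good.resolve("state"), "APP:app-one".getBytes(StandardCharsets.UTF_8));
        Files.write(bad.resolve("state"), "garbage".getBytes(StandardCharsets.UTF_8));

        System.out.println("recovered: " + loadAll(root));
        System.out.println("bad directory still exists: " + Files.exists(bad));
    }
}
```

Handling the failure inside the per-application loop, rather than around the whole load, is what keeps the remaining applications recoverable.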
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)