[
https://issues.apache.org/jira/browse/YARN-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269622#comment-16269622
]
Robert Kanter commented on YARN-5594:
-------------------------------------
I know this is an old JIRA that hasn't been updated in over a year, but we're
running into this problem now and I was doing some investigation. This is
caused by YARN-2743, which incompatibly changes the format in which the tokens
are stored in the RMStateStore. We (Cloudera) had actually reverted YARN-2743 from
CDH as a workaround.
Anyway, as is, this breaks upgrades (rolling or not) from a version of Hadoop
without YARN-2743 (i.e. Hadoop < 2.6.0) to a version with it (i.e. >= Hadoop
2.6.0), if you have delegation tokens in your RMStateStore. To fix this, I
think [~Tatyana But] was on the right track by having it read the old format as
a fallback, though the patch needs updating to make it work with more than
just the {{FileSystemRMStateStore}}.
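To illustrate the fallback idea, here is a minimal, self-contained sketch. It
does not use the real RMStateStore or protobuf classes: the class
{{TokenRecoverySketch}}, both byte layouts, and the {{NEW_FORMAT_MAGIC}} marker
are hypothetical stand-ins, and a plain {{IOException}} plays the role of the
{{InvalidProtocolBufferException}} seen in the stack trace. The point is only
the control flow: try the new format first, and read the old one if that fails.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class TokenRecoverySketch {

  /** Minimal stand-in for a stored delegation token record. */
  static final class TokenRecord {
    final String owner;
    final long renewDate;
    final boolean fromOldFormat;

    TokenRecord(String owner, long renewDate, boolean fromOldFormat) {
      this.owner = owner;
      this.renewDate = renewDate;
      this.fromOldFormat = fromOldFormat;
    }
  }

  // Assumed marker that only the "new" layout starts with ("RMDT" in ASCII).
  static final int NEW_FORMAT_MAGIC = 0x524D4454;

  /** Assumed "new" layout: marker, owner, renew date. */
  static byte[] writeNew(String owner, long renewDate) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(NEW_FORMAT_MAGIC);
      out.writeUTF(owner);
      out.writeLong(renewDate);
    }
    return bytes.toByteArray();
  }

  /** Assumed "old" layout: owner, renew date, no marker. */
  static byte[] writeOld(String owner, long renewDate) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(owner);
      out.writeLong(renewDate);
    }
    return bytes.toByteArray();
  }

  static TokenRecord readNew(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      if (in.readInt() != NEW_FORMAT_MAGIC) {
        throw new IOException("Record is not in the new format");
      }
      return new TokenRecord(in.readUTF(), in.readLong(), false);
    }
  }

  static TokenRecord readOld(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return new TokenRecord(in.readUTF(), in.readLong(), true);
    }
  }

  /** Tries the new format first and falls back to the old one on failure. */
  static TokenRecord recover(byte[] data) throws IOException {
    try {
      return readNew(data);
    } catch (IOException newFormatFailure) {
      try {
        return readOld(data);
      } catch (IOException oldFormatFailure) {
        // Neither layout matched: report the original failure, keep both causes.
        newFormatFailure.addSuppressed(oldFormatFailure);
        throw newFormatFailure;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    TokenRecord record = recover(writeOld("alice", 42L));
    System.out.println(record.owner + " " + record.renewDate
        + " fromOldFormat=" + record.fromOldFormat);
  }
}
```

Putting the fallback in one shared read path, rather than in a single store
implementation, is what would let it apply to every RMStateStore. In a real
patch the catch would presumably be narrowed to the parse exception so that
genuine I/O errors are not mistaken for old-format data.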
If nobody minds, I'll take over this JIRA.
{quote}What happens when the format changes again?{quote}
We should try to avoid incompatibly changing the format again in the future.
If we need to for some reason, we should make sure there's some path to handle
it.
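One possible "path to handle it" is an explicit version field at the start of
each stored record, so that a reader can dispatch on it instead of guessing.
The sketch below is only an assumption about how that could look: the class
{{VersionedRecordSketch}}, the version numbers, and both layouts are made up
for illustration and are not the actual RMStateStore format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; not a Hadoop class or format.
final class VersionedRecordSketch {

  static final int CURRENT_VERSION = 2;

  /** Writes the assumed current layout: version, owner, renew date. */
  static byte[] write(String owner, long renewDate) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(CURRENT_VERSION);
      out.writeUTF(owner);
      out.writeLong(renewDate);
    }
    return bytes.toByteArray();
  }

  /** Returns "owner@renewDate"; version 1 is assumed to lack a renew date. */
  static String read(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int version = in.readInt();
      switch (version) {
        case 1:
          return in.readUTF() + "@0";
        case 2:
          return in.readUTF() + "@" + in.readLong();
        default:
          // Fail loudly instead of misreading data from an unknown version.
          throw new IOException("Unsupported record version: " + version);
      }
    }
  }
}
```

With something like this, a future format change would add a new case rather
than break recovery of data written by older versions.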
> Handle old data format while recovering RM
> ------------------------------------------
>
> Key: YARN-5594
> URL: https://issues.apache.org/jira/browse/YARN-5594
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Reporter: Tatyana But
> Labels: oct16-medium
> Attachments: YARN-5594.001.patch
>
>
> We got this error after upgrading a cluster from 2.5.1 to 2.7.0.
> {noformat}
> 2016-08-25 17:20:33,293 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to
> load/recover state
> com.google.protobuf.InvalidProtocolBufferException: Protocol message contained
> an invalid tag (zero).
> at
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
> at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.<init>(YarnServerResourceManagerRecoveryProtos.java:4680)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.<init>(YarnServerResourceManagerRecoveryProtos.java:4644)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4740)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4735)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:5075)
> at
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:4955)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:337)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
> at
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:210)
> at
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:904)
> at
> org.apache.hadoop.yarn.server.resourcemanager.recovery.records.RMDelegationTokenIdentifierData.readFields(RMDelegationTokenIdentifierData.java:43)
> at
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:355)
> at
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
> at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044)
> {noformat}
> The cause of this problem is that the files
> /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot/RMDelegationToken*
> use different formats in these Hadoop versions.
> This fix handles the old data format during RM recovery if an
> InvalidProtocolBufferException occurs.