[ 
https://issues.apache.org/jira/browse/YARN-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269622#comment-16269622
 ] 

Robert Kanter commented on YARN-5594:
-------------------------------------

I know this is an old JIRA that hasn't been updated in over a year, but we're 
running into this problem now and I was doing some investigation.  This is 
caused by YARN-2743 - it incompatibly changes the format that the tokens are 
stored in the RMStateStore.  We (Cloudera) had actually reverted YARN-2743 from 
CDH as a workaround.  

Anyway, as is, this breaks upgrades (rolling or not) from a version of Hadoop 
without YARN-2743 (i.e. Hadoop < 2.6.0) to a version with it (i.e. >= Hadoop 
2.6.0), if you have delegation tokens in your RMStateStore.  To fix this, I 
think [~Tatyana But] was on the right track by having it read the old format as 
a fallback.  Though the patch needs updating to make it work with more than 
just the {{FileSystemRMStateStore}}.

If nobody minds, I'll take over this JIRA.

{quote}What happens when the format changes again?{quote}
We should try to avoid incompatibly changing the format again in the future.  
If we need to for some reason, we should make sure there's some path to handle 
it.

> Handle old data format while recovering RM
> ------------------------------------------
>
>                 Key: YARN-5594
>                 URL: https://issues.apache.org/jira/browse/YARN-5594
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Tatyana But
>              Labels: oct16-medium
>         Attachments: YARN-5594.001.patch
>
>
> We've got that error after upgrade cluster from v.2.5.1 to 2.7.0.
> {noformat}
> 2016-08-25 17:20:33,293 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to
> load/recover state
> com.google.protobuf.InvalidProtocolBufferException: Protocol message contained
> an invalid tag (zero).
> at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
> at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.<init>(YarnServerResourceManagerRecoveryProtos.java:4680)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.<init>(YarnServerResourceManagerRecoveryProtos.java:4644)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4740)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4735)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:5075)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:4955)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:337)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:210)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.records.RMDelegationTokenIdentifierData.readFields(RMDelegationTokenIdentifierData.java:43)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:355)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044
> {noformat}
> The reason of this problem is that we use different formats of files 
> /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot/RMDelegationToken*
>  in these hadoop versions.
> This fix handle old data format during RM recover if 
> InvalidProtocolBufferException occures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to