Miklos Szegedi commented on YARN-8310:

Thank you for the patch [~rkanter].
 I have a couple of comments.
 {{readFields()}} casts from {{DataInput}} to {{DataInputStream}}, it might be 
valuable to do a check first with a warning.
Also {{DataInput}} does not have a {{reset()}} by default and we do not know, 
who called the {{mark()}} before. It might make sense to call the {{mark()}} 
or, even better do a {{DataInput.readFully()}} and apply the two parsers on the 
same byte array.
{{public void readFields(DataInput in) throws IOException}} may throw an 
exception early, if the new format expects more characters. We should fall back 
to {{readFieldsInOldFormat}} in case of an {{IOException}} and throw it only if 
that one throws the same.
It might make sense to write some unit tests that pass a {{DataInput}} that is 
not a {{DataInputStream}}
If we stick to {{reset()}} we might want to use int in a catch block as well, 
so that others can read the data, if we fail.
372         int logAggregationSize = -1;
I would probably use a separate boolean here, and set it in the success 
scenario to true, so that we cover the case of a -1 parsed and provide a 
meaningful error code.
123         String[] hostAddr = in.readUTF().split(":");
The case is not covered, when the string does not have a colon. Also we do not 
check, whether the port number is in the range of two bytes and non negative. 
It might fail later in YARN, if we omit this check.

> Handle old NMTokenIdentifier, AMRMTokenIdentifier, and 
> ContainerTokenIdentifier formats
> ---------------------------------------------------------------------------------------
>                 Key: YARN-8310
>                 URL: https://issues.apache.org/jira/browse/YARN-8310
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Major
>         Attachments: YARN-8310.001.patch, YARN-8310.002.patch, 
> YARN-8310.branch-2.001.patch, YARN-8310.branch-2.002.patch
> In some recent upgrade testing, we saw this error causing the NodeManager to 
> fail to startup afterwards:
> {noformat}
> org.apache.hadoop.service.ServiceStateException: 
> com.google.protobuf.InvalidProtocolBufferException: Protocol message 
> contained an invalid tag (zero).
>       at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>       at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:441)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:834)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:895)
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message contained an invalid tag (zero).
>       at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
>       at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
>       at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1860)
>       at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.<init>(YarnSecurityTokenProtos.java:1824)
>       at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2016)
>       at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto$1.parsePartialFrom(YarnSecurityTokenProtos.java:2011)
>       at 
> com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>       at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>       at 
> org.apache.hadoop.yarn.proto.YarnSecurityTokenProtos$ContainerTokenIdentifierProto.parseFrom(YarnSecurityTokenProtos.java:2686)
>       at 
> org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:254)
>       at 
> org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:177)
>       at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:322)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:455)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:373)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:316)
>       at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>       ... 5 more
> {noformat}
> The NodeManager fails because it's trying to read a 
> {{ContainerTokenIdentifier}} in the "old" format before we changed them to 
> protobufs (YARN-668).  This is very similar to YARN-5594 where we ran into a 
> similar problem with the ResourceManager and RM Delegation Tokens.
> To provide a better experience, we should make the code able to read the old 
> format if it's unable to read it using the new format.  We didn't run into 
> any errors with the other two types of tokens that YARN-668 incompatibly 
> changed (NMTokenIdentifier and AMRMTokenIdentifier), but we may as well fix 
> those while we're at it.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to