[
https://issues.apache.org/jira/browse/YARN-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862433#comment-16862433
]
Tan, Wangda edited comment on YARN-9327 at 6/12/19 8:22 PM:
------------------------------------------------------------
This seems tricky because, as far as I can see, ResourcePBImpl doesn't have
proper protection. A similar fix is https://issues.apache.org/jira/browse/YARN-2387
I think we should make sure that at least {{getProto}} and {{maybeInitBuilder}}
are protected by a synchronized lock. Right now the synchronized lock is only on
{{mergeLocalToBuilder}}, which is not sufficient.
This won't protect against reading stale values of resource information, but if
we want to protect the read/write path for resource information, we need to
look carefully at the performance impact.
Removing the synchronized static lock seems like the right fix; it looks like a
mistake in a previous patch.
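A minimal sketch of the locking suggested above, using a simplified stand-in
rather than the real ResourcePBImpl. The names ResourceRecordSketch,
ProtoUtilsSketch and the nested Proto class are hypothetical, and a long[]
stands in for the generated protobuf builder. It only illustrates moving the
lock from the utility class to the record instance:

```java
// Hypothetical stand-in for a PBImpl-style record; this is NOT Hadoop code.
final class ResourceRecordSketch {
    // Stand-in for the immutable generated proto message.
    static final class Proto {
        final long memory;
        final int vcores;

        Proto(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }
    }

    private Proto proto = new Proto(0L, 0);
    private long[] builder;          // stand-in for Proto.Builder: {memory, vcores}
    private boolean viaProto = true; // true while 'proto' reflects the latest state

    // Guarded by the instance monitor; reentrant when called from the setters.
    private synchronized void maybeInitBuilder() {
        if (viaProto || builder == null) {
            builder = new long[] {proto.memory, proto.vcores};
        }
        viaProto = false;
    }

    public synchronized void setMemory(long memory) {
        maybeInitBuilder();
        builder[0] = memory;
    }

    public synchronized void setVcores(int vcores) {
        maybeInitBuilder();
        builder[1] = vcores;
    }

    // Guarded by the same instance monitor as maybeInitBuilder, so a rebuild of
    // 'proto' cannot interleave with a builder reset on this record.
    public synchronized Proto getProto() {
        if (!viaProto) {
            proto = new Proto(builder[0], (int) builder[1]);
            viaProto = true;
        }
        return proto;
    }
}

final class ProtoUtilsSketch {
    // No 'static synchronized' here: callers converting different records no
    // longer queue up on one class-wide monitor.
    static ResourceRecordSketch.Proto convertToProtoFormat(ResourceRecordSketch r) {
        return r.getProto();
    }
}
```

With this shape, two threads converting different records do not contend at
all, and two threads touching the same record serialize only on that record.
Whether the extra per-instance locking is acceptable on hot paths is the
performance question raised above and is not answered by this sketch.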
> ProtoUtils#convertToProtoFormat block Application Master Service and many more
> ------------------------------------------------------------------------------
>
> Key: YARN-9327
> URL: https://issues.apache.org/jira/browse/YARN-9327
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-9327.001.patch
>
>
> {code}
> public static synchronized ResourceProto convertToProtoFormat(Resource r) {
>   return ResourcePBImpl.getProto(r);
> }
> {code}
> {noformat}
> "IPC Server handler 41 on 23764" #324 daemon prio=5 os_prio=0 tid=0x00007f181de72800 nid=0x222 waiting for monitor entry [0x00007ef153dad000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertToProtoFormat(ProtoUtils.java:404)
>         - waiting to lock <0x00007ef2d8bcf6d8> (a java.lang.Class for org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils)
>         at org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.convertToProtoFormat(NodeReportPBImpl.java:315)
>         at org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:262)
>         at org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:289)
>         at org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:228)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:844)
>         - locked <0x00007f0fed968a30> (a org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:72)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1.next(AllocateResponsePBImpl.java:810)
>         - locked <0x00007f0fed96f500> (a org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$7$1.next(AllocateResponsePBImpl.java:799)
>         at com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
>         at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
>         at org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:13810)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:158)
>         - locked <0x00007f0fed968a30> (a org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:198)
>         - eliminated <0x00007f0fed968a30> (a org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>         at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:103)
>         - locked <0x00007f0fed968a30> (a org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl)
>         at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
>         at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:824)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2684){noformat}
> Synchronization does not seem to be required here.
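
The thread dump shows the handler waiting on a java.lang.Class monitor for
ProtoUtils, which is the monitor a static synchronized method acquires. A small
sketch of that behaviour follows; StaticLockSketch and its two methods are
hypothetical names used only for illustration, not part of Hadoop:

```java
// Hypothetical illustration of which monitor a static synchronized method takes.
final class StaticLockSketch {
    // 'static synchronized' locks StaticLockSketch.class, so every caller in the
    // JVM contends on the same monitor regardless of the argument it passes.
    static synchronized boolean convertWithClassLock() {
        return Thread.holdsLock(StaticLockSketch.class);
    }

    // Without the modifier, no class-wide monitor is taken by this method.
    static boolean convertWithoutClassLock() {
        return Thread.holdsLock(StaticLockSketch.class);
    }
}
```

Calling convertWithClassLock() from a thread that does not already hold the
class monitor returns true, and convertWithoutClassLock() returns false, which
is the difference between the current method and one with the modifier removed.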