[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181809#comment-14181809 ]
Jason Lowe commented on YARN-2314:
----------------------------------
bq. IIUC, mayBeCloseProxy can be invoked by MR/NMClient, but
proxy.scheduledForClose is always false. So it won’t call the following
stopProxy.
proxy.scheduledForClose is not always false: removeProxy sets it to true.
The cache calls removeProxy when it needs to evict an entry, so if the cache
never fills, removeProxy is never called. That is by the very design of the
cache, and this patch doesn't change the behavior in that sense. I suppose we
could change the patch so that it caches only the proxy objects but not their
underlying connections. However, I doubt that creating the proxy object is
where the real expense lies; it's much more likely to be in establishing the
RPC connection to the NM.
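To make the eviction path concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not the actual ContainerManagementProtocolProxy code: the class ProxyCacheSketch, the Entry type, and the stopped flag are hypothetical stand-ins, and the sketch assumes a maximum size of at least 1.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a bounded proxy cache (hypothetical names throughout).
class ProxyCacheSketch {
    /** Hypothetical stand-in for one cached NM proxy. */
    static final class Entry {
        final String nodeAddress;
        int activeCallers = 0;
        boolean scheduledForClose = false;
        boolean stopped = false; // models the effect of stopping the proxy

        Entry(String nodeAddress) { this.nodeAddress = nodeAddress; }
    }

    private final int maxSize; // assumed to be >= 1
    // Insertion-ordered, so the first value is the eldest entry.
    private final Map<String, Entry> cache = new LinkedHashMap<>();

    ProxyCacheSketch(int maxSize) { this.maxSize = maxSize; }

    Entry getProxy(String nodeAddress) {
        Entry e = cache.get(nodeAddress);
        if (e == null) {
            if (cache.size() >= maxSize) {
                // Eviction is the only place removeProxy is ever called.
                Iterator<Entry> it = cache.values().iterator();
                Entry eldest = it.next();
                it.remove();
                removeProxy(eldest);
            }
            e = new Entry(nodeAddress);
            cache.put(nodeAddress, e);
        }
        e.activeCallers++;
        return e;
    }

    private void removeProxy(Entry e) {
        e.scheduledForClose = true;
        tryClose(e);
    }

    /** Called by a client when it is done with the proxy. */
    void mayBeCloseProxy(Entry e) {
        e.activeCallers--;
        tryClose(e);
    }

    private void tryClose(Entry e) {
        // Without a prior eviction this condition never holds.
        if (e.scheduledForClose && e.activeCallers == 0) {
            e.stopped = true;
        }
    }
}
```

In this model, a cache of size 2 that only ever sees two NMs never marks an entry as scheduledForClose, so mayBeCloseProxy never stops anything. Asking for a third NM evicts the eldest entry, and only then is that entry stopped (immediately if it has no active callers, otherwise when the last caller releases it).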
bq. once ContainerManagementProtocolProxy#tryCloseProxy is called, internally
it’ll call rpc.stopProxy, will it eventually call ClientCache#stopClient?
ClientCache#stopClient will not necessarily shut down the connection. It only
shuts down the connection if no other objects hold references to the protocol,
but the very nature of the ContainerManagementProtocolProxy cache is to keep
references around. Therefore stopClient will never actually do anything in
practice as long as we are caching proxy objects. That's why I mentioned
earlier that, to really fix this while continuing to cache proxy objects at a
higher layer, either the RPC layer itself needs to gain the ability to shut
down connections or the ClientCache needs to behave differently.
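The reference-counting behavior can be sketched as follows. This is an illustrative model only, not the real ClientCache: the class RefCountedClientCacheSketch, the String key, and the connectionOpen flag are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a reference-counted client cache (hypothetical names).
class RefCountedClientCacheSketch {
    /** Hypothetical stand-in for a shared RPC client. */
    static final class Client {
        int refCount = 0;
        boolean connectionOpen = true; // models the underlying connection
    }

    private final Map<String, Client> clients = new HashMap<>();

    /** Each holder (for example, a cached proxy) takes one reference. */
    synchronized Client getClient(String key) {
        Client c = clients.computeIfAbsent(key, k -> new Client());
        c.refCount++;
        return c;
    }

    /** Drops one reference; the connection closes only at zero references. */
    synchronized void stopClient(String key) {
        Client c = clients.get(key);
        if (c == null) {
            return;
        }
        c.refCount--;
        if (c.refCount == 0) {
            clients.remove(key);
            c.connectionOpen = false;
        }
    }
}
```

If two cached proxies each hold a reference to the same client, stopping one of them leaves the connection open. As long as the higher-level cache keeps at least one proxy alive, the count never reaches zero and stopClient has no visible effect.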
> ContainerManagementProtocolProxy can create thousands of threads for a large
> cluster
> ------------------------------------------------------------------------------------
>
> Key: YARN-2314
> URL: https://issues.apache.org/jira/browse/YARN-2314
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.1.0-beta
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: YARN-2314.patch, YARN-2314v2.patch,
> disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch,
> tez-yarn-2314.xlsx
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of
> this cache is configurable. However, the cache can grow far beyond the
> configured size when running on a large cluster and blow AM address/container
> limits. More details in the first comment.