[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181809#comment-14181809 ]

Jason Lowe commented on YARN-2314:
----------------------------------

bq. IIUC, mayBeCloseProxy can be invoked by MR/NMClient, but 
proxy.scheduledForClose is always false. So it won’t call the following 
stopProxy.

proxy.scheduledForClose is not always false, as it can be set to true by 
removeProxy.  removeProxy is called by the cache when an entry needs to be 
evicted from the cache.  If the cache never fills then we will never call 
removeProxy, by the very design of the cache.  This patch doesn't change the 
behavior in that sense.  I suppose we could change the patch so that it only 
caches the proxy objects but not their underlying connections.  However, I have 
my doubts that's where the real expense is in creating the proxy -- it's much 
more likely to be establishing the RPC connection to the NM.
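
To illustrate the life cycle, here is a minimal, self-contained sketch, not the 
actual Hadoop source; the class and field names are illustrative stand-ins.  The 
point is that scheduledForClose is only ever set on the cache's eviction path, so 
if the cache never fills, mayBeCloseProxy never actually stops anything.

{code:java}
/**
 * Minimal sketch (NOT the actual Hadoop source; names are illustrative) of
 * the proxy life cycle described above: a cached proxy is only stopped once
 * the cache has scheduled it for close AND the last active caller released it.
 */
public class ProxyCloseSketch {

  static class ProxyEntry {
    int activeCallers = 0;             // callers currently using this proxy
    boolean scheduledForClose = false; // only removeProxy() ever sets this
  }

  // Called by MR/NMClient when a caller is finished with the proxy.
  public synchronized void mayBeCloseProxy(ProxyEntry proxy) {
    proxy.activeCallers--;
    tryCloseProxy(proxy);
  }

  // Called by the cache only when it is full and must evict this entry.
  // If the cache never fills, this is never invoked, so scheduledForClose
  // stays false and mayBeCloseProxy() never tears anything down.
  public synchronized void removeProxy(ProxyEntry proxy) {
    proxy.scheduledForClose = true;
    tryCloseProxy(proxy);
  }

  // Stops the proxy only once it is both evicted and idle.
  private void tryCloseProxy(ProxyEntry proxy) {
    if (proxy.scheduledForClose && proxy.activeCallers <= 0) {
      System.out.println("stopping proxy"); // stand-in for rpc.stopProxy(...)
    }
  }

  public static void main(String[] args) {
    ProxyCloseSketch cache = new ProxyCloseSketch();
    ProxyEntry entry = new ProxyEntry();
    entry.activeCallers = 1;       // one caller currently holds the proxy
    cache.mayBeCloseProxy(entry);  // prints nothing: never scheduled for close
    cache.removeProxy(entry);      // eviction path: now it actually stops
  }
}
{code}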

bq. once ContainerManagementProtocolProxy#tryCloseProxy is called, internally 
it’ll call rpc.stopProxy, will it eventually call ClientCache#stopClient

ClientCache#stopClient will not necessarily shut down the connection.  It will 
only shut down the connection if there are no references to the protocol by any 
other objects, but the very nature of the ContainerManagementProtocolProxy 
cache is to keep around references.  Therefore stopClient will never actually 
do anything in practice as long as we are caching proxy objects.  That's why I 
mentioned earlier that, if we want to continue caching proxy objects at a 
higher layer, really fixing this requires changing the RPC layer itself to add 
the ability to shut down connections, or changing the way the ClientCache 
behaves.
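
For the same reason, here is a rough sketch of a reference-counted client cache 
in the spirit of what I'm describing, again with illustrative names rather than 
the actual org.apache.hadoop.ipc code.  Every cached proxy keeps a reference on 
the shared client, so stopClient never reaches the point where it can close the 
underlying connections.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch (illustrative names, not the real org.apache.hadoop.ipc code) of why
 * a reference-counted client cache never closes connections while a higher
 * layer keeps proxies cached: stopClient() only shuts the client down once the
 * last reference is released.
 */
public class RefCountedClientCacheSketch {

  static class Client {
    int refCount = 0;
    void stop() { System.out.println("closing RPC connections"); }
  }

  private final Map<String, Client> clients = new HashMap<>();

  // Each proxy created for a protocol takes a reference on the shared client.
  public synchronized Client getClient(String key) {
    Client client = clients.computeIfAbsent(key, k -> new Client());
    client.refCount++;
    return client;
  }

  // Called from rpc.stopProxy(); only the final release closes connections.
  public synchronized void stopClient(String key) {
    Client client = clients.get(key);
    if (client == null) {
      return;
    }
    if (--client.refCount == 0) {
      clients.remove(key);
      client.stop(); // never reached while cached proxies still hold references
    }
  }

  public static void main(String[] args) {
    RefCountedClientCacheSketch cache = new RefCountedClientCacheSketch();
    cache.getClient("nm-protocol");  // cached proxy #1 holds a reference
    cache.getClient("nm-protocol");  // cached proxy #2 holds a reference
    cache.stopClient("nm-protocol"); // refCount drops to 1: nothing is closed
  }
}
{code}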

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2314
>                 URL: https://issues.apache.org/jira/browse/YARN-2314
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-2314.patch, YARN-2314v2.patch, 
> disable-cm-proxy-cache.patch, nmproxycachefix.prototype.patch, 
> tez-yarn-2314.xlsx
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.


