[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-2314:
--------------------------------

    Assignee: Jason Lowe

bq. Basically the cache doesn't have more functionalities other than just cache 
the connection.

It doesn't even do that, because if we cache the connection to the NM then we 
leak threads.  When a cache entry is purged, the RPC Client thread (tied to the 
NM socket connection) can linger: because of protocol refcounting, the RPC 
layer doesn't provide a way to force a connection closed.  As a workaround we 
need to set the RPC idle timeout to 0, which forces the connections to close so 
we don't leak threads.  Therefore all the cache is doing is caching the proxy 
objects with no connection behind them.  Those objects will reconnect to the NM 
each time we make a call.
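
To make that concrete, here is a rough, self-contained sketch of what the 
cache amounts to once the idle timeout is 0 (in Hadoop the client-side setting 
is ipc.client.connection.maxidletime).  Everything in it is hypothetical: 
ProxyCacheSketch, NmProxy, ProxyCache and the simulated connect are 
illustrative names, not the actual ContainerManagementProtocolProxy code or 
the Hadoop RPC API.  It only shows a size-bounded LRU map of proxy objects 
where each call pays for its own connect.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyCacheSketch {

    /** Hypothetical stand-in for an NM proxy; it holds no open connection. */
    static final class NmProxy {
        final String nodeAddress;
        final AtomicInteger connectsMade = new AtomicInteger();

        NmProxy(String nodeAddress) {
            this.nodeAddress = nodeAddress;
        }

        /** Assumes idle timeout 0: every call opens (and then drops) a connection. */
        String call(String request) {
            connectsMade.incrementAndGet();      // simulated connect
            return nodeAddress + ":" + request;  // simulated reply
        }
    }

    /** LRU cache of proxy objects, bounded at maxSize entries. */
    static final class ProxyCache {
        private final Map<String, NmProxy> proxies;

        ProxyCache(final int maxSize) {
            // accessOrder=true makes iteration order least-recently-used first
            this.proxies = new LinkedHashMap<String, NmProxy>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, NmProxy> eldest) {
                    return size() > maxSize;     // evict the LRU entry when over the limit
                }
            };
        }

        synchronized NmProxy getProxy(String nodeAddress) {
            NmProxy proxy = proxies.get(nodeAddress);
            if (proxy == null) {
                proxy = new NmProxy(nodeAddress);
                proxies.put(nodeAddress, proxy); // may evict the eldest entry
            }
            return proxy;
        }

        synchronized int size() {
            return proxies.size();
        }
    }

    public static void main(String[] args) {
        ProxyCache cache = new ProxyCache(2);
        NmProxy first = cache.getProxy("nm1:8041");
        cache.getProxy("nm2:8041");
        cache.getProxy("nm3:8041");              // pushes nm1 out of the cache

        first.call("startContainer");
        first.call("stopContainer");

        System.out.println("cached proxies: " + cache.size());
        System.out.println("connects made by nm1 proxy: " + first.connectsMade.get());
    }
}
```

In this sketch a cache hit only saves constructing the proxy object; the 
connect cost is paid on every call whether or not the proxy was cached, which 
is why it is unclear the cache is buying us anything.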

I'm not sure saving the proxy objects themselves is worth it -- it would be 
interesting to prove this cache helps in a meaningful way before we assume we 
need it.  But I can update the patch to provide a config property to keep it 
anyway; I hope to have that up later today.

> ContainerManagementProtocolProxy can create thousands of threads for a large 
> cluster
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2314
>                 URL: https://issues.apache.org/jira/browse/YARN-2314
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: disable-cm-proxy-cache.patch, 
> nmproxycachefix.prototype.patch
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
> this cache is configurable.  However the cache can grow far beyond the 
> configured size when running on a large cluster and blow AM address/container 
> limits.  More details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
