[ https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe reassigned YARN-2314:
--------------------------------

    Assignee: Jason Lowe

bq. Basically the cache doesn't have any functionality other than caching the connection.

It doesn't even do that, because if we cache the connection to the NM then we leak threads. When a cache entry is purged, the RPC client thread (tied to the NM socket connection) can linger because the RPC layer, due to protocol refcounting, doesn't provide a way to force a connection closed. As a workaround we need to set the RPC idle timeout to 0 to force the connections to close so we don't leak threads. Therefore all the cache is doing is caching the proxy objects, with no connection behind them; those objects reconnect to the NM each time we make a call. I'm not sure saving the proxy objects themselves is worth it -- it would be interesting to prove this cache helps in a meaningful way before we assume we need it. But I can update the patch to provide a config property to keep it anyway; I hope to have that up later today.

> ContainerManagementProtocolProxy can create thousands of threads for a large
> cluster
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-2314
>                 URL: https://issues.apache.org/jira/browse/YARN-2314
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>        Attachments: disable-cm-proxy-cache.patch,
>                     nmproxycachefix.prototype.patch
>
>
> ContainerManagementProtocolProxy has a cache of NM proxies, and the size of
> this cache is configurable. However the cache can grow far beyond the
> configured size when running on a large cluster and blow AM address/container
> limits. More details in the first comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
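
For context, a minimal sketch of the kind of size-capped LRU cache being discussed, using `LinkedHashMap`'s eviction hook. The class and key names are illustrative only -- this is not the actual ContainerManagementProtocolProxy code, and as the comment notes, eviction alone does not close the underlying socket because the RPC layer refcounts connections; the `ipc.client.connection.maxidletime = 0` workaround is what actually forces evicted proxies' connections to close.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache of NM proxies, keyed by NM address.
// Illustrates capping the cache at a configured size; names are
// invented for this sketch.
public class BoundedProxyCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public BoundedProxyCache(int maxSize) {
        super(16, 0.75f, true); // access-order = true for LRU semantics
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the cap is exceeded.
        // Note: in the real system, evicting the proxy object does not
        // close the RPC connection; setting the RPC idle timeout
        // (ipc.client.connection.maxidletime) to 0 is needed so the
        // connection behind an evicted proxy does not leak a thread.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        BoundedProxyCache<String, String> cache = new BoundedProxyCache<>(2);
        cache.put("nm1:8041", "proxy1");
        cache.put("nm2:8041", "proxy2");
        cache.put("nm3:8041", "proxy3"); // evicts nm1:8041, the eldest
        System.out.println(cache.size());                  // prints 2
        System.out.println(cache.containsKey("nm1:8041")); // prints false
    }
}
```

A cap like this keeps the proxy count (and, once connections actually close, the thread count) bounded regardless of cluster size, which is the behavior the configured cache size was supposed to guarantee.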