Botong Huang created YARN-8010:
----------------------------------
Summary: add config in FederationRMFailoverProxy to not bypass
facade cache when failing over
Key: YARN-8010
URL: https://issues.apache.org/jira/browse/YARN-8010
Project: Hadoop YARN
Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang
Today, when a YarnRM fails over, the FederationRMFailoverProxy running in
AMRMProxy performs failover: it fetches the latest subcluster info from
FederationStateStore and then retries the connection to the latest YarnRM
master. When it calls getSubCluster() on FederationStateStoreFacade, it
bypasses the cache by setting the flush flag. During a YarnRM failover, every
AM heartbeat thread creates a different thread inside FederationInterceptor,
each of which performs failover several times. This leads to a big spike of
getSubCluster() calls to FederationStateStore.
Depending on the cluster setup (e.g. putting a VIP in front of all YarnRMs), a
YarnRM master/slave change might not change the RM address. In other cases, a
small delay in getting the latest subcluster information may be acceptable.
This patch thus adds a config option that makes it possible to ask
FederationRMFailoverProxy not to flush the cache when calling getSubCluster().
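To illustrate the intended effect, here is a minimal, self-contained sketch. It
does not use the real Hadoop classes: CountingStore, CachingFacade and
FailoverProxy are hypothetical stand-ins for FederationStateStore,
FederationStateStoreFacade and FederationRMFailoverProxy, the flushCacheOnFailover
field stands in for the proposed config option (whose actual key is not named
here), and the returned address format is made up. The real facade cache also
expires entries, which this sketch omits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverCacheSketch {

  /** Hypothetical stand-in for the state store; it only counts lookups. */
  static class CountingStore {
    final AtomicInteger lookups = new AtomicInteger();

    String lookupRmAddress(String subClusterId) {
      lookups.incrementAndGet();
      // Made-up address; with a VIP it would not change on RM failover.
      return "rm-vip." + subClusterId + ":8030";
    }
  }

  /** Hypothetical stand-in for the facade: a cache in front of the store. */
  static class CachingFacade {
    private final CountingStore store;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingFacade(CountingStore store) {
      this.store = store;
    }

    String getSubCluster(String subClusterId, boolean flushCache) {
      if (flushCache) {
        // Bypass the cache: always hit the store and refresh the entry.
        String fresh = store.lookupRmAddress(subClusterId);
        cache.put(subClusterId, fresh);
        return fresh;
      }
      // Serve from the cache; hit the store only on a miss.
      return cache.computeIfAbsent(subClusterId, store::lookupRmAddress);
    }
  }

  /** Hypothetical stand-in for the failover proxy with the new option. */
  static class FailoverProxy {
    private final CachingFacade facade;
    private final String subClusterId;
    private final boolean flushCacheOnFailover; // assumed config value

    FailoverProxy(CachingFacade facade, String subClusterId,
        boolean flushCacheOnFailover) {
      this.facade = facade;
      this.subClusterId = subClusterId;
      this.flushCacheOnFailover = flushCacheOnFailover;
    }

    /** Resolves the RM address to reconnect to after a failover. */
    String performFailover() {
      return facade.getSubCluster(subClusterId, flushCacheOnFailover);
    }
  }

  /** Runs the given number of failovers and returns the store lookup count. */
  static int storeLookupsAfter(int failovers, boolean flushCacheOnFailover) {
    CountingStore store = new CountingStore();
    FailoverProxy proxy = new FailoverProxy(
        new CachingFacade(store), "sc-1", flushCacheOnFailover);
    for (int i = 0; i < failovers; i++) {
      proxy.performFailover();
    }
    return store.lookups.get();
  }

  public static void main(String[] args) {
    System.out.println("flush on:  " + storeLookupsAfter(100, true));
    System.out.println("flush off: " + storeLookupsAfter(100, false));
  }
}
```

With flushing enabled, every failover attempt reaches the store. With it
disabled, only the first attempt does, at the cost of possibly returning a
stale address until the cached entry is refreshed.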