Botong Huang created YARN-8010:
----------------------------------

             Summary: add config in FederationRMFailoverProxy to not bypass facade cache when failing over
                 Key: YARN-8010
                 URL: https://issues.apache.org/jira/browse/YARN-8010
             Project: Hadoop YARN
          Issue Type: Task
            Reporter: Botong Huang
            Assignee: Botong Huang


Today, when a YarnRM fails over, the FederationRMFailoverProxy running in
AMRMProxy performs failover: it fetches the latest subcluster info from the
FederationStateStore and then retries connecting to the current YarnRM master.
When it calls getSubCluster() on FederationStateStoreFacade, it bypasses the
cache by passing a flush flag. During a YarnRM failover, every AM heartbeat
thread creates a separate thread inside FederationInterceptor, and each of
these threads performs failover several times. The result is a large spike of
getSubCluster calls to the FederationStateStore.
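To make the load difference concrete, here is a minimal Java sketch under stated assumptions. `CountingStateStore`, `CachingFacade`, `lookupRmAddress`, `countStoreCalls` and the address format are hypothetical stand-ins, not the real FederationStateStoreFacade API. The sketch only models a cache that is bypassed whenever a flush flag is set, with many threads each retrying failover several times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the facade cache; not the real Hadoop API. */
public class FacadeCacheSketch {

  /** Stand-in for the FederationStateStore: counts every lookup it serves. */
  static class CountingStateStore {
    final AtomicInteger calls = new AtomicInteger();

    String lookupRmAddress(String subClusterId) {
      calls.incrementAndGet();
      return subClusterId + "-rm:8030"; // hypothetical address format
    }
  }

  /** Stand-in for the facade: caches lookups unless flush is requested. */
  static class CachingFacade {
    private final CountingStateStore store;
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    CachingFacade(CountingStateStore store) {
      this.store = store;
    }

    String getSubCluster(String subClusterId, boolean flush) {
      if (flush) {
        // Bypass the cache: always hit the store, then refresh the cached entry.
        String fresh = store.lookupRmAddress(subClusterId);
        cache.put(subClusterId, fresh);
        return fresh;
      }
      // Serve from cache; computeIfAbsent hits the store at most once per missing key.
      return cache.computeIfAbsent(subClusterId, store::lookupRmAddress);
    }
  }

  /** Runs `threads` concurrent failover loops, each retrying `retries` times. */
  static int countStoreCalls(int threads, int retries, boolean flush)
      throws InterruptedException {
    CountingStateStore store = new CountingStateStore();
    CachingFacade facade = new CachingFacade(store);
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread w = new Thread(() -> {
        for (int r = 0; r < retries; r++) {
          facade.getSubCluster("subcluster-1", flush);
        }
      });
      workers.add(w);
      w.start();
    }
    for (Thread w : workers) {
      w.join();
    }
    return store.calls.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // 50 simulated AM heartbeat threads, each retrying failover 5 times.
    System.out.println("flush=true:  " + countStoreCalls(50, 5, true));  // 250
    System.out.println("flush=false: " + countStoreCalls(50, 5, false)); // 1
  }
}
```

With the flush flag set, the store load grows as threads × retries; with the cache honored, concurrent callers collapse onto a single lookup per subcluster.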

Depending on the cluster setup (e.g. a VIP in front of all YarnRMs), a YarnRM
master/slave switch might not change the RM address. In other cases, a small
delay in getting the latest subcluster information may be acceptable. This
patch therefore adds a config option that asks FederationRMFailoverProxy not
to flush the cache when calling getSubCluster().
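A minimal Java sketch of how such an option could be wired in, under stated assumptions: the key `example.federation.failover.flush-cache`, the `SubClusterLookup` interface and `FailoverProxySketch` are hypothetical illustrations, not the names used by the actual patch, and `java.util.Properties` stands in for the Hadoop configuration object. The default of `true` is an assumption chosen so that behavior matches today's (always flush).

```java
import java.util.Properties;

/** Hypothetical sketch of the proposed option; names are illustrative only. */
public class FailoverFlushConfigSketch {

  // Hypothetical key; default true keeps the current always-flush behavior.
  static final String FLUSH_CACHE_KEY = "example.federation.failover.flush-cache";
  static final boolean FLUSH_CACHE_DEFAULT = true;

  /** Minimal facade contract the proxy depends on in this sketch. */
  interface SubClusterLookup {
    String getSubCluster(String subClusterId, boolean flush);
  }

  /** Stand-in for the proxy's failover step; reads the flag once at construction. */
  static class FailoverProxySketch {
    private final boolean flushCacheOnFailover;
    private final SubClusterLookup facade;

    FailoverProxySketch(Properties conf, SubClusterLookup facade) {
      this.flushCacheOnFailover = Boolean.parseBoolean(
          conf.getProperty(FLUSH_CACHE_KEY, String.valueOf(FLUSH_CACHE_DEFAULT)));
      this.facade = facade;
    }

    /** Resolves the RM address to reconnect to, honoring the configured flag. */
    String resolveRmAddress(String subClusterId) {
      return facade.getSubCluster(subClusterId, flushCacheOnFailover);
    }
  }

  public static void main(String[] args) {
    // Facade stub that just reports which flush value it received.
    SubClusterLookup stub = (id, flush) -> id + " flush=" + flush;

    Properties defaults = new Properties();
    System.out.println(new FailoverProxySketch(defaults, stub).resolveRmAddress("sc1"));
    // prints "sc1 flush=true"

    Properties relaxed = new Properties();
    relaxed.setProperty(FLUSH_CACHE_KEY, "false");
    System.out.println(new FailoverProxySketch(relaxed, stub).resolveRmAddress("sc1"));
    // prints "sc1 flush=false"
  }
}
```

Reading the flag once in the constructor keeps the per-failover path free of config lookups; operators behind a VIP, or those who tolerate slightly stale subcluster info, would set it to `false`.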



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
