Botong Huang created YARN-8933:
----------------------------------

             Summary: [AMRMProxy] Fix potential null AvailableResource and 
NumClusterNode in allocation response
                 Key: YARN-8933
                 URL: https://issues.apache.org/jira/browse/YARN-8933
             Project: Hadoop YARN
          Issue Type: Task
            Reporter: Botong Huang
            Assignee: Botong Huang


After YARN-8696, the allocate response returned by FederationInterceptor is 
merged from the responses of an arbitrary subset of the sub-clusters, depending 
on the async heartbeat timing. As a result, the cluster-wide fields in the 
response, e.g. AvailableResources and NumClusterNodes, are inconsistent from one 
response to the next. They can even be null/zero when a response is merged from 
an empty set of sub-cluster responses. 

In this patch, FederationInterceptor remembers the last allocate response from 
each known sub-cluster and always constructs the cluster-wide info fields from 
all of them. The sub-cluster timeout is also moved from 
LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that expired 
sub-clusters (those without a successful allocate response for a while) are 
excluded from the computation.
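
To illustrate the idea, here is a minimal sketch in plain Java. It is not the 
actual patch: ClusterInfoAggregator, ClusterInfo, record() and aggregate() are 
hypothetical names, the resource fields are simplified to plain numbers, and 
the caller passes in the current time so the expiry logic is easy to follow. 
It only shows the shape of "remember the last response per sub-cluster, then 
sum over the ones that have not expired".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the real FederationInterceptor code. */
class ClusterInfoAggregator {

  /** Simplified stand-in for the cluster-wide fields of an allocate response. */
  static final class ClusterInfo {
    final long availableMemoryMb;
    final int availableVcores;
    final int numClusterNodes;

    ClusterInfo(long availableMemoryMb, int availableVcores, int numClusterNodes) {
      this.availableMemoryMb = availableMemoryMb;
      this.availableVcores = availableVcores;
      this.numClusterNodes = numClusterNodes;
    }
  }

  /** Last known info for one sub-cluster plus the time it was received. */
  private static final class Entry {
    final ClusterInfo info;
    final long receivedAtMs;

    Entry(ClusterInfo info, long receivedAtMs) {
      this.info = info;
      this.receivedAtMs = receivedAtMs;
    }
  }

  // Keyed by sub-cluster id; each new response overwrites the previous one.
  private final Map<String, Entry> lastResponses = new ConcurrentHashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Remember the latest successful response from a sub-cluster. */
  void record(String subClusterId, ClusterInfo info, long nowMs) {
    lastResponses.put(subClusterId, new Entry(info, nowMs));
  }

  /**
   * Build the cluster-wide view from every sub-cluster whose last response
   * is no older than the timeout. Returns zeros (never null) if none qualify.
   */
  ClusterInfo aggregate(long nowMs) {
    long memory = 0L;
    int vcores = 0;
    int nodes = 0;
    for (Entry e : lastResponses.values()) {
      if (nowMs - e.receivedAtMs > timeoutMs) {
        continue; // expired sub-cluster: leave it out of the totals
      }
      memory += e.info.availableMemoryMb;
      vcores += e.info.availableVcores;
      nodes += e.info.numClusterNodes;
    }
    return new ClusterInfo(memory, vcores, nodes);
  }
}
```

Because the totals are computed from the remembered state rather than from 
whichever responses arrived in the current heartbeat, a merge over an empty 
set of fresh responses still yields the last known cluster-wide values.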



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
