[ 
https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675623#comment-16675623
 ] 

Botong Huang commented on YARN-8933:
------------------------------------

Ah good catch, and thx for reviewing! 

> [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in 
> allocation response
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-8933
>                 URL: https://issues.apache.org/jira/browse/YARN-8933
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, federation
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-8933.v1.patch, YARN-8933.v2.patch
>
>
> After YARN-8696, the allocate response by FederationInterceptor is merged 
> from the responses from a random subset of all sub-clusters, depending on the 
> async heartbeat timing. As a result, cluster-wide information fields in the 
> response, e.g. AvailableResources and NumClusterNodes, are not consistent at 
> all. It can even be null/zero because the specific response is merged from an 
> empty set of sub-cluster responses. 
> In this patch, we let FederationInterceptor remember the last allocate 
> response from all known sub-clusters, and always construct the cluster-wide 
> info fields from all of them. We also moved sub-cluster timeout from 
> LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that 
> sub-clusters that expired (haven't had a successful allocate response for a 
> while) won't be included in the computation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to