[
https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Botong Huang updated YARN-8933:
-------------------------------
Component/s: federation
amrmproxy
> [AMRMProxy] Fix potential null AvailableResource and NumClusterNode in
> allocation response
> ------------------------------------------------------------------------------------------
>
> Key: YARN-8933
> URL: https://issues.apache.org/jira/browse/YARN-8933
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: amrmproxy, federation
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Major
>
> After YARN-8696, the allocate response returned by FederationInterceptor is
> merged from the responses of a random subset of all sub-clusters, depending
> on the async heartbeat timing. As a result, the cluster-wide information
> fields in the response, e.g. AvailableResources and NumClusterNodes, are not
> consistent across responses. They can even be null/zero when a response is
> merged from an empty set of sub-cluster responses.
> In this patch, we let FederationInterceptor remember the last allocate
> response from each known sub-cluster, and always construct the cluster-wide
> info fields from all of them. We also move the sub-cluster timeout from
> LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that
> sub-clusters that have expired (no successful allocate response for a
> while) are not included in the computation.
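
The approach described above could be sketched roughly as follows. This is a minimal illustration only, not the actual patch: the class ClusterInfoAggregator, its Snapshot and Totals types, and the plain numeric fields are all hypothetical stand-ins for the real FederationInterceptor state and the YARN AllocateResponse/Resource types. It keeps the last response per sub-cluster and sums the cluster-wide fields over the sub-clusters that have not timed out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; names and types are assumptions, not Hadoop APIs. */
final class ClusterInfoAggregator {

  /** Simplified stand-in for the cluster-wide fields of one allocate response. */
  static final class Snapshot {
    final long availableMemoryMb;
    final int availableVcores;
    final int numClusterNodes;
    final long receivedAtMs;

    Snapshot(long memMb, int vcores, int nodes, long receivedAtMs) {
      this.availableMemoryMb = memMb;
      this.availableVcores = vcores;
      this.numClusterNodes = nodes;
      this.receivedAtMs = receivedAtMs;
    }
  }

  /** Aggregated cluster-wide values; never null, zero when nothing is live. */
  static final class Totals {
    final long availableMemoryMb;
    final int availableVcores;
    final int numClusterNodes;

    Totals(long memMb, int vcores, int nodes) {
      this.availableMemoryMb = memMb;
      this.availableVcores = vcores;
      this.numClusterNodes = nodes;
    }
  }

  // Last successful response per sub-cluster id, overwritten on every heartbeat.
  private final Map<String, Snapshot> lastResponse = new ConcurrentHashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Remember the latest successful response from one sub-cluster. */
  void record(String subClusterId, long memMb, int vcores, int nodes, long nowMs) {
    lastResponse.put(subClusterId, new Snapshot(memMb, vcores, nodes, nowMs));
  }

  /** Sum the remembered values of all sub-clusters that have not expired. */
  Totals aggregate(long nowMs) {
    long mem = 0;
    int vcores = 0;
    int nodes = 0;
    for (Snapshot s : lastResponse.values()) {
      if (nowMs - s.receivedAtMs > timeoutMs) {
        continue; // expired: no successful response within the timeout
      }
      mem += s.availableMemoryMb;
      vcores += s.availableVcores;
      nodes += s.numClusterNodes;
    }
    return new Totals(mem, vcores, nodes);
  }
}
```

Because the totals are rebuilt from every remembered sub-cluster on each call, the result does not depend on which sub-clusters happened to answer in the current heartbeat round.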