Botong Huang created YARN-8933:
----------------------------------
Summary: [AMRMProxy] Fix potential null AvailableResource and NumClusterNode in allocation response
Key: YARN-8933
URL: https://issues.apache.org/jira/browse/YARN-8933
Project: Hadoop YARN
Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang
After YARN-8696, the allocate response returned by FederationInterceptor is merged from
the responses of a random subset of the sub-clusters, depending on the timing of the
asynchronous heartbeats. As a result, the cluster-wide fields in the response, e.g.
AvailableResources and NumClusterNodes, can be inconsistent from one response to the
next. They can even be null or zero when a given response is merged from an empty set
of sub-cluster responses.
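A minimal Java sketch of the merging behavior described above, for illustration only. SubClusterResponse, MergedResponse and NaiveMerge are hypothetical names, not the real AllocateResponse or FederationInterceptor API, and AvailableResources is reduced to a single memory figure. Only the responses that arrived since the last AM heartbeat are merged, so the result reflects a partial view, or is null/zero when nothing arrived.

```java
import java.util.List;

// Hypothetical stand-in for one sub-cluster's allocate response,
// keeping only the two cluster-wide fields (resources reduced to memory).
record SubClusterResponse(long availableMemoryMb, int numClusterNodes) {}

// Hypothetical merged response: the object-typed field can be null, the int field defaults to 0.
record MergedResponse(Long availableMemoryMb, int numClusterNodes) {}

final class NaiveMerge {
  // Merges only the responses that arrived since the last AM heartbeat.
  // With asynchronous heartbeats this list is an arbitrary subset, possibly empty.
  static MergedResponse merge(List<SubClusterResponse> arrivedThisHeartbeat) {
    if (arrivedThisHeartbeat.isEmpty()) {
      // Nothing to merge: available resources stay null, node count stays zero.
      return new MergedResponse(null, 0);
    }
    long memory = 0;
    int nodes = 0;
    for (SubClusterResponse r : arrivedThisHeartbeat) {
      memory += r.availableMemoryMb();
      nodes += r.numClusterNodes();
    }
    // Sums cover only the sub-clusters that happened to respond this time.
    return new MergedResponse(memory, nodes);
  }
}
```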
In this patch, FederationInterceptor remembers the last allocate response from each
known sub-cluster and always constructs the cluster-wide fields from all of them. We
also move the sub-cluster timeout from LocalityMulticastAMRMProxyPolicy to
FederationInterceptor, so that expired sub-clusters (those that have not had a
successful allocate response for a while) are excluded from the computation.
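The approach can be sketched in Java as follows, as a hedged illustration rather than the actual patch. ClusterInfo, ClusterInfoAggregator, onSuccessfulResponse, aggregate and timeoutMs are hypothetical names, and available resources are again reduced to a single memory figure. The last successful response of each sub-cluster is cached together with its timestamp, and every aggregation sums over the sub-clusters that have not expired.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cluster-wide info from one sub-cluster response (memory only).
record ClusterInfo(long availableMemoryMb, int numClusterNodes) {}

// Hypothetical sketch: cache the last successful response per sub-cluster and
// aggregate the cluster-wide fields over all sub-clusters that have not expired.
final class ClusterInfoAggregator {
  // Last successful response of a sub-cluster and the time it was received.
  private record Entry(ClusterInfo info, long lastSuccessMs) {}

  private final Map<String, Entry> lastResponses = new HashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Records a successful allocate response, replacing the previous one for this sub-cluster.
  synchronized void onSuccessfulResponse(String subClusterId, ClusterInfo info, long nowMs) {
    lastResponses.put(subClusterId, new Entry(info, nowMs));
  }

  // Sums over every known sub-cluster whose last success is within timeoutMs of nowMs.
  synchronized ClusterInfo aggregate(long nowMs) {
    long memory = 0;
    int nodes = 0;
    for (Entry e : lastResponses.values()) {
      if (nowMs - e.lastSuccessMs() > timeoutMs) {
        continue; // expired sub-cluster: excluded from the computation
      }
      memory += e.info().availableMemoryMb();
      nodes += e.info().numClusterNodes();
    }
    return new ClusterInfo(memory, nodes);
  }
}
```

Because the aggregate is recomputed from the cache rather than from whichever responses arrived in the current heartbeat, it is never null; if every sub-cluster has expired it falls back to zero. The methods are synchronized on the assumption that responses arrive on separate asynchronous heartbeat threads.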
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)