[ 
https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537504#comment-16537504
 ] 

Hudson commented on YARN-7899:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14545 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14545/])
YARN-7899. [AMRMProxy] Stateful FederationInterceptor for pending (gifuma: rev 
ea9b608237e7f2cf9b1e36b0f78c9674ec84096f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMRMClientRelayer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/AMRMClientUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java


> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---------------------------------------------------------------
>
>                 Key: YARN-7899
>                 URL: https://issues.apache.org/jira/browse/YARN-7899
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>              Labels: amrmproxy, federation
>             Fix For: 3.2.0
>
>         Attachments: YARN-7899.v1.patch, YARN-7899.v3.patch
>
>
> Today FederationInterceptor (FI, in AMRMProxy for YARN Federation) is
> stateless in terms of pending (outstanding) requests. Whenever the AM issues
> new requests, FI simply splits them, sends them to the sub-cluster YarnRMs,
> and forgets about them. This JIRA attempts to make FI stateful so that it
> remembers the pending requests in all relevant sub-clusters. This has two
> major benefits:
> 1. It is a prerequisite for FI to be able to cancel a pending request in one
> sub-cluster and re-send it to other sub-clusters. This is needed for load
> balancing and to fully comply with the relax-locality fallback-to-ANY
> semantics. When we send a request to one sub-cluster, we have effectively
> restricted the allocation for this request to this sub-cluster rather than
> everywhere. If the capacity in this sub-cluster for this app is exhausted, or
> this YarnRM is overloaded and slow, the request will be stuck there for a
> long time even if there is free capacity in other sub-clusters. We need FI to
> remember and adjust the pending requests on the fly.
> 2. This makes pending request recovery easier when a YarnRM fails over. Today,
> whenever one sub-cluster RM fails over, in order to recover the lost pending
> requests for this sub-cluster, we have to propagate the
> ApplicationMasterNotRegisteredException from the YarnRM back to the AM,
> triggering a full pending resend from the AM. This resend contains the
> pending requests not only for the failing-over sub-cluster, but for every
> sub-cluster. Since our split-merge (AMRMProxyPolicy) does not guarantee
> idempotency, the same request we sent to sub-cluster-1 earlier might be
> resent to sub-cluster-2. If neither of these YarnRMs has failed over, they
> will both allocate for this request, leading to over-allocation. These full
> pending resends also put unnecessary load on every YarnRM in the cluster
> every time one YarnRM fails over. With a stateful FederationInterceptor,
> since we remember the pending requests we have sent out earlier, we can
> shield the AM from the ApplicationMasterNotRegisteredException and resend the
> pending requests only to the failed-over YarnRM. This eliminates
> over-allocation and minimizes the recovery overhead.
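
For illustration only, the bookkeeping described above might look roughly like the sketch below. It is not taken from the YARN-7899 patch: the class name PendingRequestTracker, its methods, and the use of plain string ids for sub-clusters and requests are all hypothetical, and the real FederationInterceptor works with YARN's own request types. The sketch only shows the idea of remembering what was sent where, so that a single sub-cluster's pending set can be resent after a failover, or a request moved to another sub-cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the YARN-7899 implementation): remembers which
 * request ids are still pending in which sub-cluster.
 */
class PendingRequestTracker {
  // sub-cluster id -> ids of requests sent there and not yet satisfied
  private final Map<String, Set<String>> pendingBySubCluster = new LinkedHashMap<>();

  /** Record that a request was sent to the given sub-cluster. */
  synchronized void recordSent(String subClusterId, String requestId) {
    pendingBySubCluster
        .computeIfAbsent(subClusterId, k -> new LinkedHashSet<>())
        .add(requestId);
  }

  /** Forget a request once the sub-cluster has allocated for it. */
  synchronized void markSatisfied(String subClusterId, String requestId) {
    Set<String> pending = pendingBySubCluster.get(subClusterId);
    if (pending != null) {
      pending.remove(requestId);
    }
  }

  /**
   * Requests to resend to one sub-cluster after its RM fails over.
   * Other sub-clusters are untouched, so nothing is sent to them twice.
   */
  synchronized List<String> pendingFor(String subClusterId) {
    Set<String> pending = pendingBySubCluster.get(subClusterId);
    return pending == null ? new ArrayList<>() : new ArrayList<>(pending);
  }

  /**
   * Move a pending request from one sub-cluster to another (the caller would
   * cancel it at the source and send it to the target). Returns false if the
   * request was not pending at the source.
   */
  synchronized boolean move(String requestId, String from, String to) {
    Set<String> source = pendingBySubCluster.get(from);
    if (source == null || !source.remove(requestId)) {
      return false;
    }
    recordSent(to, requestId);
    return true;
  }
}
```

With state like this, a failover of one sub-cluster would only need pendingFor() for that sub-cluster, and rebalancing would go through move(); how the actual patch tracks and replays requests may differ.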



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
