[jira] [Commented] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests

2022-12-27 Thread Botong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652348#comment-17652348
 ] 

Botong Huang commented on YARN-7899:


[~walhl] The "cancel pending request in one sub-cluster and re-send it to other 
sub-clusters" part is not done yet. 

> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---
>
> Key: YARN-7899
> URL: https://issues.apache.org/jira/browse/YARN-7899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>  Labels: amrmproxy, federation
> Fix For: 3.2.0
>
> Attachments: YARN-7899-branch-2.v3.patch, YARN-7899.v1.patch, 
> YARN-7899.v3.patch
>
>
> Today FederationInterceptor (in AMRMProxy for YARN Federation) is stateless 
> with respect to pending (outstanding) requests. Whenever the AM issues new 
> requests, FI simply splits them, sends them to the sub-cluster YarnRMs, and 
> forgets about them. This JIRA attempts to make FI stateful so that it 
> remembers the pending requests in all relevant sub-clusters. This has two 
> major benefits: 
> 1. It is a prerequisite for FI being able to cancel a pending request in one 
> sub-cluster and re-send it to other sub-clusters. This is needed for load 
> balancing and to fully comply with the relax-locality fallback-to-ANY 
> semantics. When we send a request to one sub-cluster, we have effectively 
> constrained the allocation for that request to this sub-cluster rather than 
> anywhere in the federation. If this sub-cluster's capacity for the app is 
> exhausted, or its YarnRM is overloaded and slow, the request can be stuck 
> there for a long time even though there is free capacity in other 
> sub-clusters. We need FI to remember the pending requests and adjust them on 
> the fly. 
> 2. It makes pending-request recovery easier when a YarnRM fails over. Today, 
> whenever one sub-cluster RM fails over, recovering the lost pending requests 
> for that sub-cluster requires propagating the 
> ApplicationMasterNotRegisteredException from the YarnRM back to the AM, 
> triggering a full pending resend from the AM. That resend contains the 
> pending requests for every sub-cluster, not just the failing-over one. Since 
> our split-merge (AMRMProxyPolicy) does not guarantee idempotency, the same 
> request we sent to sub-cluster-1 earlier might be resent to sub-cluster-2. If 
> neither of these two YarnRMs has failed over, both will allocate for this 
> request, leading to over-allocation. These full pending resends also put 
> unnecessary load on every YarnRM in the cluster every time one YarnRM fails 
> over. With a stateful FederationInterceptor, since we remember the pending 
> requests we have already sent out, we can shield the 
> ApplicationMasterNotRegisteredException from the AM and resend the pending 
> requests only to the failed-over YarnRM. This eliminates over-allocation and 
> minimizes the recovery overhead. 
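
To make benefit #2 concrete, the following is a minimal, self-contained Java 
sketch of the idea: the proxy remembers which requests it routed to which 
sub-cluster, so when a single sub-cluster RM fails over it can replay only that 
sub-cluster's pending requests instead of surfacing the exception to the AM. 
All class, record, and method names here are illustrative stand-ins, not the 
actual FederationInterceptor API.
{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; the real FederationInterceptor is far more involved. */
public class StatefulInterceptorSketch {

  /** Stand-in for YARN's ResourceRequest. */
  record Request(String resourceName, int memoryMb, int vcores, int numContainers) {}

  /** Pending (not yet satisfied) requests, keyed by the sub-cluster they were routed to. */
  private final Map<String, List<Request>> pendingBySubCluster = new HashMap<>();

  /** Remember a batch of AM requests after the split policy assigns it to a sub-cluster. */
  void recordRouted(String subClusterId, List<Request> batch) {
    pendingBySubCluster.computeIfAbsent(subClusterId, k -> new ArrayList<>()).addAll(batch);
  }

  /** Drop a request from the pending set once its containers have been allocated. */
  void markSatisfied(String subClusterId, Request satisfied) {
    pendingBySubCluster.getOrDefault(subClusterId, new ArrayList<>()).remove(satisfied);
  }

  /**
   * When one sub-cluster RM fails over, replay only that sub-cluster's pending requests
   * after re-registering with it. The AM never sees the exception, and no other
   * sub-cluster receives a duplicate request, so there is no over-allocation.
   */
  List<Request> pendingToReplay(String failedOverSubCluster) {
    return new ArrayList<>(pendingBySubCluster.getOrDefault(failedOverSubCluster, List.of()));
  }

  public static void main(String[] args) {
    StatefulInterceptorSketch fi = new StatefulInterceptorSketch();
    fi.recordRouted("sc-1", List.of(new Request("*", 4096, 2, 10)));
    fi.recordRouted("sc-2", List.of(new Request("*", 2048, 1, 5)));
    // sc-1's RM fails over: only its 10 pending containers are replayed; sc-2 is untouched.
    System.out.println(fi.pendingToReplay("sc-1"));
  }
}
{code}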



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-11392) ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods but sometimes forgets to use them, causing missing audit logs.

2022-12-27 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved YARN-11392.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2. [~chino71], thank 
you for the contribution.

> ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods 
> but sometimes forgets to use them, causing missing audit logs.
> 
>
> Key: YARN-11392
> URL: https://issues.apache.org/jira/browse/YARN-11392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: Beibei Zhao
>Assignee: Beibei Zhao
>Priority: Major
>  Labels: audit, log, pull-request-available, yarn
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> ClientRMService implements the getCallerUgi and verifyUserAccessForRMApp 
> helper methods:
> {code:java}
> private UserGroupInformation getCallerUgi(ApplicationId applicationId,
>     String operation) throws YarnException {
>   UserGroupInformation callerUGI;
>   try {
>     callerUGI = UserGroupInformation.getCurrentUser();
>   } catch (IOException ie) {
>     LOG.info("Error getting UGI ", ie);
>     RMAuditLogger.logFailure("UNKNOWN", operation, "UNKNOWN",
>         "ClientRMService", "Error getting UGI", applicationId);
>     throw RPCUtil.getRemoteException(ie);
>   }
>   return callerUGI;
> }
> {code}
> *Privileged operations* like "getContainerReport" (which call checkAccess 
> before the operation) use these helpers and *record audit logs* when an 
> *exception* happens, but some operations forget to use them, so the audit 
> log entry is {*}missing{*}: 
> {code:java}
> // getApplicationReport
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>   // An RMAuditLogger.logFailure call should happen here, but is missing.
>   throw RPCUtil.getRemoteException(ie);
> }
> {code}
> So, I will replace code blocks like this with calls to getCallerUgi or 
> verifyUserAccessForRMApp.
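
To illustrate the fix pattern described above (a hedged sketch, not the actual 
ClientRMService patch): route every privileged operation's UGI lookup through 
one shared helper that both audit-logs and rethrows, so no call site can skip 
the logFailure step. Everything except the getCallerUgi-style helper shape is 
an illustrative stand-in.
{code:java}
import java.io.IOException;

/**
 * Self-contained illustration of the fix pattern; ClientRMService, RMAuditLogger and
 * UserGroupInformation are replaced by simple stand-ins here.
 */
public class AuditedCallerSketch {

  /** Stand-in for RMAuditLogger. */
  static final class AuditLog {
    static void logFailure(String user, String op, String perm, String target, String reason) {
      System.err.printf("AUDIT FAILURE user=%s op=%s perm=%s target=%s reason=%s%n",
          user, op, perm, target, reason);
    }
  }

  /** Stand-in for UserGroupInformation.getCurrentUser(), which may throw IOException. */
  interface CurrentUserSupplier {
    String get() throws IOException;
  }

  private final CurrentUserSupplier users;

  AuditedCallerSketch(CurrentUserSupplier users) {
    this.users = users;
  }

  /** Shared helper in the spirit of getCallerUgi: audit-log the failure, then rethrow. */
  private String getCallerUser(String operation) throws IOException {
    try {
      return users.get();
    } catch (IOException ie) {
      AuditLog.logFailure("UNKNOWN", operation, "UNKNOWN", "ClientRMService",
          "Error getting UGI");
      throw ie;
    }
  }

  /** An operation like getApplicationReport now delegates instead of inlining the try/catch. */
  String getApplicationReport() throws IOException {
    String caller = getCallerUser("getApplicationReport");
    return "report requested by " + caller;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(new AuditedCallerSketch(() -> "alice").getApplicationReport());
    try {
      new AuditedCallerSketch(() -> { throw new IOException("no TGT"); }).getApplicationReport();
    } catch (IOException expected) {
      // The audit failure line was already emitted before the exception propagated.
    }
  }
}
{code}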



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11392) ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods but sometimes forgets to use them, causing missing audit logs.

2022-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652333#comment-17652333
 ] 

ASF GitHub Bot commented on YARN-11392:
---

cnauroth commented on PR #5250:
URL: https://github.com/apache/hadoop/pull/5250#issuecomment-1366276967

   I have committed this to trunk, branch-3.3 and branch-3.2. @curie71 , thank 
you for the contribution.




> ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods 
> but sometimes forgets to use them, causing missing audit logs.
> 
>
> Key: YARN-11392
> URL: https://issues.apache.org/jira/browse/YARN-11392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: Beibei Zhao
>Assignee: Beibei Zhao
>Priority: Major
>  Labels: audit, log, pull-request-available, yarn
>
> ClientRMService implements the getCallerUgi and verifyUserAccessForRMApp 
> helper methods:
> {code:java}
> private UserGroupInformation getCallerUgi(ApplicationId applicationId,
>     String operation) throws YarnException {
>   UserGroupInformation callerUGI;
>   try {
>     callerUGI = UserGroupInformation.getCurrentUser();
>   } catch (IOException ie) {
>     LOG.info("Error getting UGI ", ie);
>     RMAuditLogger.logFailure("UNKNOWN", operation, "UNKNOWN",
>         "ClientRMService", "Error getting UGI", applicationId);
>     throw RPCUtil.getRemoteException(ie);
>   }
>   return callerUGI;
> }
> {code}
> *Privileged operations* like "getContainerReport" (which call checkAccess 
> before the operation) use these helpers and *record audit logs* when an 
> *exception* happens, but some operations forget to use them, so the audit 
> log entry is {*}missing{*}: 
> {code:java}
> // getApplicationReport
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>   // An RMAuditLogger.logFailure call should happen here, but is missing.
>   throw RPCUtil.getRemoteException(ie);
> }
> {code}
> So, I will replace code blocks like this with calls to getCallerUgi or 
> verifyUserAccessForRMApp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11392) ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods but sometimes forgets to use them, causing missing audit logs.

2022-12-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652329#comment-17652329
 ] 

ASF GitHub Bot commented on YARN-11392:
---

cnauroth merged PR #5250:
URL: https://github.com/apache/hadoop/pull/5250




> ClientRMService implements getCallerUgi and verifyUserAccessForRMApp methods 
> but sometimes forgets to use them, causing missing audit logs.
> 
>
> Key: YARN-11392
> URL: https://issues.apache.org/jira/browse/YARN-11392
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: Beibei Zhao
>Assignee: Beibei Zhao
>Priority: Major
>  Labels: audit, log, pull-request-available, yarn
>
> ClientRMService implements the getCallerUgi and verifyUserAccessForRMApp 
> helper methods:
> {code:java}
> private UserGroupInformation getCallerUgi(ApplicationId applicationId,
>     String operation) throws YarnException {
>   UserGroupInformation callerUGI;
>   try {
>     callerUGI = UserGroupInformation.getCurrentUser();
>   } catch (IOException ie) {
>     LOG.info("Error getting UGI ", ie);
>     RMAuditLogger.logFailure("UNKNOWN", operation, "UNKNOWN",
>         "ClientRMService", "Error getting UGI", applicationId);
>     throw RPCUtil.getRemoteException(ie);
>   }
>   return callerUGI;
> }
> {code}
> *Privileged operations* like "getContainerReport" (which call checkAccess 
> before the operation) use these helpers and *record audit logs* when an 
> *exception* happens, but some operations forget to use them, so the audit 
> log entry is {*}missing{*}: 
> {code:java}
> // getApplicationReport
> UserGroupInformation callerUGI;
> try {
>   callerUGI = UserGroupInformation.getCurrentUser();
> } catch (IOException ie) {
>   LOG.info("Error getting UGI ", ie);
>   // An RMAuditLogger.logFailure call should happen here, but is missing.
>   throw RPCUtil.getRemoteException(ie);
> }
> {code}
> So, I will replace code blocks like this with calls to getCallerUgi or 
> verifyUserAccessForRMApp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11403) Decommission Node reduces the maximumAllocation and leads to Job Failure

2022-12-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-11403:
-
Description: 
When a node is put into Decommission, ClusterNodeTracker updates the 
maximumAllocation to the totalResources currently in use on that node. This can 
lead to Job Failure (with the error message below) when the Job requests a 
container larger than the new maximumAllocation.
{code:java}
22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in a 
row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is greater 
than maximum allowed allocation. Requested resource type=[vcores], Requested 
resource=, 
maximum allowed allocation=, please note that maximum 
allowed allocation is calculated by scheduler based on maximum resource of 
registered NodeManagers, which might be less than configured maximum 
allocation=
{code}
*Repro:*

1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
NodeManager Resource Memory; the configured maxAllocation is 10GB.
2. Submit a SparkPi Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say 
the ApplicationMaster (2GB) is launched on node1.
3. Put both nodes into Decommission. This makes maxAllocation come down to 2GB.
4. The SparkPi Job fails because it requests a 4GB Executor while maxAllocation 
is only 2GB.

  was:
When a node is put into Decommission, ClusterNodeTracker updates the 
maximumAllocation to the totalResources currently in use on that node. This can 
lead to Job Failure (with the error message below) when the Job requests a 
container larger than the new maximumAllocation.

{code}
22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in a 
row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is greater 
than maximum allowed allocation. Requested resource type=[vcores], Requested 
resource=, 
maximum allowed allocation=, please note that maximum 
allowed allocation is calculated by scheduler based on maximum resource of 
registered NodeManagers, which might be less than configured maximum 
allocation=
{code}

*Repro:*

1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
NodeManager Resource Memory; the configured maxAllocation is 10GB.
2. Submit a Spark Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say the 
ApplicationMaster (2GB) is launched on node1. (Add a wait condition in Spark 
before it requests Executors.)
3. Put node1 into Decommission and mark node2 UNHEALTHY. This makes 
maxAllocation come down to 2GB.
4. Now notify the Spark Job. It requests a 4GB Executor, but the new 
maxAllocation is 2GB, so it fails.







> Decommission Node reduces the maximumAllocation and leads to Job Failure
> 
>
> Key: YARN-11403
> URL: https://issues.apache.org/jira/browse/YARN-11403
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When a node is put into Decommission, ClusterNodeTracker updates the 
> maximumAllocation to the totalResources currently in use on that node. This 
> can lead to Job Failure (with the error message below) when the Job requests 
> a container larger than the new maximumAllocation.
> {code:java}
> 22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in 
> a row.
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[vcores], Requested 
> resource= vCores:2147483647>, maximum allowed allocation=, please 
> note that maximum allowed allocation is calculated by scheduler based on 
> maximum resource of registered NodeManagers, which might be less than 
> configured maximum allocation=
> {code}
> *Repro:*
> 1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
> NodeManager Resource Memory; the configured maxAllocation is 10GB.
> 2. Submit a SparkPi Job (ApplicationMaster size: 2GB, Executor size: 4GB). 
> Say the ApplicationMaster (2GB) is launched on node1.
> 3. Put both nodes into Decommission. This makes maxAllocation come down to 
> 2GB.
> 4. The SparkPi Job fails because it requests a 4GB Executor while 
> maxAllocation is only 2GB.
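
To make the failure mode above concrete, here is a small self-contained sketch 
(illustrative only, not the real ClusterNodeTracker or scheduler code) of why 
the repro fails: the effective maximumAllocation is capped by the largest 
schedulable node, and decommissioning replaces a node's capacity with only the 
resources still in use on it, so the 4GB Executor request exceeds the new 2GB 
cap. Class and method names are assumptions made for the example.
{code:java}
import java.util.HashMap;
import java.util.Map;

/** Toy model of the behavior described above; not the real ClusterNodeTracker. */
public class MaxAllocationSketch {

  /** Schedulable memory (MB) that each live node currently contributes, keyed by node id. */
  private final Map<String, Integer> nodeMemoryMb = new HashMap<>();
  private final int configuredMaxAllocationMb;

  MaxAllocationSketch(int configuredMaxAllocationMb) {
    this.configuredMaxAllocationMb = configuredMaxAllocationMb;
  }

  void addNode(String nodeId, int memoryMb) {
    nodeMemoryMb.put(nodeId, memoryMb);
  }

  /** Decommissioning replaces the node's full capacity with only the resources still in use. */
  void decommission(String nodeId, int memoryStillInUseMb) {
    nodeMemoryMb.put(nodeId, memoryStillInUseMb);
  }

  /** Effective cap: the configured max, further limited by the largest remaining node. */
  int effectiveMaxAllocationMb() {
    int largestNode = nodeMemoryMb.values().stream().max(Integer::compare).orElse(0);
    return Math.min(configuredMaxAllocationMb, largestNode);
  }

  void validateRequest(int requestedMb) {
    int max = effectiveMaxAllocationMb();
    if (requestedMb > max) {
      throw new IllegalStateException("Invalid resource request! requested=" + requestedMb
          + "MB, maximum allowed allocation=" + max + "MB, configured maximum allocation="
          + configuredMaxAllocationMb + "MB");
    }
  }

  public static void main(String[] args) {
    MaxAllocationSketch tracker = new MaxAllocationSketch(10240);
    tracker.addNode("node1", 10240);
    tracker.addNode("node2", 10240);
    tracker.validateRequest(4096);        // fine: a 4GB Executor fits while both nodes are up

    tracker.decommission("node1", 2048);  // only the 2GB ApplicationMaster still runs here
    tracker.decommission("node2", 0);     // nothing running on node2
    try {
      tracker.validateRequest(4096);      // now exceeds the 2GB cap, matching the job failure
    } catch (IllegalStateException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}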



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-11403) Decommission Node reduces the maximumAllocation and leads to Job Failure

2022-12-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-11403:
-
Description: 
When a node is put into Decommission, ClusterNodeTracker updates the 
maximumAllocation to the totalResources currently in use on that node. This can 
lead to Job Failure (with the error message below) when the Job requests a 
container larger than the new maximumAllocation.

{code}
22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in a 
row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is greater 
than maximum allowed allocation. Requested resource type=[vcores], Requested 
resource=, 
maximum allowed allocation=, please note that maximum 
allowed allocation is calculated by scheduler based on maximum resource of 
registered NodeManagers, which might be less than configured maximum 
allocation=
{code}

*Repro:*

1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
NodeManager Resource Memory; the configured maxAllocation is 10GB.
2. Submit a Spark Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say the 
ApplicationMaster (2GB) is launched on node1. (Add a wait condition in Spark 
before it requests Executors.)
3. Put node1 into Decommission and mark node2 UNHEALTHY. This makes 
maxAllocation come down to 2GB.
4. Now notify the Spark Job. It requests a 4GB Executor, but the new 
maxAllocation is 2GB, so it fails.






  was:
When a node is put into Decommission, ClusterNodeTracker updates the 
maximumAllocation to the totalResources currently in use on that node. This can 
lead to Job Failure (with the error message below) when the Job requests a 
container larger than the new maximumAllocation.

{code}
22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in a 
row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is greater 
than maximum allowed allocation. Requested resource type=[vcores], Requested 
resource=, 
maximum allowed allocation=, please note that maximum 
allowed allocation is calculated by scheduler based on maximum resource of 
registered NodeManagers, which might be less than configured maximum 
allocation=
{code}

*Repro:*

1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
NodeManager Resource Memory; the configured maxAllocation is 10GB.
2. Submit a Spark Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say the 
ApplicationMaster (2GB) is launched on node1. (Add a wait condition in Spark 
before it requests Executors.)
3. Put node1 into Decommission and mark node2 UNHEALTHY. This makes 
maxAllocation come down to 2GB.
4. Now notify the Spark Job. It requests a 4GB Executor, but the new 
maxAllocation is 2GB, so it fails.







> Decommission Node reduces the maximumAllocation and leads to Job Failure
> 
>
> Key: YARN-11403
> URL: https://issues.apache.org/jira/browse/YARN-11403
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When a node is put into Decommission, ClusterNodeTracker updates the 
> maximumAllocation to the totalResources currently in use on that node. This 
> can lead to Job Failure (with the error message below) when the Job requests 
> a container larger than the new maximumAllocation.
> {code}
> 22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in 
> a row.
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request! Cannot allocate containers as requested resource is greater 
> than maximum allowed allocation. Requested resource type=[vcores], Requested 
> resource= vCores:2147483647>, maximum allowed allocation=, please 
> note that maximum allowed allocation is calculated by scheduler based on 
> maximum resource of registered NodeManagers, which might be less than 
> configured maximum allocation=
> {code}
> *Repro:*
> 1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
> NodeManager Resource Memory; the configured maxAllocation is 10GB.
> 2. Submit a Spark Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say 
> the ApplicationMaster (2GB) is launched on node1. (Add a wait condition in 
> Spark before it requests Executors.)
> 3. Put node1 into Decommission and mark node2 UNHEALTHY. This makes 
> maxAllocation come down to 2GB.
> 4. Now notify the Spark Job. It requests a 4GB Executor, but the new 
> maxAllocation is 2GB, so it fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (YARN-11403) Decommission Node reduces the maximumAllocation and leads to Job Failure

2022-12-27 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-11403:


 Summary: Decommission Node reduces the maximumAllocation and leads 
to Job Failure
 Key: YARN-11403
 URL: https://issues.apache.org/jira/browse/YARN-11403
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When a node is put into Decommission, ClusterNodeTracker updates the 
maximumAllocation to the totalResources currently in use on that node. This can 
lead to Job Failure (with the error message below) when the Job requests a 
container larger than the new maximumAllocation.

{code}
22/11/03 10:55:02 WARN ApplicationMaster: Reporter thread fails 4 time(s) in a 
row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request! Cannot allocate containers as requested resource is greater 
than maximum allowed allocation. Requested resource type=[vcores], Requested 
resource=, 
maximum allowed allocation=, please note that maximum 
allowed allocation is calculated by scheduler based on maximum resource of 
registered NodeManagers, which might be less than configured maximum 
allocation=
{code}

*Repro:*

1. Cluster with two worker nodes, node1 and node2, each with 10GB of YARN 
NodeManager Resource Memory; the configured maxAllocation is 10GB.
2. Submit a Spark Job (ApplicationMaster size: 2GB, Executor size: 4GB). Say the 
ApplicationMaster (2GB) is launched on node1. (Add a wait condition in Spark 
before it requests Executors.)
3. Put node1 into Decommission and mark node2 UNHEALTHY. This makes 
maxAllocation come down to 2GB.
4. Now notify the Spark Job. It requests a 4GB Executor, but the new 
maxAllocation is 2GB, so it fails.








--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests

2022-12-27 Thread walhl.liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652269#comment-17652269
 ] 

walhl.liu edited comment on YARN-7899 at 12/27/22 3:17 PM:
---

[~botong] Thank you for your nice patch. I wonder if your patch contains the 
function ("cancel pending request in one sub-cluster and re-send it to other 
sub-clusters").


was (Author: walhl):
[~botong] Thank you for your nice patch. I wonder if your patch contains the 
function (""cancel pending request in one sub-cluster and re-send it to other 
sub-clusters"").

> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---
>
> Key: YARN-7899
> URL: https://issues.apache.org/jira/browse/YARN-7899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>  Labels: amrmproxy, federation
> Fix For: 3.2.0
>
> Attachments: YARN-7899-branch-2.v3.patch, YARN-7899.v1.patch, 
> YARN-7899.v3.patch
>
>
> Today FederationInterceptor (in AMRMProxy for YARN Federation) is stateless 
> with respect to pending (outstanding) requests. Whenever the AM issues new 
> requests, FI simply splits them, sends them to the sub-cluster YarnRMs, and 
> forgets about them. This JIRA attempts to make FI stateful so that it 
> remembers the pending requests in all relevant sub-clusters. This has two 
> major benefits: 
> 1. It is a prerequisite for FI being able to cancel a pending request in one 
> sub-cluster and re-send it to other sub-clusters. This is needed for load 
> balancing and to fully comply with the relax-locality fallback-to-ANY 
> semantics. When we send a request to one sub-cluster, we have effectively 
> constrained the allocation for that request to this sub-cluster rather than 
> anywhere in the federation. If this sub-cluster's capacity for the app is 
> exhausted, or its YarnRM is overloaded and slow, the request can be stuck 
> there for a long time even though there is free capacity in other 
> sub-clusters. We need FI to remember the pending requests and adjust them on 
> the fly. 
> 2. It makes pending-request recovery easier when a YarnRM fails over. Today, 
> whenever one sub-cluster RM fails over, recovering the lost pending requests 
> for that sub-cluster requires propagating the 
> ApplicationMasterNotRegisteredException from the YarnRM back to the AM, 
> triggering a full pending resend from the AM. That resend contains the 
> pending requests for every sub-cluster, not just the failing-over one. Since 
> our split-merge (AMRMProxyPolicy) does not guarantee idempotency, the same 
> request we sent to sub-cluster-1 earlier might be resent to sub-cluster-2. If 
> neither of these two YarnRMs has failed over, both will allocate for this 
> request, leading to over-allocation. These full pending resends also put 
> unnecessary load on every YarnRM in the cluster every time one YarnRM fails 
> over. With a stateful FederationInterceptor, since we remember the pending 
> requests we have already sent out, we can shield the 
> ApplicationMasterNotRegisteredException from the AM and resend the pending 
> requests only to the failed-over YarnRM. This eliminates over-allocation and 
> minimizes the recovery overhead. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7899) [AMRMProxy] Stateful FederationInterceptor for pending requests

2022-12-27 Thread walhl.liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652269#comment-17652269
 ] 

walhl.liu commented on YARN-7899:
-

[~botong] Thank you for your nice patch. I wonder if your patch contains the 
function ("cancel pending request in one sub-cluster and re-send it to other 
sub-clusters").

> [AMRMProxy] Stateful FederationInterceptor for pending requests
> ---
>
> Key: YARN-7899
> URL: https://issues.apache.org/jira/browse/YARN-7899
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>  Labels: amrmproxy, federation
> Fix For: 3.2.0
>
> Attachments: YARN-7899-branch-2.v3.patch, YARN-7899.v1.patch, 
> YARN-7899.v3.patch
>
>
> Today FederationInterceptor (in AMRMProxy for YARN Federation) is stateless 
> with respect to pending (outstanding) requests. Whenever the AM issues new 
> requests, FI simply splits them, sends them to the sub-cluster YarnRMs, and 
> forgets about them. This JIRA attempts to make FI stateful so that it 
> remembers the pending requests in all relevant sub-clusters. This has two 
> major benefits: 
> 1. It is a prerequisite for FI being able to cancel a pending request in one 
> sub-cluster and re-send it to other sub-clusters. This is needed for load 
> balancing and to fully comply with the relax-locality fallback-to-ANY 
> semantics. When we send a request to one sub-cluster, we have effectively 
> constrained the allocation for that request to this sub-cluster rather than 
> anywhere in the federation. If this sub-cluster's capacity for the app is 
> exhausted, or its YarnRM is overloaded and slow, the request can be stuck 
> there for a long time even though there is free capacity in other 
> sub-clusters. We need FI to remember the pending requests and adjust them on 
> the fly. 
> 2. It makes pending-request recovery easier when a YarnRM fails over. Today, 
> whenever one sub-cluster RM fails over, recovering the lost pending requests 
> for that sub-cluster requires propagating the 
> ApplicationMasterNotRegisteredException from the YarnRM back to the AM, 
> triggering a full pending resend from the AM. That resend contains the 
> pending requests for every sub-cluster, not just the failing-over one. Since 
> our split-merge (AMRMProxyPolicy) does not guarantee idempotency, the same 
> request we sent to sub-cluster-1 earlier might be resent to sub-cluster-2. If 
> neither of these two YarnRMs has failed over, both will allocate for this 
> request, leading to over-allocation. These full pending resends also put 
> unnecessary load on every YarnRM in the cluster every time one YarnRM fails 
> over. With a stateful FederationInterceptor, since we remember the pending 
> requests we have already sent out, we can shield the 
> ApplicationMasterNotRegisteredException from the AM and resend the pending 
> requests only to the failed-over YarnRM. This eliminates over-allocation and 
> minimizes the recovery overhead. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org