[jira] [Updated] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

fanshilun (Jira) Thu, 25 Aug 2022 19:33:07 -0700


     [ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


fanshilun updated YARN-6667:
----------------------------
    Description: 
In practice, the probability of this happening is very low. 
It can only be caused by a master-slave failover of YARN and an incorrect 
Epoch parameter configuration.

We will try to tolerate this situation and let the Application keep running as 
far as possible, using the following measures:
1. Select a node whose heartbeat has not timed out for the allocation, and also 
require that node to be in the RUNNING state.
2. If the heartbeats of both RMs have not timed out and both are in the RUNNING 
state, select the RM that previously allocated the Container to process it.
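
The two measures above can be sketched roughly as follows. This is only an 
illustration, not the FederationInterceptor code: the class 
DuplicateContainerResolver, the type RmStatus, the method chooseOwner and the 
timeout parameter are all hypothetical names, and returning an empty result 
when neither RM qualifies is an assumption, since the description does not 
cover that case.

```java
import java.util.Optional;

/** Hypothetical sketch of the selection rule; not the actual YARN code. */
final class DuplicateContainerResolver {

  /** Illustrative view of one RM as seen from the interceptor (assumed type). */
  static final class RmStatus {
    final String id;
    final long lastHeartbeatMillis;
    final boolean running;

    RmStatus(String id, long lastHeartbeatMillis, boolean running) {
      this.id = id;
      this.lastHeartbeatMillis = lastHeartbeatMillis;
      this.running = running;
    }

    /** Measure 1: heartbeat has not timed out and the state is RUNNING. */
    boolean isUsable(long nowMillis, long timeoutMillis) {
      return running && (nowMillis - lastHeartbeatMillis) <= timeoutMillis;
    }
  }

  /**
   * Decides which RM keeps a container whose id was reported twice.
   * previous is the RM that allocated the id first, reporter is the RM
   * that has just reported the same id.
   */
  static Optional<String> chooseOwner(RmStatus previous, RmStatus reporter,
                                      long nowMillis, long timeoutMillis) {
    // Measure 2: if both are usable, the previous allocator wins. The same
    // branch also covers the case where only the previous RM is usable.
    if (previous.isUsable(nowMillis, timeoutMillis)) {
      return Optional.of(previous.id);
    }
    // Measure 1: otherwise fall back to the reporter if it is usable.
    if (reporter.isUsable(nowMillis, timeoutMillis)) {
      return Optional.of(reporter.id);
    }
    // Assumption: neither RM is usable, so the caller decides what to do.
    return Optional.empty();
  }
}
```

In this sketch the heartbeat itself never fails: the caller only gets back 
the id of the RM that should keep the Container, or an empty result.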

> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-6667
>                 URL: https://issues.apache.org/jira/browse/YARN-6667
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and an incorrect 
> Epoch parameter configuration.
> We will try to tolerate this situation and let the Application keep running 
> as far as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for the allocation, and 
> also require that node to be in the RUNNING state.
> 2. If the heartbeats of both RMs have not timed out and both are in the 
> RUNNING state, select the RM that previously allocated the Container to 
> process it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
