[
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
fanshilun updated YARN-6667:
----------------------------
Description:
In practice, the probability of this happening is very low.
It can only be caused by a master-slave failover of YARN and an incorrect
Epoch parameter configuration.
We will try to tolerate this situation and keep the Application running for
as long as possible, using the following measures:
1. Select a node whose heartbeat has not timed out for allocation, and also
require that node to be in the RUNNING state.
2. If neither RM's heartbeat has timed out and both are in the RUNNING
state, select the previously allocated RM for Container processing.
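
The two measures above can be sketched as a small selection routine. This is
only an illustration, not the actual FederationInterceptor code (which is
Java): the names RMCandidate and choose_rm, the timeout parameter, and the
fallback used when the previous RM is not among the healthy ones are all
assumptions made for the example. It also assumes that "node" in measure 1
refers to a sub-cluster RM.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RMCandidate:
    """Hypothetical view of one sub-cluster RM that reported the containerId."""
    sub_cluster_id: str
    last_heartbeat_ms: int  # timestamp of the last successful heartbeat
    state: str              # e.g. "RUNNING"


def choose_rm(candidates: Sequence[RMCandidate],
              previous_id: Optional[str],
              now_ms: int,
              timeout_ms: int) -> Optional[str]:
    """Pick which RM keeps a duplicated container; None if no RM qualifies."""
    # Measure 1: keep only RMs whose heartbeat has not timed out
    # and which are in the RUNNING state.
    healthy = [c for c in candidates
               if now_ms - c.last_heartbeat_ms <= timeout_ms
               and c.state == "RUNNING"]
    if not healthy:
        return None
    if len(healthy) == 1:
        return healthy[0].sub_cluster_id
    # Measure 2: several RMs qualify, so prefer the previously allocated one.
    for c in healthy:
        if c.sub_cluster_id == previous_id:
            return c.sub_cluster_id
    # Assumption: the description does not cover this case; take the first.
    return healthy[0].sub_cluster_id
```

For example, with a 5000 ms timeout, an RM last heard from 9000 ms ago is
skipped even if it was the previous allocator, while two healthy RUNNING RMs
resolve to whichever one allocated the Container before.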
> Handle containerId duplicate without failing the heartbeat in Federation
> Interceptor
> ------------------------------------------------------------------------------------
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Botong Huang
> Assignee: Botong Huang
> Priority: Minor
>
> In practice, the probability of this happening is very low.
> It can only be caused by a master-slave failover of YARN and an incorrect
> Epoch parameter configuration.
> We will try to tolerate this situation and keep the Application running
> for as long as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for allocation, and also
> require that node to be in the RUNNING state.
> 2. If neither RM's heartbeat has timed out and both are in the RUNNING
> state, select the previously allocated RM for Container processing.