Karthik Kambatla created YARN-3811:
--------------------------------------

             Summary: NM restarts could lead to app failures
                 Key: YARN-3811
                 URL: https://issues.apache.org/jira/browse/YARN-3811
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.7.0
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla
            Priority: Critical

Consider the following scenario:
1. RM assigns a container on node N to an app A.
2. Node N is restarted.
3. A tries to launch the container on node N.

Step 3 could lead to an NMNotYetReadyException, depending on whether NM N has registered with the RM by then. In MR, this is considered a task attempt failure. A few of these could lead to a task/job failure.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
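
The scenario in this report can be illustrated with a toy model. This is only a sketch, not YARN or MR code: ToyNodeManager, NMNotYetReady, run_task, and the register_after knob are hypothetical names invented for illustration, the attempt limit of 4 is an assumed value, and every attempt is pinned to node N for simplicity. It shows how launches rejected before the NM has registered get counted as attempt failures until the task runs out of attempts.

```python
# Toy model of the race in this report. None of these names are YARN APIs.

class NMNotYetReady(Exception):
    """Stand-in for the NMNotYetReadyException named in the report."""


class ToyNodeManager:
    def __init__(self):
        self.registered = True

    def restart(self):
        # After a restart, the NM has not yet registered with the RM again.
        self.registered = False

    def register_with_rm(self):
        self.registered = True

    def start_container(self, container_id):
        # Launch requests are rejected until registration completes.
        if not self.registered:
            raise NMNotYetReady(container_id)
        return "launched " + container_id


def run_task(nm, max_attempts, register_after):
    """Try to launch one task on the given NM.

    Each rejected launch is counted as a failed attempt. register_after is a
    hypothetical knob: the number of rejected launches that happen before the
    NM finishes registering. Returns (succeeded, failed_attempts).
    """
    failed = 0
    for attempt in range(max_attempts):
        if failed >= register_after:
            nm.register_with_rm()
        try:
            nm.start_container(f"container_{attempt}")
            return True, failed
        except NMNotYetReady:
            failed += 1
    return False, failed


if __name__ == "__main__":
    quick = ToyNodeManager()
    quick.restart()
    print(run_task(quick, max_attempts=4, register_after=1))

    slow = ToyNodeManager()
    slow.restart()
    print(run_task(slow, max_attempts=4, register_after=10))
```

In the first call the NM registers after one rejected launch, so the task succeeds with one failed attempt on record. In the second call registration takes longer than the attempt budget, so the task fails even though nothing was wrong with the task itself.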