Anubhav Dhoot created YARN-3229:
-----------------------------------

             Summary: Incorrect processing of container as LOST on Interruption 
during NM shutdown
                 Key: YARN-3229
                 URL: https://issues.apache.org/jira/browse/YARN-3229
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Anubhav Dhoot


YARN-2846 fixed the issue of incorrectly writing to the state store that the 
process is LOST when the NM is interrupted during shutdown. But even with that 
fix, we still process the ContainerExitEvent. If notInterrupted is false in 
RecoveredContainerLaunch#call, we should skip the following block:
{noformat}
    if (retCode != 0) {
      LOG.warn("Recovered container exited with a non-zero exit code "
          + retCode);
      this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
          containerId,
          ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
          "Container exited with a non-zero exit code " + retCode));
      return retCode;
    }
{noformat}
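
A minimal, self-contained sketch of the proposed guard follows. It is not the actual Hadoop code: the class RecoveredLaunchSketch, the Reacquire interface, the LOST_EXIT_CODE placeholder, and the string-based event list are all hypothetical stand-ins for RecoveredContainerLaunch, the container executor, the LOST exit code, and the dispatcher. Returning the unchanged exit code on interruption is also an assumption; the point is only that no exit event is emitted when the wait was interrupted.

```java
import java.util.ArrayList;
import java.util.List;

final class RecoveredLaunchSketch {
    // Placeholder for the "LOST" sentinel exit code (assumed value, illustration only).
    static final int LOST_EXIT_CODE = 154;

    /** Hypothetical stand-in for reacquiring a container and waiting for its exit code. */
    interface Reacquire {
        int await() throws InterruptedException;
    }

    /**
     * Simplified model of RecoveredContainerLaunch#call. Events that would be
     * dispatched are appended to the given list so they can be inspected.
     */
    static int call(Reacquire reacquire, List<String> events) {
        int retCode = LOST_EXIT_CODE;
        boolean notInterrupted = true;
        try {
            retCode = reacquire.await();
        } catch (InterruptedException e) {
            // NM shutdown interrupted the wait; the container's real fate is unknown.
            notInterrupted = false;
        }

        // Proposed guard: on interruption, do not report the container as failed.
        if (!notInterrupted) {
            return retCode;
        }

        if (retCode != 0) {
            events.add("CONTAINER_EXITED_WITH_FAILURE:" + retCode);
            return retCode;
        }
        events.add("CONTAINER_EXITED_WITH_SUCCESS");
        return 0;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        // Simulate an interruption during NM shutdown.
        call(() -> { throw new InterruptedException(); }, events);
        System.out.println("events after interruption: " + events);
        // Simulate a container that really exited with code 1.
        call(() -> 1, events);
        System.out.println("events after real failure: " + events);
    }
}
```

With the guard in place, the interrupted path leaves the event list empty, while a genuine non-zero exit still produces the failure event.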



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)