[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998631#comment-13998631 ]
Tsuyoshi OZAWA commented on YARN-1367:
--------------------------------------

Some comments on the patch:

1. Can you fix the indent?
{code}
+  public boolean isWorkPreservingRestartEnabled() { return
+  isWorkPreservingRestartEnabled;
+  }
{code}
{code}
+  if (!rmWorkPreservingRestartEnbaled)
+  {
+  containerManager.cleanupContainersOnNMResync();
+  }
{code}

2. IMO, "recovery.work-preserving-restart.enabled" is more appropriate because this is one of the options under the RECOVERY_ENABLED namespace.
{code}
  public static final String RM_WORK_PRESERVING_RECOVERY_ENABLED = RM_PREFIX
      + "work-preserving.recovery.enabled";
{code}

> After restart NM should resync with the RM without killing containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-1367
>                 URL: https://issues.apache.org/jira/browse/YARN-1367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1367.prototype.patch
>
>
> After RM restart, the RM sends a resync response to NMs that heartbeat to it.
> Upon receiving the resync response, the NM kills all containers and
> re-registers with the RM. The NM should be changed to not kill the container
> and instead inform the RM about all currently running containers including
> their allocations etc. After the re-register, the NM should send all pending
> container completions to the RM as usual.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
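
For illustration only, the resync behaviour described in the issue and the property name suggested in comment 2 might be sketched as below. This is not the patch itself: `ResyncSketch`, its constructor, `onResync()`, and the use of a plain `Map` in place of a Hadoop `Configuration` are all hypothetical, and the key shown is the reviewer's proposal rather than a committed name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NM side of a resync; not actual Hadoop code.
class ResyncSketch {
    static final String RM_PREFIX = "yarn.resourcemanager.";

    // Key name as proposed in the review comment (an assumption, not a committed name).
    static final String RM_WORK_PRESERVING_RECOVERY_ENABLED =
        RM_PREFIX + "recovery.work-preserving-restart.enabled";

    private final boolean workPreservingRestartEnabled;
    private final List<String> runningContainers = new ArrayList<>();

    ResyncSketch(Map<String, String> conf, List<String> containers) {
        // A missing key is treated as false, i.e. the old kill-on-resync behaviour.
        this.workPreservingRestartEnabled = Boolean.parseBoolean(
            conf.getOrDefault(RM_WORK_PRESERVING_RECOVERY_ENABLED, "false"));
        this.runningContainers.addAll(containers);
    }

    public boolean isWorkPreservingRestartEnabled() {
        return workPreservingRestartEnabled;
    }

    // Returns the container IDs the NM would report when re-registering with the RM.
    List<String> onResync() {
        if (!workPreservingRestartEnabled) {
            // Old behaviour: drop (kill) every container before re-registering.
            runningContainers.clear();
        }
        return new ArrayList<>(runningContainers);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RM_WORK_PRESERVING_RECOVERY_ENABLED, "true");
        ResyncSketch nm = new ResyncSketch(conf, Arrays.asList("container_1", "container_2"));
        System.out.println(nm.onResync());
    }
}
```

With the flag unset, `onResync()` returns an empty list, which mirrors the current kill-and-re-register behaviour. With it set to `true`, the running containers survive and are reported back.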