[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335321#comment-14335321 ]

Jian He commented on YARN-3202:
-------------------------------

This piece of code is legacy code that applies only to non-work-preserving restart. The existing code path for work-preserving restart already covers this case. Given that we only support work-preserving restart, I think we can get rid of all the conditional code for non-work-preserving restart; the tests may need to be changed too.

Improve master container resource release time ICO work preserving restart enabled
-----------------------------------------------------------------------------------

                Key: YARN-3202
                URL: https://issues.apache.org/jira/browse/YARN-3202
            Project: Hadoop YARN
         Issue Type: Improvement
         Components: resourcemanager
           Reporter: Rohith
           Assignee: Rohith
           Priority: Minor
        Attachments: 0001-YARN-3202.patch

While an NM is registering with the RM, if the NM sends completed_container for the masterContainer, the master container's resources are released immediately by triggering the CONTAINER_FINISHED event. This releases all the resources held by the master container so that they can be allocated to other applications' pending resource requests. But in case RM work-preserving restart is enabled, if the master container state is completed, the attempt does not move to FINISHING until container expiry is triggered by the container liveliness monitor.

I think the code below need not check whether work-preserving restart is enabled, so that the master container's resources are released immediately and allocated to the pending resource requests of other applications.

{code}
// Handle received container status, this should be processed after new
// RMNode inserted
if (!rmContext.isWorkPreservingRecoveryEnabled()) {
  if (!request.getNMContainerStatuses().isEmpty()) {
    LOG.info("received container statuses on node manager register : "
        + request.getNMContainerStatuses());
    for (NMContainerStatus status : request.getNMContainerStatuses()) {
      handleNMContainerStatus(status, nodeId);
    }
  }
}
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
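The effect of the guard in the issue description can be illustrated with a small toy model. This is only a sketch under stated assumptions: `RegistrationGuardSketch`, `Status`, and `handledAtRegistration` are hypothetical names, not YARN classes, and the model assumes that only completed containers lead to a resource release at registration time.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the registration-time guard quoted in the issue description.
 * Every type here is a hypothetical stand-in, not the real YARN class.
 */
final class RegistrationGuardSketch {

    /** Hypothetical stand-in for NMContainerStatus. */
    static final class Status {
        final String containerId;
        final boolean completed;

        Status(String containerId, boolean completed) {
            this.containerId = containerId;
            this.completed = completed;
        }
    }

    /**
     * Returns the ids of completed containers that would be handled while the NM registers.
     *
     * @param keepGuard true models the quoted snippet (statuses are skipped when
     *                  work-preserving recovery is enabled); false models the proposal
     */
    static List<String> handledAtRegistration(List<Status> statuses,
                                              boolean workPreservingEnabled,
                                              boolean keepGuard) {
        List<String> handled = new ArrayList<>();
        if (keepGuard && workPreservingEnabled) {
            return handled; // nothing is handled here; release waits for another path
        }
        for (Status status : statuses) {
            if (status.completed) { // assumption: only completed containers trigger a release
                handled.add(status.containerId);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        List<Status> statuses = new ArrayList<>();
        statuses.add(new Status("master_container", true));
        statuses.add(new Status("task_container", false));

        // Guard kept, work-preserving recovery enabled: registration handles nothing.
        System.out.println(handledAtRegistration(statuses, true, true));
        // Guard removed: the completed master container is handled at registration.
        System.out.println(handledAtRegistration(statuses, true, false));
    }
}
```

In this model, removing the guard changes the outcome only when work-preserving recovery is enabled; with it disabled, both variants handle the completed container at registration.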
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335244#comment-14335244 ]

Anubhav Dhoot commented on YARN-3202:
-------------------------------------

This seems fair to me. [~jianhe], do you see any reason why handling completed master containers would interfere with work-preserving recovery?
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336059#comment-14336059 ]

Jian He commented on YARN-3202:
-------------------------------

To clarify: the ContainerRecoveredTransition in RMContainerImpl does that.
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336062#comment-14336062 ]

Rohith commented on YARN-3202:
------------------------------

bq. as for work-preserving restart, master container completed event will be sent too.

I agree that the event is sent after YARN-3194 and that the issue no longer occurs. Before YARN-3194, since the NMContainerStatus entries were not handled, the RMAppAttempt always waited for container expiry to trigger for a master container in RUNNING state.
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336071#comment-14336071 ]

Jian He commented on YARN-3202:
-------------------------------

For RM work-preserving restart, even before YARN-3194, the ContainerRecoveredTransition handles this correctly. The patch will cause duplicate master container completed events to be sent. Did I miss something?
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336082#comment-14336082 ]

Rohith commented on YARN-3202:
------------------------------

I mean the case where the RM has work-preserving restart enabled but the RM itself is not restarted. Only the NM is restarted, and it sends the recovered container statuses while registering. This NM restart scenario caused problems earlier if the master container status was COMPLETED.
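The two scenarios raised in the comments above (an RM restart versus an NM-only restart) can be compared with a small counting model. This is a rough sketch, not YARN code: `CompletionPathsSketch` and `completionEvents` are hypothetical names, the conditions under which each path fires are assumptions drawn from the discussion, and the model deliberately ignores the changes made by YARN-3194.

```java
/**
 * Toy model of the two paths discussed in the thread that can report a completed
 * master container. Names and conditions are assumptions for illustration only.
 */
final class CompletionPathsSketch {

    /**
     * Counts completion events raised for one completed master container when an NM registers.
     *
     * @param rmRecovering          true if the RM itself restarted and is recovering containers
     * @param workPreservingEnabled value of the work-preserving recovery setting
     * @param guardRemoved          true models the patch (no work-preserving check at registration)
     */
    static int completionEvents(boolean rmRecovering,
                                boolean workPreservingEnabled,
                                boolean guardRemoved) {
        int events = 0;
        // Assumed recovery path: stands in for ContainerRecoveredTransition.
        if (rmRecovering && workPreservingEnabled) {
            events++;
        }
        // Registration path: stands in for handleNMContainerStatus in the quoted snippet.
        if (guardRemoved || !workPreservingEnabled) {
            events++;
        }
        return events;
    }

    public static void main(String[] args) {
        // RM restart with the guard removed: both paths fire (the duplicate-event concern).
        System.out.println(completionEvents(true, true, true));
        // NM-only restart with the guard kept: neither path fires in this model.
        System.out.println(completionEvents(false, true, false));
    }
}
```

Under these assumptions, the guard avoids a duplicate event after an RM restart, but leaves an NM-only restart without any event at registration, which matches the two positions taken in the thread.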
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335977#comment-14335977 ]

Rohith commented on YARN-3202:
------------------------------

Thanks Anubhav Dhoot and Jian He for pitching in!

After YARN-3194, the scenario in the issue description works as expected, i.e. the master container's resources are released immediately.

I also believe non-work-preserving restart should remain supported; otherwise users running in non-work-preserving mode will be impacted.
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1407#comment-1407 ]

Hadoop QA commented on YARN-3202:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12700181/0001-YARN-3202.patch
against trunk revision fe7a302.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6697//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6697//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6697//console

This message is automatically generated.
[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled
[ https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333253#comment-14333253 ]

Rohith commented on YARN-3202:
------------------------------

Kindly review the patch. It was verified manually by deploying it in a cluster, since no tests were added.