[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335321#comment-14335321
 ] 

Jian He commented on YARN-3202:
---

This piece of code is legacy code used only for non-work-preserving restart. The 
existing code path for work-preserving restart already covers this case. 
Given that we only support work-preserving restart, I think we can remove 
all the conditional code for non-work-preserving restart; the tests may need 
to change too.

 Improve master container resource release time ICO work preserving restart 
 enabled
 --

 Key: YARN-3202
 URL: https://issues.apache.org/jira/browse/YARN-3202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3202.patch


 While an NM is registering with the RM, if the NM reports a completed container 
 status for the master container, the master container's resources are released 
 immediately by triggering the CONTAINER_FINISHED event. This frees all the 
 resources held by the master container so that they can be allocated to other 
 applications' pending resource requests.
 But when RM work-preserving restart is enabled and the master container's state 
 is COMPLETED, the attempt does not move to FINISHING until container expiry is 
 triggered by the container liveliness monitor. I think the code below need not 
 check whether work-preserving restart is enabled, so that the master container's 
 resources are released immediately and allocated to pending resource requests 
 of other applications:
 {code}
 // Handle received container status, this should be processed after new
 // RMNode inserted
 if (!rmContext.isWorkPreservingRecoveryEnabled()) {
   if (!request.getNMContainerStatuses().isEmpty()) {
     LOG.info("received container statuses on node manager register :"
         + request.getNMContainerStatuses());
     for (NMContainerStatus status : request.getNMContainerStatuses()) {
       handleNMContainerStatus(status, nodeId);
     }
   }
 }
 {code}
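The effect of the `isWorkPreservingRecoveryEnabled()` guard can be modelled in a few lines. This is a minimal sketch, not ResourceManager code: `ReportedContainer`, `ContainerState`, and `processRegistration` are hypothetical stand-ins for `NMContainerStatus` and `handleNMContainerStatus`, and releasing a COMPLETE master container is reduced to recording its id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the NM-registration path quoted above; not YARN code.
public class RegistrationGuardSketch {

  enum ContainerState { RUNNING, COMPLETE }

  // Stand-in for NMContainerStatus: an id, a state, and a master (AM) flag.
  static final class ReportedContainer {
    final String id;
    final ContainerState state;
    final boolean isMaster;

    ReportedContainer(String id, ContainerState state, boolean isMaster) {
      this.id = id;
      this.state = state;
      this.isMaster = isMaster;
    }
  }

  /**
   * Returns the ids of master containers released during registration.
   * When keepGuard is true, statuses are skipped whenever work-preserving
   * recovery is enabled, mirroring the quoted snippet.
   */
  static List<String> processRegistration(boolean workPreservingEnabled,
      boolean keepGuard, List<ReportedContainer> statuses) {
    List<String> released = new ArrayList<>();
    if (keepGuard && workPreservingEnabled) {
      return released; // statuses ignored; release must happen elsewhere
    }
    for (ReportedContainer status : statuses) {
      // Stand-in for handleNMContainerStatus(): a COMPLETE master container
      // triggers CONTAINER_FINISHED and its resources are freed.
      if (status.isMaster && status.state == ContainerState.COMPLETE) {
        released.add(status.id);
      }
    }
    return released;
  }

  public static void main(String[] args) {
    List<ReportedContainer> statuses = List.of(
        new ReportedContainer("container_01", ContainerState.COMPLETE, true),
        new ReportedContainer("container_02", ContainerState.RUNNING, false));
    System.out.println(processRegistration(true, true, statuses));  // []
    System.out.println(processRegistration(true, false, statuses)); // [container_01]
    System.out.println(processRegistration(false, true, statuses)); // [container_01]
  }
}
```

With the guard kept and work-preserving recovery enabled, nothing is released at registration time in this model; dropping the guard, as the description proposes, releases the master container immediately.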



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335244#comment-14335244
 ] 

Anubhav Dhoot commented on YARN-3202:
-

This seems fair to me. [~jianhe], do you see any reason why handling completed 
master containers would interfere with work-preserving recovery?



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336059#comment-14336059
 ] 

Jian He commented on YARN-3202:
---

To clarify: the ContainerRecoveredTransition in RMContainerImpl does that. 
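A rough sketch of what a recovery transition of this kind does, under the assumption (not checked against the RMContainerImpl source) that a container the NM reports as COMPLETE is moved straight to COMPLETED and its attempt is notified, while a RUNNING report leaves it RUNNING. `RecoveredContainer`, `RmState`, and `recover` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of an RM-side recovered container.
public class RecoveredContainerSketch {

  enum ReportedState { RUNNING, COMPLETE }

  enum RmState { NEW, RUNNING, COMPLETED }

  static final class RecoveredContainer {
    final String id;
    RmState state = RmState.NEW;
    // Records "container finished" notifications sent to the app attempt.
    final List<String> finishedEvents;

    RecoveredContainer(String id, List<String> finishedEvents) {
      this.id = id;
      this.finishedEvents = finishedEvents;
    }

    /** Stand-in for the recovery transition driven by the NM's report. */
    void recover(ReportedState reported) {
      if (state != RmState.NEW) {
        throw new IllegalStateException("recover() is only valid from NEW");
      }
      if (reported == ReportedState.COMPLETE) {
        state = RmState.COMPLETED;
        finishedEvents.add(id); // attempt notified, resources released now
      } else {
        state = RmState.RUNNING;
      }
    }
  }

  public static void main(String[] args) {
    List<String> events = new ArrayList<>();
    RecoveredContainer am = new RecoveredContainer("container_01", events);
    am.recover(ReportedState.COMPLETE);
    System.out.println(am.state + " " + events); // COMPLETED [container_01]
  }
}
```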



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336062#comment-14336062
 ] 

Rohith commented on YARN-3202:
--

bq. as for work-preserving restart, master container completed event will be 
sent too.
I agree it is sent after YARN-3194, and the issue no longer occurs. Before 
YARN-3194, since NMContainerStatuses were not handled, RMAppAttempt always waited 
for container expiry to be triggered for a master container in the RUNNING state.



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336071#comment-14336071
 ] 

Jian He commented on YARN-3202:
---

For RM work-preserving restart, even before YARN-3194, 
ContainerRecoveredTransition handles this correctly. The patch would cause 
duplicate master container completed events to be sent. Did I miss something?



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336082#comment-14336082
 ] 

Rohith commented on YARN-3202:
--

I mean: say the RM has work-preserving restart enabled, but the RM is not 
restarted. Only the NM is restarted, and it sends the recovered container 
statuses while registering. This NM restart scenario was causing problems 
earlier if the master container's status was COMPLETED.
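The RM-restart and NM-only-restart scenarios discussed in this thread can be contrasted with a small hypothetical model. Names such as `Scenario` and `finishedEventSources` are invented for illustration; the model assumes work-preserving recovery is enabled in both cases and ignores the later YARN-3194 change. It lists which code paths would report a COMPLETE master container as finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two restart scenarios; not derived from YARN sources.
public class RestartScenarioSketch {

  enum Scenario { RM_RESTART, NM_ONLY_RESTART }

  /**
   * @param scenario  which daemon restarted (RM recovery enabled in both cases)
   * @param guardKept whether registration still skips NM container statuses
   *                  when work-preserving recovery is enabled
   * @return the paths that fire a "master container finished" event
   */
  static List<String> finishedEventSources(Scenario scenario, boolean guardKept) {
    List<String> sources = new ArrayList<>();
    if (scenario == Scenario.RM_RESTART) {
      // The RM rebuilds its containers from the NM report, so the recovery
      // transition already finishes a container reported COMPLETE.
      sources.add("recovery-transition");
    }
    if (!guardKept) {
      // Registration handles the statuses itself.
      sources.add("registration-handler");
    }
    // An empty list means the attempt waits for container expiry.
    return sources;
  }

  public static void main(String[] args) {
    for (Scenario s : Scenario.values()) {
      for (boolean guardKept : new boolean[] {true, false}) {
        System.out.println(s + " guardKept=" + guardKept + " -> "
            + finishedEventSources(s, guardKept));
      }
    }
  }
}
```

In this model, removing the guard covers the NM-only restart case but reports the master container twice after an RM restart, which is the trade-off raised in the comments above.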



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-24 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335977#comment-14335977
 ] 

Rohith commented on YARN-3202:
--

Thanks Anubhav Dhoot and Jian He for chipping in!
After YARN-3194, the scenario in this issue's description works as expected, 
i.e. the master container's resources are released immediately. I also believe 
non-work-preserving restart should still be supported; otherwise users running 
in non-work-preserving mode will be impacted.



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1407#comment-1407
 ] 

Hadoop QA commented on YARN-3202:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700181/0001-YARN-3202.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6697//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6697//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6697//console

This message is automatically generated.



[jira] [Commented] (YARN-3202) Improve master container resource release time ICO work preserving restart enabled

2015-02-23 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14333253#comment-14333253
 ] 

Rohith commented on YARN-3202:
--

Kindly review the patch. Since no tests were added, the patch was verified 
manually by deploying it in a cluster.
