[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806654#comment-15806654
 ] 

Hudson commented on YARN-6009:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11083 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11083/])
YARN-6009. Skip validating app timeout value on recovery. Contributed by 
(jianhe: rev 020316458dfe6059b700f8d93a9791e4cb817b3f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java


> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805374#comment-15805374
 ] 

Daniel Templeton commented on YARN-6009:


Thanks, [~rohithsharma].

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805370#comment-15805370
 ] 

Rohith Sharma K S commented on YARN-6009:
-

Client were allowed to configure 0 but server was ignoring. But when user 
configure any timeout, then it has to be considered or throw an error if 
invalid. So, validation fails letting user knows about configured invalid. 

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805323#comment-15805323
 ] 

Daniel Templeton commented on YARN-6009:


Any idea why YARN-5611 made a 0 timeout illegal if the meaning of 0 timeout has 
not changed?

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805302#comment-15805302
 ] 

Jian He commented on YARN-6009:
---

makes sense to me, I'll commit later today if no more comments

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800249#comment-15800249
 ] 

Rohith Sharma K S commented on YARN-6009:
-

bq. What is the impact of allowing an app to be recovered with a 0 timeout 
value?
No impact, just by passing to make service up. 

bq. Is that going to cause the recovered app to immediately expire?
Noting happens to app. App will NOT be monitored for timeout. Moreover, App 
with 0 timeout was not monitored before RM restart also.

bq. Also, you may as well do the validation before the queue mapping.
Both queue mapping and timeout check are pre-validation process. The order of 
preference should not matter.

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799419#comment-15799419
 ] 

Daniel Templeton commented on YARN-6009:


What is the impact of allowing an app to be recovered with a 0 timeout value?  
Looks like YARN-5611 changes the timeout to be absolute rather than relative.  
Is that going to cause the recovered app to immediately expire?  Also, you may 
as well do the validation before the queue mapping.

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799125#comment-15799125
 ] 

Hadoop QA commented on YARN-6009:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m  4s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m  4s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6009 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12845480/YARN-6009.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0083599930d9 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a0a2761 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14551/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14551/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14551/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799037#comment-15799037
 ] 

Rohith Sharma K S commented on YARN-6009:
-

There is open JIRA for test failure i.e YARN-5548. Test failure is unrelated to 
patch. 

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798965#comment-15798965
 ] 

Jian He commented on YARN-6009:
---

patch looks good to me. The UT failure passed locally for me.
Retry the jenkins

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15797525#comment-15797525
 ] 

Hadoop QA commented on YARN-6009:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6009 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12845480/YARN-6009.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux de4c16810a9b 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e49e0a6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14546/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14546/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14546/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-03 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796868#comment-15796868
 ] 

Gour Saha commented on YARN-6009:
-

[~rohithsharma] I understand that, but I am a little worried here. No matter 
what the issue with the state store of a particular app may be, it should not 
block the RM from starting. Note, this is not just limited to lifetime 
property. We can log appropriate messages for the problematic apps (and maybe 
even update the app diagnostics) and move on with graceful start of RM. The app 
owners can later work on the individual problematic apps, but at least the 
cluster will be up and running, ready to serve new apps.

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2016-12-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766144#comment-15766144
 ] 

Rohith Sharma K S commented on YARN-6009:
-

More precisely, 
# Applications were allowed to submit with zero timeout configured for any 
ApplicationTimeoutTypes. But now, is is mandatory that timeout value should be 
greater than zero. This validation happens at server side in 
RMServerUtils.validateApplicationTimeouts.
# Earlier, user input timeout value is directly stored in RMStateStore say 
timeout=10seconds. Now, this value will be changed to absolute 
time(currentTimeInMillis+10seconds)

the point one is causing failure during upgrade from YARN-4205.

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2016-12-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765547#comment-15765547
 ] 

Jian He commented on YARN-6009:
---

[~rohithsharma], could you clarify which code logic changed?

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(599)) - 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2016-12-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756428#comment-15756428
 ] 

Rohith Sharma K S commented on YARN-6009:
-

[~gsaha]
This is mainly because cluster was running YARN-4205 patch initially. And 
trying to upgrade with other timeout patches cluster. 

But in other timeout patch I think YARN-5611, the design of timeout value that 
will be stored in RMStateStore and validation during application submission are 
got changed. 
This is causing the recovery failure for upgrade from YARN-4205 patch. So, any 
upgrade from YARN-4205 patch to latest do not work properly. 

With YARN-4205+YARN-5611 cluster, this issue does not appear. 

To move forward in your cluster to make cluster up, This application need to be 
deleted from state store using CLI {{./yarn resourcemanager 
-remove-application-from-state-store $appId}} so that RM service will be up. 


> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> 

[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2016-12-16 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755546#comment-15755546
 ] 

Gour Saha commented on YARN-6009:
-

/cc [~rohithsharma]

> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {code}
> {color}
> Exception 2:
> {color:red}
> {code}
> 2016-12-09 14:57:26,162 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - 
> application_1477927786494_0008 State change from NEW to FINISHED
> 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(599)) - Failed to load/recover state
>