[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497413#comment-14497413
 ] 

Rohith commented on YARN-3493:
------------------------------

The same problem would occur with RM work-preserving restart enabled, where a 
running AM resends its ResourceRequests on a RESYNC command from the RM. The RM 
then throws InvalidResourceRequestException to the AM, which the AM does not 
expect.
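
For illustration, the check that trips in both cases can be sketched roughly 
as below. This is a simplified sketch, not the actual SchedulerUtils code: the 
class MaxAllocationCheck, the exception InvalidRequestException, and the 
method names are hypothetical stand-ins, and only the memory dimension is 
modelled. The numbers are taken from the log in the issue description (a 
3072 MB request, accepted while the maximum was 4000 and rejected once it is 
back to 2048).

```java
// Hypothetical sketch of a max-allocation check; names are illustrative only
// and do not correspond to real Hadoop classes.
final class MaxAllocationCheck {

    /** Simplified stand-in for InvalidResourceRequestException. */
    static final class InvalidRequestException extends Exception {
        InvalidRequestException(String message) {
            super(message);
        }
    }

    /** Rejects a request whose memory is negative or above the configured maximum. */
    static void validate(int requestedMemoryMb, int maxMemoryMb)
            throws InvalidRequestException {
        if (requestedMemoryMb < 0 || requestedMemoryMb > maxMemoryMb) {
            throw new InvalidRequestException(
                "Invalid resource request, requestedMemory=" + requestedMemoryMb
                    + ", maxMemory=" + maxMemoryMb);
        }
    }

    /** Convenience wrapper: true if the request passes validation. */
    static boolean isValid(int requestedMemoryMb, int maxMemoryMb) {
        try {
            validate(requestedMemoryMb, maxMemoryMb);
            return true;
        } catch (InvalidRequestException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int requested = 3072; // requestedMemory from the log

        // At submission time the maximum was 4000, so the request is accepted.
        System.out.println("max=4000 valid: " + isValid(requested, 4000));

        // After the maximum is restored to 2048, re-validating the same
        // stored (or resent) request fails.
        System.out.println("max=2048 valid: " + isValid(requested, 2048));
    }
}
```

The point of the sketch is that a request which was valid when first 
submitted becomes invalid when it is re-validated against a smaller maximum, 
whether that happens during RM recovery or when the AM resends it after 
RESYNC.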

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-3493
>                 URL: https://issues.apache.org/jira/browse/YARN-3493
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.0
>            Reporter: Sumana Sathish
>            Assignee: Jian He
>            Priority: Critical
>         Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is 
> 2048 before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,624 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in 
> state STARTED; cause: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,625 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(211)) - Stopping ResourceManager metrics 
> system...
> 2015-04-15 13:19:18,626 INFO  impl.MetricsSinkAdapter 
> (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread 
> interrupted.
> 2015-04-15 13:19:18,626 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(217)) - ResourceManager metrics system stopped.
> 2015-04-15 13:19:18,627 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(606)) - ResourceManager metrics system 
> shutdown complete.
> 2015-04-15 13:19:18,627 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to 
> stop, igonring any new events.
> 2015-04-15 13:19:18,633 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) 
> - Session: 0x44cbc922670001c closed
> 2015-04-15 13:19:18,633 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) 
> - EventThread shut down
> 2015-04-15 13:19:18,634 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to 
> stop, igonring any new events.
> 2015-04-15 13:19:18,634 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in state 
> STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:142)
>         at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:601)
>         at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
>         at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
