Hi Alexandru,

Can you share the value set in your capacity scheduler configuration for
yarn.scheduler.capacity.am.failure.scheduling.delay.ms?

Regards
-Sanjeev

On Fri, Jun 26, 2015 at 6:40 PM, Alexandru Pacurar <[email protected]> wrote:

> Hello,
>
> I’m running Hadoop 2.6 and I have encountered a problem with the resourcemanager. After a restart the resourcemanager refuses to start with the following error:
>
> 2015-06-26 08:54:10,342 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:recover(796)) - Recovering attempt: appattempt_1435159945366_0792_000001 with final state: null
> 2015-06-26 08:54:10,342 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1435159945366_0792_000001
> 2015-06-26 08:54:10,342 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1435159945366_0792_000001
> 2015-06-26 08:54:10,343 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(670)) - Registering app attempt : appattempt_1435159945366_0792_000001
> 2015-06-26 08:54:10,344 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(594)) - Failed to load/recover state
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> 2015-06-26 08:54:10,348 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>         at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
> 2015-06-26 08:54:10,350 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...
> 2015-06-26 08:54:10,417 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - timeline thread interrupted.
> 2015-06-26 08:54:10,419 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> 2015-06-26 08:54:10,420 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - ResourceManager metrics system shutdown complete.
> 2015-06-26 08:54:10,437 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001 closed
> 2015-06-26 08:54:10,437 INFO event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to stop, igonring any new events.
> 2015-06-26 08:54:10,437 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
> 2015-06-26 08:54:10,439 INFO event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to stop, igonring any new events.
> 2015-06-26 08:54:10,439 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>
> After some searching I’ve discovered that the *yarn.resourcemanager.store.class* property controls where the ResourceManager stores its state. My value is *org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore*, so the state is kept in ZooKeeper.
>
> My question is: should I just remove *appattempt_1435159945366_0792_000001* (and any other attempts) from ZooKeeper to get my ResourceManager up, or is there a way to make it skip specific attempts? Or could I simply recreate the state store from scratch? I don’t care about the running applications; I would just like to have the ResourceManager service up.
>
> Thank you,
> Alex
