[ 
https://issues.apache.org/jira/browse/YARN-9197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745606#comment-16745606
 ] 

Wangda Tan commented on YARN-9197:
----------------------------------

Thanks [~kyungwan nam] for filing and working on the patch.

+[~billie.rinaldi], [~eyang] could you help review the patch? I haven't dug into 
the details of the patch yet: when will the state of ComponentInstanceEvent be 
null and trigger the issue? Should we make the field name more specific or add 
more comments for easier maintenance?
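
To make the question concrete, here is a minimal sketch of the kind of null 
guard and comment I have in mind. It is not taken from the attached patch, and 
it does not use the real YARN service classes: NullStatusSketch, Status, 
StopEvent and describeStop are hypothetical names for illustration only. It 
assumes that a STOP event raised for a launch failure carries no container 
status, which is what "exitStatus=null, diagnostics=failed before launch" in 
the log below suggests.

```java
// Hypothetical stand-ins for illustration; these are NOT the real YARN classes.
final class NullStatusSketch {

    /** Simplified container status carrying only the fields used here. */
    static final class Status {
        final int exitStatus;
        final String diagnostics;

        Status(int exitStatus, String diagnostics) {
            this.exitStatus = exitStatus;
            this.diagnostics = diagnostics;
        }
    }

    /** Simplified STOP event. */
    static final class StopEvent {
        final String containerId;
        // Assumption: null when the launch failed before the container started,
        // so every reader of this field must check it before dereferencing.
        final Status status;

        StopEvent(String containerId, Status status) {
            this.containerId = containerId;
            this.status = status;
        }
    }

    /** Builds a log line for a stopped container without dereferencing a null status. */
    static String describeStop(StopEvent event) {
        String exit = "null";
        String diagnostics = "failed before launch";
        if (event.status != null) {
            exit = String.valueOf(event.status.exitStatus);
            diagnostics = event.status.diagnostics;
        }
        return event.containerId + ": exitStatus=" + exit
            + ", diagnostics=" + diagnostics;
    }

    public static void main(String[] args) {
        // Launch failure: no status attached to the event.
        System.out.println(describeStop(new StopEvent("container_1", null)));
        // Normal completion: status is present.
        System.out.println(describeStop(
            new StopEvent("container_2", new Status(137, "killed"))));
    }
}
```

Whether the real fix should guard at the point of use like this, or always 
attach a status when the event is created, is part of what I would like the 
reviewers to weigh in on.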

> NPE in service AM when failed to launch container
> -------------------------------------------------
>
>                 Key: YARN-9197
>                 URL: https://issues.apache.org/jira/browse/YARN-9197
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: kyungwan nam
>            Assignee: kyungwan nam
>            Priority: Major
>         Attachments: YARN-9197.001.patch
>
>
> I hit an NPE in the service AM, as follows.
> {code}
> 2019-01-02 22:35:47,582 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT regionserver]: Assigned container_e15_1542704944343_0001_01_000001 
> to component instance regionserver-1 and launch on host test2.com:45454 
> 2019-01-02 22:35:47,588 [pool-6-thread-5] WARN  ipc.Client - Exception 
> encountered while connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for yarn-ats: HDFS_DELEGATION_TOKEN owner=yarn-ats, 
> renewer=yarn, realUser=rm/test1.nfra...@example.com, issueDate=1542704946397, 
> maxDate=1543309746397, sequenceNumber=97, masterKeyId=90) can't be found in 
> cache
> 2019-01-02 22:35:47,592 [pool-6-thread-5] ERROR 
> containerlaunch.ContainerLaunchService - [COMPINSTANCE regionserver-1 : 
> container_e15_1542704944343_0001_01_000001]: Failed to launch container.
> java.io.IOException: Package doesn't exist as a resource: 
> /hdp/apps/3.0.0.0-1634/hbase/hbase.tar.gz
>       at 
> org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:41)
>       at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
>       at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> 2019-01-02 22:35:47,592 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT regionserver] Requesting for 1 container(s)
> 2019-01-02 22:35:47,592 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT regionserver] Submitting scheduling request: 
> SchedulingRequestPBImpl{priority=1, allocationReqId=1, 
> executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, 
> allocationTags=[regionserver], 
> resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:4096, 
> vCores:1>}, placementConstraint=notin,node,regionserver}
> 2019-01-02 22:35:47,593 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE regionserver-1 : 
> container_e15_1542704944343_0001_01_000001]: 
> container_e15_1542704944343_0001_01_000001 completed. Reinsert back to 
> pending list and requested a new container.
>  exitStatus=null, diagnostics=failed before launch
> 2019-01-02 22:35:47,593 [Component  dispatcher] INFO  
> instance.ComponentInstance - Publishing component instance status 
> container_e15_1542704944343_0001_01_000001 FAILED 
> 2019-01-02 22:35:47,593 [Component  dispatcher] ERROR 
> service.ServiceScheduler - [COMPINSTANCE regionserver-1 : 
> container_e15_1542704944343_0001_01_000001]: Error in handling event type STOP
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstance.handleComponentInstanceRelaunch(ComponentInstance.java:342)
>       at 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStoppedTransition.transition(ComponentInstance.java:482)
>       at 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStoppedTransition.transition(ComponentInstance.java:375)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>       at 
> org.apache.hadoop.yarn.service.component.instance.ComponentInstance.handle(ComponentInstance.java:679)
>       at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler.handle(ServiceScheduler.java:654)
>       at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentInstanceEventHandler.handle(ServiceScheduler.java:643)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



