[
https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated YARN-6403:
-----------------------------
Attachment: YARN-6403.branch-2.8.004.patch
Thanks for updating the patch! Looks good to me. Posting the same branch-2.8
patch again so Jenkins can comment on it, as it will only comment on one patch
at a time if many are posted at once.
> Invalid local resource request can raise NPE and make NM exit
> -------------------------------------------------------------
>
> Key: YARN-6403
> URL: https://issues.apache.org/jira/browse/YARN-6403
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.8.0
> Reporter: Tao Yang
> Assignee: Tao Yang
> Attachments: YARN-6403.001.patch, YARN-6403.002.patch,
> YARN-6403.004.patch, YARN-6403.branch-2.8.003.patch,
> YARN-6403.branch-2.8.004.patch, YARN-6403.branch-2.8.004.patch
>
>
> Recently we found this problem in our testing environment. The app that
> caused it added an invalid local resource request (one with no location)
> to the ContainerLaunchContext like this:
> {code}
> localResources.put("test", LocalResource.newInstance(location,
>     LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
>     System.currentTimeMillis()));
> ContainerLaunchContext amContainer =
>     ContainerLaunchContext.newInstance(localResources, environment,
>         vargsFinal, null, securityTokens, acls);
> {code}
> The actual value of location was null, although the app did not expect that.
> This mistake caused several NMs to exit with the NPE below, and they could not
> restart until the NM recovery dirs were deleted.
> {code}
> FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> The NPE occurred while creating the LocalResourceRequest instance for the
> invalid resource request.
> {code}
> public LocalResourceRequest(LocalResource resource)
>     throws URISyntaxException {
>   this(resource.getResource().toPath(), // NPE occurred here
>       resource.getTimestamp(),
>       resource.getType(),
>       resource.getVisibility(),
>       resource.getPattern());
> }
> {code}
> We can't currently guarantee that a local resource request is valid, but we
> can keep an invalid one from damaging the cluster. Perhaps we can verify the
> resource both in ContainerLaunchContext and LocalResourceRequest? Please feel
> free to give your suggestions.
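
For illustration only, the kind of up-front check suggested in the description might look roughly like the sketch below. It does not use the YARN classes and is not taken from any of the attached patches: SketchResource and LocalResourceValidator are hypothetical stand-ins, and throwing IllegalArgumentException is an assumption about how a rejection could be reported to the submitter.

```java
import java.util.Map;

/** Hypothetical stand-in for a local resource; NOT the YARN LocalResource class. */
final class SketchResource {
    final String location; // null models the client mistake described in the issue
    final long size;
    final long timestamp;

    SketchResource(String location, long size, long timestamp) {
        this.location = location;
        this.size = size;
        this.timestamp = timestamp;
    }
}

/** Hypothetical validator; the name and exception type are assumptions. */
final class LocalResourceValidator {
    private LocalResourceValidator() {}

    /**
     * Rejects the whole request if any entry is null or has no location, so the
     * failure is raised while the request is being accepted rather than later
     * inside an event-dispatcher thread.
     */
    static void validate(Map<String, SketchResource> localResources) {
        if (localResources == null) {
            return; // nothing to localize
        }
        for (Map.Entry<String, SketchResource> entry : localResources.entrySet()) {
            SketchResource resource = entry.getValue();
            if (resource == null) {
                throw new IllegalArgumentException(
                    "Null resource for local resource key: " + entry.getKey());
            }
            if (resource.location == null) {
                throw new IllegalArgumentException(
                    "Null location for local resource key: " + entry.getKey());
            }
        }
    }
}
```

With a check of this shape, the "test" entry from the description would be rejected with a message naming its key, and only the offending request would fail.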