[
https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335640#comment-16335640
]
lujie edited comment on YARN-7786 at 1/23/18 11:04 AM:
-------------------------------------------------------
I have restudied this bug; sorry, my previous analysis was wrong. I used
javassist to trace the bug dynamically, and the cause is as follows:
1. Before the AM is launched, the kill command arrives, and
RMAppImpl.FinalTransition calls setAMContainerSpec:
{code:java}
public void setAMContainerSpec(ContainerLaunchContext amContainer) {
  maybeInitBuilder();
  if (amContainer == null) {
    builder.clearAmContainerSpec();
  }
  this.amContainer = amContainer;
}
{code}
The amContainer parameter is null, so the method does two things: it calls
clearAmContainerSpec and sets the field this.amContainer to null.
2. When the AM launch begins to run, it calls getAMContainerSpec:
{code:java}
ApplicationSubmissionContextProtoOrBuilder p = viaProto ? proto : builder;
if (this.amContainer != null) {
  return amContainer;
} // Else via proto
if (!p.hasAmContainerSpec()) {
  return null;
}
{code}
Because the field this.amContainer is null, the code goes on to check
p.hasAmContainerSpec(). Because of the clearAmContainerSpec call in the first
step, !p.hasAmContainerSpec() is true, so getAMContainerSpec returns null.
Although I now understand the cause, I still do not know how to write a unit
test for it. Or is it OK to return null in this situation?
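The two steps above can be modelled outside Hadoop with a small stand-in
class, which may be a useful starting point for a unit test. This is only a
sketch under assumptions: FakeBuilder and FakeSubmissionContext are
hypothetical names, a String stands in for ContainerLaunchContext, and the
tail of the getter (reading the value back from the builder) is assumed,
since the excerpt above does not show it.

```java
public class AmContainerSpecNullDemo {

    /** Hypothetical stand-in for the protobuf builder; it only tracks one field. */
    static final class FakeBuilder {
        private String amContainerSpec; // null means "field not set"

        boolean hasAmContainerSpec() {
            return amContainerSpec != null;
        }

        void clearAmContainerSpec() {
            amContainerSpec = null;
        }

        String getAmContainerSpec() {
            return amContainerSpec;
        }
    }

    /** Hypothetical stand-in for the submission context record. */
    static final class FakeSubmissionContext {
        private final FakeBuilder builder = new FakeBuilder();
        private String amContainer; // locally cached value, as in the excerpt

        // Mirrors the setter in step 1: a null argument clears the builder
        // field and also nulls the cached field.
        void setAMContainerSpec(String amContainer) {
            if (amContainer == null) {
                builder.clearAmContainerSpec();
            }
            this.amContainer = amContainer;
        }

        // Mirrors the getter in step 2: with no cached value and no builder
        // field, the result is null.
        String getAMContainerSpec() {
            if (this.amContainer != null) {
                return amContainer;
            }
            if (!builder.hasAmContainerSpec()) {
                return null;
            }
            // Assumed tail: fall back to the value stored in the builder.
            this.amContainer = builder.getAmContainerSpec();
            return amContainer;
        }
    }

    public static void main(String[] args) {
        FakeSubmissionContext ctx = new FakeSubmissionContext();
        ctx.setAMContainerSpec("launch-context");
        System.out.println(ctx.getAMContainerSpec()); // prints "launch-context"

        // Simulates the kill path calling the setter with null before launch.
        ctx.setAMContainerSpec(null);
        System.out.println(ctx.getAMContainerSpec()); // prints "null"
    }
}
```

A real test would need to drive the same order of events (the kill handled
before the launcher reads the spec) against the actual record class.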
> NullPointerException while launching ApplicationMaster
> ------------------------------------------------------
>
> Key: YARN-7786
> URL: https://issues.apache.org/jira/browse/YARN-7786
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Attachments: YARN-7786.patch, resourcemanager.log
>
>
> Before the ApplicationMaster is launched, send a kill command to the job; a
> NullPointerException then appears:
> {code}
> 2017-11-25 21:27:25,333 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1511616410268_0001_000001. Got exception: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
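If returning null from the getter is kept as valid behaviour, one possible
way for a caller to avoid a bare NullPointerException like the one in the
trace above is to check the value and fail with a descriptive message. The
sketch below is an illustration only, not the attached patch:
requireAmContainerSpec is a hypothetical helper, and a String again stands in
for the container spec.

```java
public class NullSpecGuardDemo {

    /**
     * Hypothetical helper: returns the spec unchanged, or throws an exception
     * that names the attempt instead of letting a later dereference fail.
     */
    static String requireAmContainerSpec(String spec, String attemptId) {
        if (spec == null) {
            throw new IllegalStateException(
                "AM container spec is null for " + attemptId
                    + "; the application may have been killed before launch");
        }
        return spec;
    }

    public static void main(String[] args) {
        // Normal case: the spec is passed through.
        System.out.println(requireAmContainerSpec("launch-context", "attempt-1"));

        // Killed-before-launch case: the guard reports a clear message.
        try {
            requireAmContainerSpec(null, "attempt-1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```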