[
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668678#comment-15668678
]
Wilfred Spiegelenburg commented on YARN-5136:
---------------------------------------------
I was thrown off track a bit by all the changes that were made to the locking
in the scheduler in YARN-3139.
After analysis, it is clear that the issue is not resolved yet. Two
situations can cause the above-mentioned problem:
# If a call to {{removeApplicationAttempt}} and a call to {{moveApplication}}
for the same attempt are processed in that order in quick succession, the
application attempt still contains a queue reference but has already been
removed from the list of applications for the queue.
# If two calls to {{removeApplicationAttempt}} come in quick succession, the
application still contains a queue reference but has already been removed from
the list of applications for the queue.
In both cases, the second call must come in before the {{removeApplication}}
call is made.
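To illustrate the second situation, here is a minimal sketch. It is a toy model, not the actual FairScheduler code: the classes {{Queue}}, {{Attempt}} and {{Scheduler}} are hypothetical stand-ins, and the {{stopped}} flag is only one possible way to make the second call harmless, not necessarily the fix that will be committed. The calls are made sequentially to keep the ordering deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveRaceSketch {

    /** Hypothetical stand-in for a leaf queue: tracks the attempts it holds. */
    static class Queue {
        final List<Attempt> apps = new ArrayList<>();

        void removeApp(Attempt attempt) {
            // Mirrors the failure mode in the reported stack trace.
            if (!apps.remove(attempt)) {
                throw new IllegalStateException(
                    "Given app to remove does not exist in queue");
            }
        }
    }

    /** Hypothetical stand-in for an application attempt. */
    static class Attempt {
        final Queue queue;   // reference survives removal from queue.apps
        boolean stopped;     // assumed guard flag, not taken from the real code

        Attempt(Queue queue) {
            this.queue = queue;
        }
    }

    /** Hypothetical stand-in for the scheduler's bookkeeping. */
    static class Scheduler {
        final Map<String, Attempt> attempts = new HashMap<>();

        void addAttempt(String id, Queue queue) {
            Attempt attempt = new Attempt(queue);
            queue.apps.add(attempt);
            attempts.put(id, attempt);
        }

        // Unguarded: the attempt stays known to the scheduler until
        // removeApplication runs, so a second call reaches the queue again.
        void removeApplicationAttempt(String id) {
            Attempt attempt = attempts.get(id);
            attempt.queue.removeApp(attempt);
        }

        // Guarded: a second call for the same attempt becomes a no-op.
        void removeApplicationAttemptGuarded(String id) {
            Attempt attempt = attempts.get(id);
            if (attempt == null || attempt.stopped) {
                return;
            }
            attempt.stopped = true;
            attempt.queue.removeApp(attempt);
        }

        void removeApplication(String id) {
            attempts.remove(id);
        }
    }

    /** Returns true if the second removal throws IllegalStateException. */
    static boolean secondRemoveFails(boolean guarded) {
        Scheduler scheduler = new Scheduler();
        scheduler.addAttempt("attempt_1", new Queue());
        try {
            for (int call = 0; call < 2; call++) {
                if (guarded) {
                    scheduler.removeApplicationAttemptGuarded("attempt_1");
                } else {
                    scheduler.removeApplicationAttempt("attempt_1");
                }
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            scheduler.removeApplication("attempt_1");
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded second remove fails: " + secondRemoveFails(false));
        System.out.println("guarded second remove fails: " + secondRemoveFails(true));
    }
}
```

The first situation has the same shape: a move that follows the remove would look up the attempt, follow its stale queue reference, and try to take it out of a list that no longer contains it.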
> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -----------------------------------------------------------------
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: tangshangwen
> Assignee: Wilfred Spiegelenburg
>
> Moving an app causes the RM to exit.
> {noformat}
> 2016-05-24 23:20:47,202 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, demand=<memory:28672000, vCores:14000>, running=<memory:28647424, vCores:13422>, share=<memory:28672000, vCores:0>, w=<memory weight=1.0, cpu weight=1.0>]
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e04_1464073905025_15410_01_001759 Container Transitioned from ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}