[
https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487557#comment-16487557
]
Jason Lowe commented on YARN-8346:
----------------------------------
IIUC the issue isn't the queue length setting but rather that the containers
are recovered with the wrong execution type (opportunistic instead of
guaranteed).
I believe the bug is here in ContainerTokenIdentifier#getExecutionType:
{code}
public ExecutionType getExecutionType() {
  if (!proto.hasExecutionType()) {
    return null;
  }
  return convertFromProtoFormat(proto.getExecutionType());
}
{code}
Instead of returning null for the execution type, it should return GUARANTEED.
All containers before an execution type was added were effectively guaranteed
since that was the only execution type supported.
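A minimal sketch of the suggested default, for illustration only. The enum and TokenProtoStub below are hypothetical stand-ins, not the real YARN or protobuf classes, and the actual patch may look different; the only point is that a missing execution type is treated as GUARANTEED rather than null.

```java
// Sketch only: ExecutionType and TokenProtoStub are stand-ins for the real
// YARN/protobuf classes, which are not reproduced here.
final class ExecutionTypeDefaultSketch {

  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  /** Hypothetical stand-in for the token proto; a null field means "not set". */
  static final class TokenProtoStub {
    private final ExecutionType executionType;

    TokenProtoStub(ExecutionType executionType) {
      this.executionType = executionType;
    }

    boolean hasExecutionType() {
      return executionType != null;
    }

    ExecutionType getExecutionType() {
      return executionType;
    }
  }

  /** Tokens without the field are treated as GUARANTEED instead of null. */
  static ExecutionType getExecutionType(TokenProtoStub proto) {
    if (!proto.hasExecutionType()) {
      return ExecutionType.GUARANTEED;
    }
    return proto.getExecutionType();
  }

  public static void main(String[] args) {
    // A token with no execution type set, as an older release would produce.
    System.out.println(getExecutionType(new TokenProtoStub(null)));
    // A token that sets the field explicitly keeps its value.
    System.out.println(getExecutionType(new TokenProtoStub(ExecutionType.OPPORTUNISTIC)));
  }
}
```

With this default, a container recovered from a token that predates the field would not be routed to the opportunistic queue.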
> Upgrading to 3.1 kills running containers with error "Opportunistic container
> queue is full"
> --------------------------------------------------------------------------------------------
>
> Key: YARN-8346
> URL: https://issues.apache.org/jira/browse/YARN-8346
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.1.0, 3.0.2
> Reporter: Rohith Sharma K S
> Priority: Blocker
>
> During a rolling upgrade from 2.8.4 to 3.1, all running containers are
> killed and a second attempt is launched for the application. The
> diagnostics message given as the reason for killing the containers is
> "Opportunistic container queue is full".
> In the NM log, I see the following after the container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
> Opportunistic container [container_e06_1527075664705_0001_01_000001] will
> not be queued at the NM since max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with distributed cache enabled.
> # Stop 2.8.4 RM. Start 3.1.0 RM with the same configuration.
> # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch.