Sushanta Sen created YARN-10670:
-----------------------------------
Summary: YARN: Opportunistic Container: In a distributed shell job, if
containers are killed, the application fails. In this case, however, the
containers are killed to make room for guaranteed containers, so failing the
application is not correct
Key: YARN-10670
URL: https://issues.apache.org/jira/browse/YARN-10670
Project: Hadoop YARN
Issue Type: Bug
Components: distributed-shell
Affects Versions: 3.1.1
Reporter: Sushanta Sen
Preconditions:
# A secure Hadoop 3.1.1 cluster with 3 nodes is installed
# Set the following parameter in the RM:
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
# Set the following parameter in the NM(s):
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>
Test Steps:
Job Command:
{noformat}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar \
  -shell_command sleep -shell_args 20 \
  -num_containers 20 -container_type OPPORTUNISTIC
{noformat}
Actual Result: The distributed shell YARN job failed with the following diagnostics message:
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
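The failure follows from the counts in the diagnostics. The minimal Java sketch below assumes that the ApplicationMaster declares success only when completed minus failed containers reaches the desired count, and that it treats every non-zero exit status as a failure (a simplification). Under those assumptions it reproduces the outcome and shows how a de-queued container could instead be set aside for re-request. The class, its methods, and the exit-status value -108 (assumed to correspond to `ContainerExitStatus.KILLED_BY_CONTAINER_SCHEDULER`) are hypothetical and are not taken from the distributed shell source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; NOT the distributed shell ApplicationMaster source.
public class DequeuedContainerSketch {
  static final int SUCCESS = 0;
  // Assumed to equal ContainerExitStatus.KILLED_BY_CONTAINER_SCHEDULER, the
  // status an NM reports when it sheds a queued opportunistic container.
  static final int KILLED_BY_CONTAINER_SCHEDULER = -108;

  // Simplified model of the reported behaviour: every non-zero exit is a failure.
  static int countFailures(List<Integer> exitStatuses) {
    int failed = 0;
    for (int status : exitStatuses) {
      if (status != SUCCESS) {
        failed++;
      }
    }
    return failed;
  }

  // Hypothetical alternative: containers shed by the NM scheduler are not
  // counted as application failures.
  static int countGenuineFailures(List<Integer> exitStatuses) {
    int failed = 0;
    for (int status : exitStatuses) {
      if (status != SUCCESS && status != KILLED_BY_CONTAINER_SCHEDULER) {
        failed++;
      }
    }
    return failed;
  }

  // Hypothetical: containers shed by the NM scheduler, to be requested again.
  static int countToReRequest(List<Integer> exitStatuses) {
    int count = 0;
    for (int status : exitStatuses) {
      if (status == KILLED_BY_CONTAINER_SCHEDULER) {
        count++;
      }
    }
    return count;
  }

  // Assumed success rule: completed minus failed must reach the desired count.
  static boolean succeeded(int desired, int completed, int failed) {
    return completed - failed >= desired;
  }

  public static void main(String[] args) {
    // 19 containers exit normally and 1 is de-queued, matching the diagnostics.
    List<Integer> statuses = new ArrayList<>(Collections.nCopies(19, SUCCESS));
    statuses.add(KILLED_BY_CONTAINER_SCHEDULER);

    int desired = 20;
    int completed = statuses.size();      // 20
    int failed = countFailures(statuses); // 1
    System.out.println("succeeded = " + succeeded(desired, completed, failed));
    System.out.println("genuine failures = " + countGenuineFailures(statuses));
    System.out.println("to re-request = " + countToReRequest(statuses));
  }
}
```

Under these assumptions, 20 - 1 = 19 falls short of the desired 20, so the job fails even though the only non-successful container was shed by the NM rather than failing on its own.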
--
This message was sent by Atlassian Jira
(v8.3.4#803005)