[ 
https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10670:
-----------------------------
    Description: 
Preconditions:
 # A secure 3-node Hadoop 3.1.1 cluster is installed.
 # Set the following property in the RM yarn-site.xml:
 <property>
 <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
 <value>true</value>
 </property>
 # Set the following property in each NM's yarn-site.xml:
 <property>
 <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
 <value>30</value>
 </property>

 
 Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: The Distributed Shell YARN job failed with the following diagnostics message:
{noformat}
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, 
diagnostics = [2021-02-09 22:11:48.440]Container killed to make room for 
Guaranateed container.
{noformat}
 Expected Result: The Distributed Shell YARN job should not fail.
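
The expected behaviour can be illustrated with a minimal sketch of the completed-container accounting, assuming a simplified model rather than the actual Distributed Shell ApplicationMaster code. `CompletedContainer`, `Tally`, `killed_by_scheduler` and `tally` are hypothetical names, the exit status `-1` is an arbitrary non-zero placeholder, and matching on the diagnostics text is only an illustrative shortcut based on the messages quoted in this issue.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple

# Marker strings taken from the diagnostics quoted in this report (lower-cased).
# Assumption: text matching stands in for whatever signal the real AM would use.
SCHEDULER_KILL_MARKERS = (
    "container killed to make room for",
    "container de-queued to meet nm queuing limits",
)


@dataclass
class CompletedContainer:
    """Hypothetical stand-in for a completed-container status."""
    exit_status: int
    diagnostics: str = ""


class Tally(NamedTuple):
    succeeded: int
    failed: int
    rerequest: int


def killed_by_scheduler(container: CompletedContainer) -> bool:
    """True if the diagnostics show the NM killed or de-queued the container."""
    text = container.diagnostics.lower()
    return any(marker in text for marker in SCHEDULER_KILL_MARKERS)


def tally(containers: Iterable[CompletedContainer]) -> Tally:
    """Count results, treating scheduler kills as re-requests, not failures."""
    succeeded = failed = rerequest = 0
    for c in containers:
        if c.exit_status == 0:
            succeeded += 1
        elif killed_by_scheduler(c):
            rerequest += 1  # ask for a replacement instead of failing the job
        else:
            failed += 1
    return Tally(succeeded, failed, rerequest)


# 19 containers finish normally; one is killed to make room for a guaranteed one.
statuses = [CompletedContainer(0) for _ in range(19)]
statuses.append(CompletedContainer(
    -1, "[2021-02-09 22:11:48.440]Container killed to make room for "
        "Guaranateed container."))
print(tally(statuses))  # Tally(succeeded=19, failed=0, rerequest=1)
```

Under this model the run above reports no failures, and the application would only fail on containers that exit abnormally for reasons other than NM-side scheduling.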

  was:
Preconditions:
 # A secure 3-node Hadoop 3.1.1 cluster is installed.
 # Set the following property in the RM yarn-site.xml:
 <property>
 <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
 <value>true</value>
 </property>
 # Set the following property in each NM's yarn-site.xml:
 <property>
 <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
 <value>30</value>
 </property>

 
 Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: The Distributed Shell YARN job failed with the following diagnostics message:
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: The Distributed Shell YARN job should not fail.


> YARN: Opportunistic Container : : In distributed shell job if containers are 
> killed then application is failed. But in this case as containers are killed 
> to make room for guaranteed containers which is not correct to fail an 
> application
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10670
>                 URL: https://issues.apache.org/jira/browse/YARN-10670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Assignee: Bilwa S T
>            Priority: Major
>
> Preconditions:
>  # A secure 3-node Hadoop 3.1.1 cluster is installed.
>  # Set the following property in the RM yarn-site.xml:
>  <property>
>  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
>  <value>true</value>
>  </property>
>  # Set the following property in each NM's yarn-site.xml:
>  <property>
>  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
>  <value>30</value>
>  </property>
>  
>  Test Steps:
> Job Command: yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC -promote_opportunistic_after_start
> Actual Result: The Distributed Shell YARN job failed with the following 
> diagnostics message:
> {noformat}
> Application Failure: desired = 20, completed = 20, allocated = 20, failed = 
> 1, diagnostics = [2021-02-09 22:11:48.440]Container killed to make room for 
> Guaranateed container.
> {noformat}
>  Expected Result: The Distributed Shell YARN job should not fail.


