[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for guaranteed containers which is not correct to fail an application

Bilwa S T (Jira) Mon, 26 Apr 2021 21:04:16 -0700


     [ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Bilwa S T updated YARN-10670:
-----------------------------
    Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following property in the RM yarn-site.xml:
{noformat}
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
{noformat}
 # Set the following property in each NM yarn-site.xml:
{noformat}
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>
{noformat}

 
Test Steps:

Job command:
{noformat}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar \
  -shell_command sleep -shell_args 20 -num_containers 20 \
  -container_type OPPORTUNISTIC -promote_opportunistic_after_start
{noformat}

Actual Result: The distributed shell YARN job failed with the following diagnostics message:
{noformat}
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container killed to make room for Guaranateed container.
{noformat}
Expected Result: The distributed shell YARN job should not fail. The container was killed by the framework to make room for a guaranteed container, so its termination should not be counted as an application failure.
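The expected behavior can be illustrated with a small, self-contained Java sketch. It assumes, as we understand the distributed shell ApplicationMaster, that a completed container with a non-zero exit status is counted as failed unless the status is ABORTED, in which case the container is re-requested. The sketch extends that retry path to containers killed by the NodeManager's container scheduler (Hadoop's ContainerExitStatus class has a KILLED_BY_CONTAINER_SCHEDULER constant). The ExitKind, Outcome, and Counters types are hypothetical simplifications, not Hadoop APIs, and this is not the actual fix for YARN-10670.

```java
/**
 * Hypothetical sketch, not Hadoop code: decide how an ApplicationMaster
 * could account for completed containers so that containers killed by the
 * NodeManager's container scheduler are re-requested instead of failed.
 */
public class DistShellExitPolicy {

    /** Simplified, hypothetical view of a completed container's exit status. */
    enum ExitKind { SUCCESS, ABORTED, KILLED_BY_CONTAINER_SCHEDULER, OTHER_FAILURE }

    /** What the ApplicationMaster does with a completed container. */
    enum Outcome { COMPLETED, RETRY, FAILED }

    static Outcome classify(ExitKind kind) {
        switch (kind) {
            case SUCCESS:
                return Outcome.COMPLETED;
            case ABORTED:
            case KILLED_BY_CONTAINER_SCHEDULER:
                // Killed by the framework (e.g. to make room for a guaranteed
                // container): not caused by the shell command, so retry it.
                return Outcome.RETRY;
            default:
                // The shell command itself exited with a non-zero status.
                return Outcome.FAILED;
        }
    }

    /** Running tallies, loosely modeled on the counters in the diagnostics. */
    static final class Counters {
        int completed;
        int failed;
        int toRequest; // containers that must be requested again

        void onContainerCompleted(ExitKind kind) {
            switch (classify(kind)) {
                case COMPLETED:
                    completed++;
                    break;
                case RETRY:
                    toRequest++;
                    break;
                case FAILED:
                    completed++;
                    failed++;
                    break;
            }
        }

        boolean applicationFailed() {
            return failed > 0;
        }
    }

    public static void main(String[] args) {
        Counters c = new Counters();
        // 19 containers succeed; 1 is killed to make room for a guaranteed container.
        for (int i = 0; i < 19; i++) {
            c.onContainerCompleted(ExitKind.SUCCESS);
        }
        c.onContainerCompleted(ExitKind.KILLED_BY_CONTAINER_SCHEDULER);
        // The killed container is re-requested and its replacement succeeds.
        c.onContainerCompleted(ExitKind.SUCCESS);
        // prints: completed=20, failed=0, re-requested=1, appFailed=false
        System.out.println("completed=" + c.completed + ", failed=" + c.failed
                + ", re-requested=" + c.toRequest + ", appFailed=" + c.applicationFailed());
    }
}
```

Grouping ABORTED and KILLED_BY_CONTAINER_SCHEDULER under RETRY keeps genuine shell-command failures fatal while treating framework-initiated kills as lost work to be rescheduled.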

  was:
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following property in the RM yarn-site.xml:
{noformat}
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
{noformat}
 # Set the following property in each NM yarn-site.xml:
{noformat}
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>
{noformat}

 
Test Steps:

Job command:
{noformat}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar \
  -shell_command sleep -shell_args 20 -num_containers 20 \
  -container_type OPPORTUNISTIC -promote_opportunistic_after_start
{noformat}

Actual Result: The distributed shell YARN job failed with the following diagnostics message:
{noformat}
Attempt recovered after RM restart
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
Expected Result: The distributed shell YARN job should not fail.


>                 Key: YARN-10670
>                 URL: https://issues.apache.org/jira/browse/YARN-10670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Assignee: Bilwa S T
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
