[jira] [Updated] (YARN-9681) AM resource limit is incorrect for queue

2019-07-24 Thread ANANDA G B (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ANANDA G B updated YARN-9681:
-
Fix Version/s: (was: 3.1.2)

> AM resource limit is incorrect for queue
> 
>
> Key: YARN-9681
> URL: https://issues.apache.org/jira/browse/YARN-9681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1, 3.1.2
>Reporter: ANANDA G B
>Priority: Major
>  Labels: patch
> Attachments: After running job on queue1.png, Before running job on 
> queue1.png, YARN-9681..patch
>
>
> After running a job on Queue1 of Partition1, Queue1 of the 
> DEFAULT_PARTITION's 'Max Application Master Resources' is calculated 
> incorrectly. Please find the attachments.
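
For context, the queue's 'Max Application Master Resources' is derived from 
the queue's partition resource and 
yarn.scheduler.capacity.maximum-am-resource-percent; a hedged fragment with 
assumed numbers follows (the simplified formula and values are assumptions, 
not taken from this report):
{code:title=Illustrative AM-limit arithmetic (assumed values)}
// Assume a 40GB queue resource and the default
// yarn.scheduler.capacity.maximum-am-resource-percent of 0.1:
int queueResourceMB = 40 * 1024;  // assumed queue resource in MB
double amPercent = 0.1;           // assumed maximum-am-resource-percent
int maxAmResourceMB = (int) (queueResourceMB * amPercent);  // 4096 MB
{code}
The report says this value changes for the DEFAULT_PARTITION after a job runs 
in another partition, even though neither input above should change.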






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892380#comment-16892380
 ] 

Hadoop QA commented on YARN-9596:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  8m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.8 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
57s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 48s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:b93746a |
| JIRA Issue | YARN-9596 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12975723/YARN-9596-branch-2.8.005.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7ee4468e751d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / c07b626 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 

[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892265#comment-16892265
 ] 

Muhammad Samir Khan commented on YARN-9596:
---

Posted a patch for 2.8. It also includes a workaround in the unit test for a 
race condition in AsyncDispatcher (see YARN-3878, YARN-5436, and YARN-5375).

For 2.8, we will also have to backport YARN-5788. Shall I post a patch here or 
should that be tracked separately?

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-2.8.005.patch, 
> YARN-9596-branch-3.0.004.patch, YARN-9596.001.patch, YARN-9596.002.patch, 
> YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.
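
For step 3 above, a minimal sketch of reading "totalMB" from the metrics 
endpoint; the RM address and the crude string scan are illustrative only (a 
real check would use a JSON parser):
{code:title=Hedged sketch: polling totalMB from the RM REST API (assumed host/port)}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TotalMBCheck {
  public static void main(String[] args) throws Exception {
    // Assumed RM web address; adjust for the cluster under test.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/metrics");
    StringBuilder json = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        json.append(line);
      }
    }
    // Crude extraction of the "totalMB" field for illustration only.
    int i = json.indexOf("\"totalMB\":");
    if (i >= 0) {
      int j = json.indexOf(",", i);
      System.out.println(json.substring(i, j > 0 ? j : json.length()));
    }
  }
}
{code}
Per the steps above, totalMB should stay constant while the two jobs run; with 
this bug it drops by the amount used by the first job.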






[jira] [Updated] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muhammad Samir Khan updated YARN-9596:
--
Attachment: YARN-9596-branch-2.8.005.patch

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-2.8.005.patch, 
> YARN-9596-branch-3.0.004.patch, YARN-9596.001.patch, YARN-9596.002.patch, 
> YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892226#comment-16892226
 ] 

Eric Payne commented on YARN-9596:
--

bq. The unit test failures are also happening in branch-3.0.
Yes, I see that now. I will continue to review the 3.0 patch.

Unfortunately, we will also need a branch-2.8 patch. The current patch does 
not backport or apply cleanly to branch-2.8.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892191#comment-16892191
 ] 

Muhammad Samir Khan commented on YARN-9596:
---

The remaining two unit tests in TestNodeLabelContainerAllocation should have 
been fixed by the YARN-7466 addendum patch, but they seem to still be broken 
in branch-3.0.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-07-24 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.ut.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic containers reported in each node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> opportunistic containers, the containers might get allocated on the same set 
> of nodes, because containers already allocated on a node are not considered 
> while serving requests from different applications. This can lead to uneven 
> allocation of opportunistic containers across the cluster and, in turn, 
> increased queuing time.
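
As a hedged illustration of the staleness problem described above (a 
simplified model, not the actual YARN allocator code):
{code:title=Hedged illustration of heartbeat-snapshot staleness (simplified model)}
import java.util.HashMap;
import java.util.Map;

// Picks the node reporting the fewest queued opportunistic containers,
// using only the counts received at heartbeat time.
class NaiveOpportunisticAllocator {
  private final Map<String, Integer> queuedAtHeartbeat = new HashMap<>();

  void onHeartbeat(String node, int queued) {
    queuedAtHeartbeat.put(node, queued);
  }

  String allocate() {
    // The snapshot is never adjusted for allocations made since the last
    // heartbeat, so back-to-back requests from different applications all
    // land on the same "least loaded" node until its next heartbeat.
    String best = null;
    for (Map.Entry<String, Integer> e : queuedAtHeartbeat.entrySet()) {
      if (best == null || e.getValue() < queuedAtHeartbeat.get(best)) {
        best = e.getKey();
      }
    }
    return best;
  }
}
{code}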






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892122#comment-16892122
 ] 

Muhammad Samir Khan commented on YARN-9596:
---

YARN-4901 fixes some of the unit test failures but it is not in branch-3.0.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Created] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-07-24 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9697:
---

 Summary: Efficient allocation of Opportunistic containers.
 Key: YARN-9697
 URL: https://issues.apache.org/jira/browse/YARN-9697
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


In the current implementation, opportunistic containers are allocated based on 
the number of queued opportunistic containers reported in each node heartbeat. 
This information becomes stale as soon as more opportunistic containers are 
allocated on that node.

Allocation of opportunistic containers happens on the same heartbeat in which 
the AM asks for the containers. When multiple applications request 
opportunistic containers, the containers might get allocated on the same set 
of nodes, because containers already allocated on a node are not considered 
while serving requests from different applications. This can lead to uneven 
allocation of opportunistic containers across the cluster and, in turn, 
increased queuing time.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892081#comment-16892081
 ] 

Muhammad Samir Khan commented on YARN-9596:
---

The findbugs warnings are from branch-3.0 (pre-patch).

The unit test failures are also happening in branch-3.0; they just surface a 
little later because the assert statement comes later in branch-3.0. Some of 
the tests fail if I run all tests in TestNodeLabelContainerAllocation, but not 
if I run the specific tests by themselves.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Commented] (YARN-9681) AM resource limit is incorrect for queue

2019-07-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892067#comment-16892067
 ] 

Hadoop QA commented on YARN-9681:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-9681 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9681 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12975676/YARN-9681..patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24421/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> AM resource limit is incorrect for queue
> 
>
> Key: YARN-9681
> URL: https://issues.apache.org/jira/browse/YARN-9681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1, 3.1.2
>Reporter: ANANDA G B
>Priority: Major
>  Labels: patch
> Fix For: 3.1.2
>
> Attachments: After running job on queue1.png, Before running job on 
> queue1.png, YARN-9681..patch
>
>
> After running a job on Queue1 of Partition1, Queue1 of the 
> DEFAULT_PARTITION's 'Max Application Master Resources' is calculated 
> incorrectly. Please find the attachments.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892065#comment-16892065
 ] 

Eric Payne commented on YARN-9596:
--

Thanks, [~samkhan], for the 3.0 patch. The test failures for 
{{TestOpportunisticContainerAllocatorAMService}} seem to be happening in 3.0 
without this patch. However, the failures for 
{{TestNodeLabelContainerAllocation}} do seem to be caused by the 3.0 patch.

I'm concerned about the findbugs warnings, but I am not sure why this patch 
would have caused them.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Commented] (YARN-9681) AM resource limit is incorrect for queue

2019-07-24 Thread ANANDA G B (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891998#comment-16891998
 ] 

ANANDA G B commented on YARN-9681:
--

Hi [~sunilg], [~bibinchundatt], [~leftnoteasy], I have attached the patch. Can 
you please review it?

> AM resource limit is incorrect for queue
> 
>
> Key: YARN-9681
> URL: https://issues.apache.org/jira/browse/YARN-9681
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1, 3.1.2
>Reporter: ANANDA G B
>Priority: Major
>  Labels: patch
> Fix For: 3.1.2
>
> Attachments: After running job on queue1.png, Before running job on 
> queue1.png, YARN-9681..patch
>
>
> After running a job on Queue1 of Partition1, Queue1 of the 
> DEFAULT_PARTITION's 'Max Application Master Resources' is calculated 
> incorrectly. Please find the attachments.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891986#comment-16891986
 ] 

Eric Payne commented on YARN-9596:
--

I'd like to document why a branch-3.0 patch was necessary.

In trunk and 3.2, {{CSQueueUtils.java#getMaxAvailableResourceToQueue}} 
calculates {{totalAvailableResource}} as follows:
{code:title=Trunk version of CSQueueUtils.java#getMaxAvailableResourceToQueue}
Resource totalAvailableResource = Resources.createResource(0, 0);
{code}
So, the new {{getMaxAvailableResourceToQueuePartition}} method calculates it 
the same way.

However, when backporting to 3.0, {{totalAvailableResource}} should not be 
computed the same way, because the 3.0 code derives it differently:
{code:title=3.0 version of CSQueueUtils.java#getMaxAvailableResourceToQueue}
Resource queueGuranteedResource = Resources.multiply(nlm
.getResourceByLabel(partition, cluster), queue.getQueueCapacities()
.getAbsoluteCapacity(partition));
{code}
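
To make the difference concrete, here is a worked example with assumed numbers 
(a 20GB/20-vcore partition and a queue at 50% absolute capacity); this 
fragment is illustrative, not Hadoop source:
{code:title=Illustrative arithmetic for the 3.0 computation (assumed values)}
// Partition total = <20GB, 20 vcores>; queue absolute capacity = 0.5
// => queueGuranteedResource = <20GB, 20 vcores> * 0.5 = <10GB, 10 vcores>
Resource partitionTotal = Resources.createResource(20 * 1024, 20);
Resource queueGuaranteed = Resources.multiply(partitionTotal, 0.5);
{code}
Starting from {{Resources.createResource(0, 0)}} as trunk does would drop the 
guaranteed-capacity term that the 3.0 code builds on, which is why the 
backport had to diverge here.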

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-07-24 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891979#comment-16891979
 ] 

Jonathan Eagles commented on YARN-9563:
---

[~Jim_Brennan], thanks for pointing out the missing cherry-pick to branch-2. 
I cherry-picked this commit to branch-2 and updated the fix versions.

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: YARN-9563-branch-2.8.001.patch, 
> YARN-9563-branch-2.9.001.patch, YARN-9563-branch-3.0.001.patch, 
> YARN-9563.001.patch, YARN-9563.002.patch, YARN-9563.003.patch, 
> YARN-9563.004.patch, YARN-9563.005.patch, YARN-9563.006.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix only corrects the calculation in one place and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.
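
A minimal sketch of the suggested guard, assuming a small helper applied where 
resource values are written into the protobuf (the method name and placement 
are assumptions; the description above only specifies replacing NaN/Inf with 
0.0f):
{code:title=Hedged sketch of the NaN/Inf guard (names are assumptions)}
// Replace non-finite values before they reach the protobuf, so the REST
// layer never serializes NaN or Infinity into the JSON response.
static float sanitizeForProto(float value) {
  return (Float.isNaN(value) || Float.isInfinite(value)) ? 0.0f : value;
}
{code}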






[jira] [Updated] (YARN-9563) Resource report REST API could return NaN or Inf

2019-07-24 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-9563:
--
Fix Version/s: 2.10.0

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: YARN-9563-branch-2.8.001.patch, 
> YARN-9563-branch-2.9.001.patch, YARN-9563-branch-3.0.001.patch, 
> YARN-9563.001.patch, YARN-9563.002.patch, YARN-9563.003.patch, 
> YARN-9563.004.patch, YARN-9563.005.patch, YARN-9563.006.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs 
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix only corrects the calculation in one place and does not 
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the 
> web GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer, replacing 
> NaN/Inf with 0.0f.






[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved

2019-07-24 Thread Muhammad Samir Khan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891957#comment-16891957
 ] 

Muhammad Samir Khan commented on YARN-9596:
---

Looking at the UT failures.

> QueueMetrics has incorrect metrics when labelled partitions are involved
> 
>
> Key: YARN-9596
> URL: https://issues.apache.org/jira/browse/YARN-9596
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.8.0, 3.3.0
>Reporter: Muhammad Samir Khan
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 
> 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, 
> YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default 
> partition. However, the metrics are incorrect when labelled partitions are 
> involved.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Add label "test" to cluster and replace label on node1 to be "test"
>  # Note down "totalMB" at 
> /ws/v1/cluster/metrics
>  # Start first job on test queue.
>  # Start second job on default queue (does not work if the order of two jobs 
> is swapped).
>  # While the two applications are running, the "totalMB" at 
> /ws/v1/cluster/metrics will go down by 
> the amount of MB used by the first job (screenshots attached).
> Alternatively:
> In 
> TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(),
>  add the following lines at the end of the test before rm1.close():
> CSQueue rootQueue = cs.getRootQueue();
> assertEquals(10*GB,
>  rootQueue.getMetrics().getAvailableMB() + 
> rootQueue.getMetrics().getAllocatedMB());
> There are two nodes of 10GB each and only one of them has a non-default 
> label. The test will also fail against a 20*GB check.






[jira] [Created] (YARN-9696) one more import in org.apache.hadoop.conf.Configuration class

2019-07-24 Thread runzhou wu (JIRA)
runzhou wu created YARN-9696:


 Summary: one more import in org.apache.hadoop.conf.Configuration 
class
 Key: YARN-9696
 URL: https://issues.apache.org/jira/browse/YARN-9696
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: runzhou wu


LinkedList is not used.

The import is on line 54: "import java.util.LinkedList;". I think it can be 
deleted.






[jira] [Commented] (YARN-9691) canceling upgrade does not work if upgrade failed container is existing

2019-07-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891763#comment-16891763
 ] 

Hadoop QA commented on YARN-9691:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 3 new + 47 unchanged - 0 fixed = 50 total (was 47) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
37s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9691 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12975594/YARN-9691.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3d1b790bd58e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cf9ff08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24420/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24420/testReport/ |
| Max. 

[jira] [Updated] (YARN-9691) canceling upgrade does not work if upgrade failed container is existing

2019-07-24 Thread kyungwan nam (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-9691:
---
Attachment: YARN-9691.002.patch

> canceling upgrade does not work if upgrade failed container is existing
> ---
>
> Key: YARN-9691
> URL: https://issues.apache.org/jira/browse/YARN-9691
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9691.001.patch, YARN-9691.002.patch
>
>
> If a container fails to upgrade during a YARN service upgrade, the container 
> will be released and will transition to the FAILED_UPGRADE state.
> After that, I expected it to be able to go back to the previous version using 
> cancel-upgrade, but it didn't work.
> At that time, the AM log was as follows:
> {code}
> # failed to upgrade container_e62_1563179597798_0006_01_08
> 2019-07-16 18:21:55,152 [IPC Server handler 0 on 39483] INFO  
> service.ClientAMService - Upgrade container 
> container_e62_1563179597798_0006_01_08
> 2019-07-16 18:21:55,153 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] spec state state changed from 
> NEEDS_UPGRADE -> UPGRADING
> 2019-07-16 18:21:55,154 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] Transitioned from READY to 
> UPGRADING on UPGRADE event
> 2019-07-16 18:21:55,154 [pool-5-thread-4] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08]: Deleting registry path 
> /users/test/services/yarn-service/sleeptest/components/ctr-e62-1563179597798-0006-01-08
> 2019-07-16 18:21:55,156 [pool-6-thread-6] INFO  provider.ProviderUtils - 
> [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_08] version 
> 1.0.1 : Creating dir on hdfs: 
> hdfs://test1.com:8020/user/test/.yarn/services/sleeptest/components/1.0.1/sleep/sleep-0
> 2019-07-16 18:21:55,157 [pool-6-thread-6] INFO  
> containerlaunch.ContainerLaunchService - reInitializing container 
> container_e62_1563179597798_0006_01_08 with version 1.0.1
> 2019-07-16 18:21:55,157 [pool-6-thread-6] INFO  
> containerlaunch.AbstractLauncher - yarn docker env var has been set 
> {LANGUAGE=en_US.UTF-8, HADOOP_USER_NAME=test, 
> YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME=sleep-0.sleeptest.test.EXAMPLE.COM,
>  WORK_DIR=$PWD, LC_ALL=en_US.UTF-8, YARN_CONTAINER_RUNTIME_TYPE=docker, 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=registry.test.com/test/sleep1:latest, 
> LANG=en_US.UTF-8, YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=bridge, 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true, LOG_DIR=}
> 2019-07-16 18:21:55,158 
> [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #7] INFO  
> impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER 
> for Container container_e62_1563179597798_0006_01_08
> 2019-07-16 18:21:55,167 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] spec state state changed from 
> UPGRADING -> RUNNING_BUT_UNREADY
> 2019-07-16 18:21:55,167 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] retrieve status after 30
> 2019-07-16 18:21:55,167 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] Transitioned from UPGRADING to 
> REINITIALIZED on START event
> 2019-07-16 18:22:07,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:22:07 
> KST 2019", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: sleep-0: IP is not available yet"
> 2019-07-16 18:22:37,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:22:37 
> KST 2019", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: sleep-0: IP is not available yet"
> 2019-07-16 18:23:07,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:23:07 
> KST 2019", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: sleep-0: IP is not available yet"
> 2019-07-16 18:23:08,225 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE sleep-0 : 
> container_e62_1563179597798_0006_01_08] spec state state changed from 
> RUNNING_BUT_UNREADY -> FAILED_UPGRADE
> #