[jira] [Updated] (YARN-9681) AM resource limit is incorrect for queue
[ https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ANANDA G B updated YARN-9681:
-----------------------------
    Fix Version/s:     (was: 3.1.2)

> AM resource limit is incorrect for queue
> ----------------------------------------
>
>                 Key: YARN-9681
>                 URL: https://issues.apache.org/jira/browse/YARN-9681
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.1, 3.1.2
>            Reporter: ANANDA G B
>            Priority: Major
>              Labels: patch
>         Attachments: After running job on queue1.png, Before running job on queue1.png, YARN-9681..patch
>
> After running a job on Queue1 of Partition1, Queue1 of the DEFAULT_PARTITION's 'Max Application Master Resources' is calculated wrongly. Please find the attachment.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
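For context on where the web UI number comes from: in the Capacity Scheduler, a queue's 'Max Application Master Resources' is derived from the partition's resource, the queue's absolute capacity in that partition, and maximum-am-resource-percent. The sketch below is a hypothetical simplification with made-up numbers, not the patched code; the class and method names are assumptions for illustration. The bug report's point is that the limit must be derived per partition, not leaked from another partition's resource.

```java
public class AmLimitSketch {
    // Hypothetical helper: derive a queue's max AM resource (MB) from the
    // partition's total resource, the queue's absolute capacity in that
    // partition, and maximum-am-resource-percent (default 0.1).
    static long maxAmResourceMb(long partitionMb, double absCapacity, double maxAmPercent) {
        return (long) (partitionMb * absCapacity * maxAmPercent);
    }

    public static void main(String[] args) {
        // Queue1 holds 50% of a 100 GB DEFAULT_PARTITION; AM share capped at 10%.
        long limit = maxAmResourceMb(100 * 1024, 0.5, 0.1);
        // Each partition's limit should be computed from that partition's own
        // resource, so running a job in Partition1 must not change this value.
        System.out.println(limit); // 5120
    }
}
```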
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892380#comment-16892380 ]

Hadoop QA commented on YARN-9596:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 8m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2.8 Compile Tests ||
| +1 | mvninstall | 8m 57s | branch-2.8 passed |
| +1 | compile | 0m 41s | branch-2.8 passed with JDK v1.7.0_95 |
| +1 | compile | 0m 33s | branch-2.8 passed with JDK v1.8.0_212 |
| +1 | checkstyle | 0m 18s | branch-2.8 passed |
| +1 | mvnsite | 0m 41s | branch-2.8 passed |
| +1 | findbugs | 1m 6s | branch-2.8 passed |
| +1 | javadoc | 0m 27s | branch-2.8 passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 20s | branch-2.8 passed with JDK v1.8.0_212 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 36s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 36s | the patch passed |
| +1 | compile | 0m 30s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 10s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 19s | the patch passed with JDK v1.8.0_212 |
|| || || || Other Tests ||
| -1 | unit | 76m 48s | hadoop-yarn-server-resourcemanager in the patch failed. |
| -1 | asflicense | 0m 19s | The patch generated 1 ASF License warnings. |
|    |            | 104m 14s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:b93746a |
| JIRA Issue | YARN-9596 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975723/YARN-9596-branch-2.8.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 7ee4468e751d 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / c07b626 |
| maven | version: Apache Maven 3.3.9 |
| Default Java |
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892265#comment-16892265 ]

Muhammad Samir Khan commented on YARN-9596:
-------------------------------------------

Posted a patch for 2.8. It also includes a workaround in the unit test for a race condition in AsyncDispatcher (see YARN-3878, YARN-5436, and YARN-5375). For 2.8, we will also have to backport YARN-5788. Shall I post a patch here, or should that be tracked separately?

> QueueMetrics has incorrect metrics when labelled partitions are involved
> ------------------------------------------------------------------------
>
>                 Key: YARN-9596
>                 URL: https://issues.apache.org/jira/browse/YARN-9596
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: Muhammad Samir Khan
>            Assignee: Muhammad Samir Khan
>            Priority: Major
>         Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-2.8.005.patch, YARN-9596-branch-3.0.004.patch, YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch
>
> After YARN-6467, QueueMetrics should only be tracking metrics for the default partition. However, the metrics are incorrect when labelled partitions are involved.
>
> Steps to reproduce
> ==================
> # Configure capacity-scheduler.xml with label configuration
> # Add label "test" to the cluster and replace the label on node1 with "test"
> # Note down "totalMB" at /ws/v1/cluster/metrics
> # Start the first job on the test queue.
> # Start the second job on the default queue (does not work if the order of the two jobs is swapped).
> # While the two applications are running, the "totalMB" at /ws/v1/cluster/metrics will go down by the amount of MB used by the first job (screenshots attached).
>
> Alternately, in TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), add the following lines at the end of the test, before rm1.close():
>
>     CSQueue rootQueue = cs.getRootQueue();
>     assertEquals(10*GB, rootQueue.getMetrics().getAvailableMB()
>         + rootQueue.getMetrics().getAllocatedMB());
>
> There are two nodes of 10GB each, and only one of them has a non-default label. The test will also fail against a 20*GB check.
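The invariant behind that reproducer is that the root queue's QueueMetrics should account only for the DEFAULT_PARTITION, so availableMB + allocatedMB must stay at that partition's total (10 GB here, not the 20 GB cluster total). A toy model of that accounting, assuming a hypothetical class that is not the YARN implementation:

```java
public class DefaultPartitionMetricsSketch {
    static final int GB = 1024;

    long availableMB;
    long allocatedMB;

    DefaultPartitionMetricsSketch(long defaultPartitionMB) {
        this.availableMB = defaultPartitionMB;
    }

    // Only containers on DEFAULT_PARTITION nodes (empty label) move memory
    // between available and allocated; labelled-partition allocations are
    // ignored by these metrics entirely.
    void allocate(String nodePartition, long mb) {
        if ("".equals(nodePartition)) {
            availableMB -= mb;
            allocatedMB += mb;
        }
    }

    public static void main(String[] args) {
        // Two 10 GB nodes; node1 carries label "test", node2 is unlabelled,
        // so the default partition totals 10 GB.
        DefaultPartitionMetricsSketch m = new DefaultPartitionMetricsSketch(10 * GB);
        m.allocate("test", 2 * GB); // job on the labelled partition: no effect
        m.allocate("", 3 * GB);     // job on the default partition
        // The sum must remain the default partition's 10 GB.
        System.out.println(m.availableMB + m.allocatedMB); // 10240
    }
}
```

The bug described above is, in effect, this invariant being violated: the labelled job's usage leaks into the default partition's "available" side, so the sum drops.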
[jira] [Updated] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muhammad Samir Khan updated YARN-9596:
--------------------------------------
    Attachment: YARN-9596-branch-2.8.005.patch
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892226#comment-16892226 ]

Eric Payne commented on YARN-9596:
----------------------------------

bq. The unit test failures are also happening in branch-3.0.

Yes, I see that now. I will continue to review the 3.0 patch. Unfortunately, we will also need a branch-2.8 patch: the 3.0 patch does not backport or apply cleanly to branch-2.8.
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892191#comment-16892191 ]

Muhammad Samir Khan commented on YARN-9596:
-------------------------------------------

The remaining two unit test failures in TestNodeLabelContainerAllocation should have been fixed by the YARN-7466 addendum patch, but they still seem to be broken in branch-3.0.
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Modi updated YARN-9697:
--------------------------------
    Attachment: YARN-9697.ut.patch

> Efficient allocation of Opportunistic containers.
> -------------------------------------------------
>
>                 Key: YARN-9697
>                 URL: https://issues.apache.org/jira/browse/YARN-9697
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-9697.ut.patch
>
> In the current implementation, opportunistic containers are allocated based on the queued-container counts received in node heartbeats. This information becomes stale as soon as more opportunistic containers are allocated on a node.
>
> Allocation of opportunistic containers happens on the same heartbeat in which the AM asks for them. When multiple applications request opportunistic containers, the containers can land on the same set of nodes, because containers already allocated on a node are not considered while serving requests from other applications. This can lead to an uneven spread of opportunistic containers across the cluster and, in turn, increased queuing time.
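One direction the description suggests is to correct the stale heartbeat-reported queue length with a local count of containers handed out since the last report. The sketch below illustrates that idea only; the class, field, and method names are hypothetical and this is not the attached patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class OpportunisticPickSketch {
    // Hypothetical per-node view: the queue length last reported by heartbeat
    // (which goes stale) plus containers this allocator has placed since then.
    static class NodeView {
        final String host;
        int reportedQueueLength;   // from the last node heartbeat
        int allocatedSinceReport;  // local correction, reset on next heartbeat

        NodeView(String host, int reported) {
            this.host = host;
            this.reportedQueueLength = reported;
        }

        int estimatedLoad() {
            return reportedQueueLength + allocatedSinceReport;
        }
    }

    // Pick the least-loaded node by the corrected estimate, so back-to-back
    // requests from different AMs spread out instead of piling on one node.
    static NodeView pick(List<NodeView> nodes) {
        NodeView best = Collections.min(nodes,
                Comparator.comparingInt(NodeView::estimatedLoad));
        best.allocatedSinceReport++;
        return best;
    }

    public static void main(String[] args) {
        List<NodeView> nodes = Arrays.asList(
                new NodeView("n1", 0), new NodeView("n2", 0));
        // Four allocations before any new heartbeat alternate across nodes.
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            order.append(pick(nodes).host).append(' ');
        }
        System.out.println(order.toString().trim()); // n1 n2 n1 n2
    }
}
```

Without the `allocatedSinceReport` correction, every pick between heartbeats would see the same stale estimate and land on the same node, which is the behavior the issue describes.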
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892122#comment-16892122 ]

Muhammad Samir Khan commented on YARN-9596:
-------------------------------------------

YARN-4901 fixes some of the unit test failures, but it is not in branch-3.0.
[jira] [Created] (YARN-9697) Efficient allocation of Opportunistic containers.
Abhishek Modi created YARN-9697:
-----------------------------------

             Summary: Efficient allocation of Opportunistic containers.
                 Key: YARN-9697
                 URL: https://issues.apache.org/jira/browse/YARN-9697
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Abhishek Modi
            Assignee: Abhishek Modi
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892081#comment-16892081 ]

Muhammad Samir Khan commented on YARN-9596:
-------------------------------------------

The findbugs warnings are from branch-3.0 (pre-patch). The unit test failures also happen in branch-3.0; they just surface a little later, since the assert statement comes later in branch-3.0. Some of the tests fail when I run all of TestNodeLabelContainerAllocation, but not when I run the specific tests by themselves.
[jira] [Commented] (YARN-9681) AM resource limit is incorrect for queue
[ https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892067#comment-16892067 ]

Hadoop QA commented on YARN-9681:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | YARN-9681 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9681 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975676/YARN-9681..patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24421/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892065#comment-16892065 ]

Eric Payne commented on YARN-9596:
----------------------------------

Thanks, [~samkhan], for the 3.0 patch. The test failures for {{TestOpportunisticContainerAllocatorAMService}} seem to be happening in 3.0 without this patch. However, the failures for {{TestNodeLabelContainerAllocation}} do seem to be caused by the 3.0 patch. I'm concerned about the findbugs warnings, but I am not sure why this patch would have caused them.
[jira] [Commented] (YARN-9681) AM resource limit is incorrect for queue
[ https://issues.apache.org/jira/browse/YARN-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891998#comment-16891998 ]

ANANDA G B commented on YARN-9681:
----------------------------------

Hi [~sunilg], [~bibinchundatt], [~leftnoteasy], I have attached the patch; could you please review it?
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891986#comment-16891986 ]

Eric Payne commented on YARN-9596:
----------------------------------

I'd like to document why a branch-3.0 patch was necessary. In trunk and 3.2, {{CSQueueUtils.java#getMaxAvailableResourceToQueue}} calculated {{totalAvailableResource}} as follows:

{code:title=Trunk version of CSQueueUtils.java#getMaxAvailableResourceToQueue}
Resource totalAvailableResource = Resources.createResource(0, 0);
{code}

So the new {{getMaxAvailableResourceToQueuePartition}} method calculated it the same way. However, when backporting to 3.0, the calculation cannot be done the same way, because it is different in 3.0:

{code:title=3.0 version of CSQueueUtils.java#getMaxAvailableResourceToQueue}
Resource queueGuranteedResource = Resources.multiply(nlm
    .getResourceByLabel(partition, cluster), queue.getQueueCapacities()
    .getAbsoluteCapacity(partition));
{code}
[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf
[ https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891979#comment-16891979 ]

Jonathan Eagles commented on YARN-9563:
---------------------------------------

[~Jim_Brennan], thanks for pointing out the missing cherry-pick to branch-2. Cherry-picked this commit to branch-2 and updated the fixed versions.

> Resource report REST API could return NaN or Inf
> ------------------------------------------------
>
>                 Key: YARN-9563
>                 URL: https://issues.apache.org/jira/browse/YARN-9563
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>             Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>         Attachments: YARN-9563-branch-2.8.001.patch, YARN-9563-branch-2.9.001.patch, YARN-9563-branch-3.0.001.patch, YARN-9563.001.patch, YARN-9563.002.patch, YARN-9563.003.patch, YARN-9563.004.patch, YARN-9563.005.patch, YARN-9563.006.patch
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs sometimes return invalid JSON. This was addressed in YARN-6082; however, that fix only corrects the calculation in one site and does not guarantee the problem is avoided. Likewise, generating NaN/Inf can break the web GUI if the columns cannot render non-numeric values.
>
> The suggested fix is to check for NaN/Inf in the protobuf: the protobuf layer replaces NaN/Inf with 0.0f.
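The guard described above (replacing non-finite floats with 0.0f before they reach the report) amounts to a one-line check. A minimal sketch, with a hypothetical helper name rather than the actual patched method:

```java
public class FloatGuardSketch {
    // Hypothetical helper mirroring the described fix: any non-finite float
    // written into the resource report is replaced by 0.0f, so the serialized
    // JSON never carries NaN or Infinity tokens.
    static float sanitize(float v) {
        return (Float.isNaN(v) || Float.isInfinite(v)) ? 0.0f : v;
    }

    public static void main(String[] args) {
        float ratio = 0.0f / 0.0f; // NaN, e.g. from a 0/0 resource ratio
        System.out.println(sanitize(ratio) + " "
                + sanitize(Float.POSITIVE_INFINITY) + " "
                + sanitize(0.5f)); // 0.0 0.0 0.5
    }
}
```

Doing the check at serialization time, rather than at each call site that computes a ratio, is what makes the fix robust: YARN-6082-style per-site fixes miss new division sites added later.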
[jira] [Updated] (YARN-9563) Resource report REST API could return NaN or Inf
[ https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-9563: -- Fix Version/s: 2.10.0 > Resource report REST API could return NaN or Inf > > > Key: YARN-9563 > URL: https://issues.apache.org/jira/browse/YARN-9563 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: YARN-9563-branch-2.8.001.patch, > YARN-9563-branch-2.9.001.patch, YARN-9563-branch-3.0.001.patch, > YARN-9563.001.patch, YARN-9563.002.patch, YARN-9563.003.patch, > YARN-9563.004.patch, YARN-9563.005.patch, YARN-9563.006.patch > > > The Resource Manager's Cluster Applications and Cluster Application REST APIs > sometimes return invalid JSON. This was addressed in YARN-6082; however, that > fix only corrects the calculation in one call site and does not guarantee the > problem is avoided. Likewise, generating NaN/Inf can break the web GUI if the > columns cannot render non-numeric values. > The suggested fix is to check for NaN/Inf in the protobuf layer, replacing > NaN/Inf with 0.0f.
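The guard described in YARN-9563 can be sketched as below. This is an illustrative restatement, not the actual patch; the class and method names are hypothetical.

```java
// Hypothetical sketch of the NaN/Inf guard: before a float resource metric
// is written out (e.g. into a protobuf builder), non-finite values are
// replaced with 0.0f so the serialized JSON stays valid and numeric.
public class ResourceInfoSanitizer {

    // Replace NaN or +/-Infinity with 0.0f; pass finite values through.
    static float sanitize(float value) {
        return (Float.isNaN(value) || Float.isInfinite(value)) ? 0.0f : value;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(Float.NaN));               // prints 0.0
        System.out.println(sanitize(Float.POSITIVE_INFINITY)); // prints 0.0
        System.out.println(sanitize(0.75f));                   // prints 0.75
    }
}
```

Centralizing the check at the serialization boundary, rather than at one computation site as YARN-6082 did, covers every path that can produce a non-finite value.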
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891957#comment-16891957 ] Muhammad Samir Khan commented on YARN-9596: --- Looking at the UT failures. > QueueMetrics has incorrect metrics when labelled partitions are involved > > > Key: YARN-9596 > URL: https://issues.apache.org/jira/browse/YARN-9596 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.8.0, 3.3.0 >Reporter: Muhammad Samir Khan >Assignee: Muhammad Samir Khan >Priority: Major > Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot > 2019-06-03 at 4.44.15 PM.png, YARN-9596-branch-3.0.004.patch, > YARN-9596.001.patch, YARN-9596.002.patch, YARN-9596.003.patch > > > After YARN-6467, QueueMetrics should only be tracking metrics for the default > partition. However, the metrics are incorrect when labelled partitions are > involved. > Steps to reproduce > == > # Configure capacity-scheduler.xml with label configuration > # Add label "test" to cluster and replace label on node1 to be "test" > # Note down "totalMB" at > /ws/v1/cluster/metrics > # Start first job on test queue. > # Start second job on default queue (does not work if the order of the two jobs > is swapped). > # While the two applications are running, the "totalMB" at > /ws/v1/cluster/metrics will go down by > the amount of MB used by the first job (screenshots attached). > Alternatively: > In > TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), > add the following lines at the end of the test before rm1.close(): > CSQueue rootQueue = cs.getRootQueue(); > assertEquals(10*GB, > rootQueue.getMetrics().getAvailableMB() + > rootQueue.getMetrics().getAllocatedMB()); > There are two nodes of 10GB each and only one of them has a non-default > label. The test will also fail against a 20*GB check. 
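The invariant that the `assertEquals` snippet above checks can be restated standalone as below. This is an illustrative sketch with hypothetical names, not code from the YARN-9596 patch; the 10 GB figure follows the report, where only one of the two 10 GB nodes carries the default label.

```java
// Illustrative restatement of the invariant the reproduction test asserts:
// for the default partition, available + allocated memory in QueueMetrics
// must equal the total memory of the default-partition nodes. The bug is
// that a job on a labelled partition makes this sum come up short.
public class QueueMetricsInvariant {
    static final long GB = 1024; // QueueMetrics reports memory in MB

    // Throws if the tracked memory does not add up to the partition total.
    static void checkInvariant(long availableMB, long allocatedMB, long expectedTotalMB) {
        if (availableMB + allocatedMB != expectedTotalMB) {
            throw new AssertionError("QueueMetrics leak: " + (availableMB + allocatedMB)
                + " MB tracked, expected " + expectedTotalMB + " MB");
        }
    }

    public static void main(String[] args) {
        // One 10 GB default-partition node with 2 GB allocated: sum holds.
        checkInvariant(8 * GB, 2 * GB, 10 * GB);
    }
}
```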
[jira] [Created] (YARN-9696) one more import in org.apache.hadoop.conf.Configuration class
runzhou wu created YARN-9696: Summary: one more import in org.apache.hadoop.conf.Configuration class Key: YARN-9696 URL: https://issues.apache.org/jira/browse/YARN-9696 Project: Hadoop YARN Issue Type: Bug Reporter: runzhou wu The import on line 54, "import java.util.LinkedList;", is unused; I think it can be deleted.
[jira] [Commented] (YARN-9691) canceling upgrade does not work if upgrade failed container is existing
[ https://issues.apache.org/jira/browse/YARN-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891763#comment-16891763 ] Hadoop QA commented on YARN-9691: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core: The patch generated 3 new + 47 unchanged - 0 fixed = 50 total (was 47) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 47s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 37s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 31s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9691 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975594/YARN-9691.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3d1b790bd58e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cf9ff08 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24420/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24420/testReport/ | | Max.
[jira] [Updated] (YARN-9691) canceling upgrade does not work if upgrade failed container is existing
[ https://issues.apache.org/jira/browse/YARN-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9691: --- Attachment: YARN-9691.002.patch > canceling upgrade does not work if upgrade failed container is existing > --- > > Key: YARN-9691 > URL: https://issues.apache.org/jira/browse/YARN-9691 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-9691.001.patch, YARN-9691.002.patch > > > If a container fails to upgrade during a YARN service upgrade, the container > is released and transitions to the FAILED_UPGRADE state. > I expected it could then be taken back to the previous version using > cancel-upgrade, but it didn't work. > At that time, the AM log was as follows: > {code} > # failed to upgrade container_e62_1563179597798_0006_01_08 > 2019-07-16 18:21:55,152 [IPC Server handler 0 on 39483] INFO > service.ClientAMService - Upgrade container > container_e62_1563179597798_0006_01_08 > 2019-07-16 18:21:55,153 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] spec state state changed from > NEEDS_UPGRADE -> UPGRADING > 2019-07-16 18:21:55,154 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] Transitioned from READY to > UPGRADING on UPGRADE event > 2019-07-16 18:21:55,154 [pool-5-thread-4] INFO > registry.YarnRegistryViewForProviders - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08]: Deleting registry path > /users/test/services/yarn-service/sleeptest/components/ctr-e62-1563179597798-0006-01-08 > 2019-07-16 18:21:55,156 [pool-6-thread-6] INFO provider.ProviderUtils - > [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_08] version > 1.0.1 : Creating dir on hdfs: > hdfs://test1.com:8020/user/test/.yarn/services/sleeptest/components/1.0.1/sleep/sleep-0 > 2019-07-16 
18:21:55,157 [pool-6-thread-6] INFO > containerlaunch.ContainerLaunchService - reInitializing container > container_e62_1563179597798_0006_01_08 with version 1.0.1 > 2019-07-16 18:21:55,157 [pool-6-thread-6] INFO > containerlaunch.AbstractLauncher - yarn docker env var has been set > {LANGUAGE=en_US.UTF-8, HADOOP_USER_NAME=test, > YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME=sleep-0.sleeptest.test.EXAMPLE.COM, > WORK_DIR=$PWD, LC_ALL=en_US.UTF-8, YARN_CONTAINER_RUNTIME_TYPE=docker, > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=registry.test.com/test/sleep1:latest, > LANG=en_US.UTF-8, YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=bridge, > YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true, LOG_DIR=} > 2019-07-16 18:21:55,158 > [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #7] INFO > impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER > for Container container_e62_1563179597798_0006_01_08 > 2019-07-16 18:21:55,167 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] spec state state changed from > UPGRADING -> RUNNING_BUT_UNREADY > 2019-07-16 18:21:55,167 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] retrieve status after 30 > 2019-07-16 18:21:55,167 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] Transitioned from UPGRADING to > REINITIALIZED on START event > 2019-07-16 18:22:07,797 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:22:07 > KST 2019", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: sleep-0: IP is not available yet" > 2019-07-16 18:22:37,797 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 
18:22:37 > KST 2019", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: sleep-0: IP is not available yet" > 2019-07-16 18:23:07,797 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:23:07 > KST 2019", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: sleep-0: IP is not available yet" > 2019-07-16 18:23:08,225 [Component dispatcher] INFO > instance.ComponentInstance - [COMPINSTANCE sleep-0 : > container_e62_1563179597798_0006_01_08] spec state state changed from > RUNNING_BUT_UNREADY -> FAILED_UPGRADE > #
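The state transitions visible in the AM log above, and the rollback the reporter expected from cancel-upgrade, can be summarized as below. The state names mirror the log, but the transition table is an illustration of the reported expectation, not the actual YARN service state machine.

```java
// Illustrative summary of the component-instance states seen in the log:
// READY -> UPGRADING -> REINITIALIZED -> (readiness probe keeps failing)
// -> FAILED_UPGRADE. The report is that cancel-upgrade is expected to take
// a FAILED_UPGRADE instance back toward the previous version, but does not.
public class UpgradeFlow {

    enum InstanceState { READY, UPGRADING, REINITIALIZED, FAILED_UPGRADE }

    // Expected (but, per this bug, not actual) effect of cancel-upgrade.
    static InstanceState onCancelUpgrade(InstanceState s) {
        return (s == InstanceState.FAILED_UPGRADE) ? InstanceState.READY : s;
    }

    public static void main(String[] args) {
        System.out.println(onCancelUpgrade(InstanceState.FAILED_UPGRADE)); // prints READY
    }
}
```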