[jira] [Commented] (YARN-8380) Support shared mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548795#comment-16548795 ]

genericqa commented on YARN-8380:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 16m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 24m 48s | trunk passed |
| +1 | compile | 7m 25s | trunk passed |
| +1 | checkstyle | 0m 22s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | shadedclient | 11m 20s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 0m 59s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 6m 46s | the patch passed |
| +1 | cc | 6m 46s | the patch passed |
| +1 | javac | 6m 46s | the patch passed |
| +1 | checkstyle | 0m 19s | the patch passed |
| -1 | mvnsite | 0m 23s | hadoop-yarn-site in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 51s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 2s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 18m 23s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | unit | 0m 21s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 106m 0s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.TestNvidiaDockerV1CommandPlugin |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8380 |
| JIRA Patch URL |
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548789#comment-16548789 ]

Rohith Sharma K S commented on YARN-8501:
-----------------------------------------

Thanks [~snemeth] for working on this patch. This is a nice refactoring. Could you also make the same change in AHSWebServices#getApps?

> Reduce complexity of RMWebServices' getApps method
> --------------------------------------------------
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: restapi
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548779#comment-16548779 ]

genericqa commented on YARN-8548:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 34s | trunk passed |
| +1 | compile | 0m 38s | trunk passed |
| +1 | checkstyle | 0m 10s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | shadedclient | 10m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 34s | the patch passed |
| +1 | javac | 0m 34s | the patch passed |
| +1 | checkstyle | 0m 7s | the patch passed |
| +1 | mvnsite | 0m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 14s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 21s | the patch passed |
| +1 | javadoc | 0m 35s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 14s | hadoop-yarn-common in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 57m 22s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8548 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932179/YARN-8548-002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 266cf4d5a758 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21292/testReport/ |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21292/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548751#comment-16548751 ]

genericqa commented on YARN-8501:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 4m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 48s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 1s | trunk passed |
| +1 | compile | 2m 30s | trunk passed |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 1m 20s | trunk passed |
| +1 | shadedclient | 12m 9s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 3s | trunk passed |
| +1 | javadoc | 0m 45s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 2m 23s | the patch passed |
| +1 | javac | 2m 23s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 1m 7s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 13s | the patch passed |
| +1 | javadoc | 0m 45s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 15s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 70m 17s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 140m 39s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8501 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931991/YARN-8501.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ee81fb21d519 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21291/testReport/ |
| Max. process+thread count | 933 (vs. ulimit of 1) | |
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548703#comment-16548703 ]

genericqa commented on YARN-8330:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 31s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 39s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 45s | trunk passed |
| +1 | shadedclient | 11m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 9s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 64m 6s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 127m 2s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8330 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932168/YARN-8330.2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a884f3f35524 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21290/testReport/ |
| Max. process+thread count | 830 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21290/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> An extra container got launched by RM for
[jira] [Commented] (YARN-8550) YARN root queue exceeds 100%
[ https://issues.apache.org/jira/browse/YARN-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548685#comment-16548685 ]

Weiwei Yang commented on YARN-8550:
-----------------------------------

I have observed a similar issue in YARN-8546. That one was caused by two allocations both taking room from a reserved container, which pushed queue usage over 100%. But that happened only with async scheduling enabled ("yarn.scheduler.capacity.schedule-asynchronously.enabled=true"), not in v2.7.3. Not sure if they are the same issue.

> YARN root queue exceeds 100%
> ----------------------------
>
> Key: YARN-8550
> URL: https://issues.apache.org/jira/browse/YARN-8550
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Priority: Major
> Attachments: Screen Shot 2018-07-13 at 1.42.41 PM.png
>
> YARN root queue usage is more than 100%, which is misleading (attached screenshot). This happens when there is a container reserved, so used + reserved exceeds Total. The cluster is configured with CPU Scheduling.
> {code}
> 2018-07-17 13:27:59,569 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=0.9713542 absoluteUsedCapacity=0.9713542 used= vCores:83> cluster=
> 2018-07-17 13:27:59,627 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e56_1531419441577_2045_01_03 Container Transitioned from NEW to RESERVED
> 2018-07-17 13:27:59,627 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(77)) - Reserved container application=application_1531419441577_2045 resource= queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@2a1563f4 cluster=
> 2018-07-17 13:27:59,627 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=1.0390625 absoluteUsedCapacity=1.0390625 used= vCores:85> cluster=
> {code}
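[Editor's note] For readers puzzled by the >100% figure in the log above, here is a minimal sketch of the arithmetic: once a reservation is added to the queue's "used" figure, used + reserved can exceed the configured total, so usedCapacity goes above 1.0. This is not ResourceManager code, and all vCore numbers are hypothetical, chosen only to mirror the log pattern.

{code:java}
// Sketch only: why counting a reservation into "used" can push the
// reported queue capacity past 100%.
public class QueueUsageDemo {
  public static void main(String[] args) {
    int totalVCores = 88;      // assumed total guaranteed to the root queue
    int allocatedVCores = 83;  // containers actually running
    int reservedVCores = 10;   // reservation waiting for space on a node

    double before = (double) allocatedVCores / totalVCores;
    double after = (double) (allocatedVCores + reservedVCores) / totalVCores;

    // Before the RESERVED transition only allocations count.
    System.out.printf("usedCapacity before reservation: %.7f%n", before); // ~0.94
    // After it, used + reserved exceeds the total, so the ratio tops 1.0.
    System.out.printf("usedCapacity after reservation:  %.7f%n", after);  // ~1.06
  }
}
{code}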
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548673#comment-16548673 ]

Bilwa S T commented on YARN-8548:
---------------------------------

Hi [~bibinchundatt]. You are correct. It should be invoked at the start of the method. I have updated the patch.

> AllocationRespose proto setNMToken initBuilder not done
> --------------------------------------------------------
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy85.allocate(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}
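[Editor's note] For context on "invoked at the start of the method": YARN's protobuf record wrappers follow a lazy-builder convention, and the NPE above comes from a setter touching the builder before initializing it. Below is a minimal, self-contained sketch of that convention; RecordPBImpl and its fields are illustrative stand-ins, not the real AllocateResponsePBImpl.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch of the "maybeInitBuilder" convention (assumed simplification).
public class RecordPBImpl {
  private List<String> tokens;    // stands in for the NMToken list
  private StringBuilder builder;  // stands in for the generated proto builder

  // Every mutator must call this first. Forgetting it -- the bug above --
  // leaves `builder` null, and the first builder access throws the NPE.
  private void maybeInitBuilder() {
    if (builder == null) {
      builder = new StringBuilder();
    }
  }

  public void setTokens(List<String> newTokens) {
    maybeInitBuilder();           // the fix: initialize before any builder use
    builder.setLength(0);         // clear previously built state
    tokens = (newTokens == null) ? null : new ArrayList<>(newTokens);
  }

  public static void main(String[] args) {
    new RecordPBImpl().setTokens(List.of("nmToken1"));
    System.out.println("setTokens completed without NPE");
  }
}
{code}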
[jira] [Updated] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T updated YARN-8548:
----------------------------
    Attachment: YARN-8548-002.patch

> AllocationRespose proto setNMToken initBuilder not done
> --------------------------------------------------------
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy85.allocate(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}
[jira] [Updated] (YARN-8380) Support shared mounts in docker runtime
[ https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billie Rinaldi updated YARN-8380:
---------------------------------
    Attachment: YARN-8380.1.patch

> Support shared mounts in docker runtime
> ----------------------------------------
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Billie Rinaldi
> Assignee: Billie Rinaldi
> Priority: Major
> Attachments: YARN-8380.1.patch
>
> The docker run command supports the mount type shared, but currently we are only supporting ro and rw mount types in the docker runtime.
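[Editor's note] For readers unfamiliar with the "shared" mode the issue refers to: it is Docker's shared bind-propagation for bind mounts. The invocation below is illustrative only (it is not YARN's actual container-executor command line), and /mnt/shared is a placeholder path that must already be a shared mount on the host.

{code}
# Host prerequisite: the source directory must itself be a shared mount.
#   sudo mount --bind /mnt/shared /mnt/shared
#   sudo mount --make-shared /mnt/shared

# Volume-flag form: mode suffix "shared" instead of "ro"/"rw"
docker run --rm -v /mnt/shared:/data:shared busybox ls /data

# Equivalent --mount form with explicit bind-propagation
docker run --rm \
  --mount type=bind,source=/mnt/shared,target=/data,bind-propagation=shared \
  busybox ls /data
{code}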
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548603#comment-16548603 ]

Suma Shivaprasad commented on YARN-8330:
----------------------------------------

Attached a patch that publishes container-creation events on the allocated/acquired state transitions instead of in the RMContainerImpl constructor/setContainerId calls.

> An extra container got launched by RM for yarn-service
> -------------------------------------------------------
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of containers :5
> Container-Id    Start Time    Finish Time    State    Host    Node Http Address    LOG-URL
> container_e06_1525463491331_0006_01_02    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01    Fri May 04 22:34:15 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05    Fri May 04 22:34:56 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04    Fri May 04 22:34:56 + 2018    N/A    null    xxx:25454    http://xxx:8042    http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa
> {code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM is listing 5 containers.
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, 05 for components. There is no log available in NM & AM related to container 04. Only one line in the RM log is printed:
> {code}
> 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to RESERVED
> {code}
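[Editor's note] A hedged sketch of the transition-driven event publication the comment describes: creation events fire when a container enters ALLOCATED or ACQUIRED, so a container that never leaves NEW/RESERVED (like the stray _04 above) is never reported as created. This is generic state-machine code, not the actual RMContainerImpl change.

{code:java}
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

enum ContainerState { NEW, RESERVED, ALLOCATED, ACQUIRED, RUNNING }

class ContainerLifecycle {
  private final Map<ContainerState, Consumer<String>> onEnter =
      new EnumMap<>(ContainerState.class);

  ContainerLifecycle() {
    // Tie the "created" event to the ALLOCATED/ACQUIRED transitions,
    // not to object construction.
    Consumer<String> publishCreated =
        id -> System.out.println("publish created event for " + id);
    onEnter.put(ContainerState.ALLOCATED, publishCreated);
    onEnter.put(ContainerState.ACQUIRED, publishCreated);
  }

  void transition(String containerId, ContainerState next) {
    onEnter.getOrDefault(next, x -> { }).accept(containerId);
  }
}

public class TransitionDemo {
  public static void main(String[] args) {
    ContainerLifecycle c = new ContainerLifecycle();
    c.transition("container_04", ContainerState.RESERVED);  // no event fires
    c.transition("container_02", ContainerState.ALLOCATED); // event fires
  }
}
{code}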
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suma Shivaprasad updated YARN-8330:
-----------------------------------
    Attachment: YARN-8330.2.patch

> An extra container got launched by RM for yarn-service
> -------------------------------------------------------
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of containers :5
> Container-Id    Start Time    Finish Time    State    Host    Node Http Address    LOG-URL
> container_e06_1525463491331_0006_01_02    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01    Fri May 04 22:34:15 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05    Fri May 04 22:34:56 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04    Fri May 04 22:34:56 + 2018    N/A    null    xxx:25454    http://xxx:8042    http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa
> {code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM is listing 5 containers.
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, 05 for components. There is no log available in NM & AM related to container 04. Only one line in the RM log is printed:
> {code}
> 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to RESERVED
> {code}
[jira] [Assigned] (YARN-8551) Build Common module for MaWo application
[ https://issues.apache.org/jira/browse/YARN-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora reassigned YARN-8551:
--------------------------------
    Assignee: Yesha Vora

> Build Common module for MaWo application
> -----------------------------------------
>
> Key: YARN-8551
> URL: https://issues.apache.org/jira/browse/YARN-8551
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Yesha Vora
> Assignee: Yesha Vora
> Priority: Major
>
> Build the Common module for the MaWo application.
> This module should include the definition of a Task. A Task should contain:
> * TaskID
> * Task Command
> * Task Environment
> * Task Timeout
> * Task Type
> ** Simple Task
> *** It's a single task
> ** Composite Task
> *** It's a composition of multiple simple tasks
> ** Teardown Task
> *** It's the last task to be executed after a job is finished
> ** Null Task
> *** It's a null task
[jira] [Comment Edited] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548546#comment-16548546 ]

Robert Kanter edited comment on YARN-6966 at 7/18/18 11:01 PM:
---------------------------------------------------------------

It looks like the job is broken even though it's up now. All of the tests from today ran for < 1 min before failing :(
I think it's fine if making a test case for that would be too tricky.
+1 LGTM pending Jenkins
[~haibochen] any other comments?

was (Author: rkanter):
It looks like the job is broken even though it's up now. All of the tests from today ran for < 1 min before failing :(
+1 LGTM pending Jenkins
[~haibochen] any other comments?

> NodeManager metrics may return wrong negative values when NM restart
> ---------------------------------------------------------------------
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of negative values is that metrics do not recover properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores, AvailableVCores in metrics also need to recover when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true, YarnConfiguration.NM_RECOVERY_SUPERVISED=true in NM
> # Submit an application and keep it running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
>   name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
>   modelerType: "NodeManagerMetrics",
>   tag.Context: "yarn",
>   tag.Hostname: "hadoop.com",
>   ContainersLaunched: 0,
>   ContainersCompleted: 0,
>   ContainersFailed: 2,
>   ContainersKilled: 0,
>   ContainersIniting: 0,
>   ContainersRunning: 0,
>   AllocatedGB: 0,
>   AllocatedContainers: -2,
>   AvailableGB: 160,
>   AllocatedVCores: -11,
>   AvailableVCores: 3611,
>   ContainerLaunchDurationNumOps: 2,
>   ContainerLaunchDurationAvgTime: 6,
>   BadLocalDirs: 0,
>   BadLogDirs: 0,
>   GoodLocalDirsDiskUtilizationPerc: 2,
>   GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548546#comment-16548546 ]

Robert Kanter commented on YARN-6966:
-------------------------------------

It looks like the job is broken even though it's up now. All of the tests from today ran for < 1 min before failing :(
+1 LGTM pending Jenkins
[~haibochen] any other comments?

> NodeManager metrics may return wrong negative values when NM restart
> ---------------------------------------------------------------------
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yang Wang
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of negative values is that metrics do not recover properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores, AvailableVCores in metrics also need to recover when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true, YarnConfiguration.NM_RECOVERY_SUPERVISED=true in NM
> # Submit an application and keep it running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
>   name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
>   modelerType: "NodeManagerMetrics",
>   tag.Context: "yarn",
>   tag.Hostname: "hadoop.com",
>   ContainersLaunched: 0,
>   ContainersCompleted: 0,
>   ContainersFailed: 2,
>   ContainersKilled: 0,
>   ContainersIniting: 0,
>   ContainersRunning: 0,
>   AllocatedGB: 0,
>   AllocatedContainers: -2,
>   AvailableGB: 160,
>   AllocatedVCores: -11,
>   AvailableVCores: 3611,
>   ContainerLaunchDurationNumOps: 2,
>   ContainerLaunchDurationAvgTime: 6,
>   BadLocalDirs: 0,
>   BadLogDirs: 0,
>   GoodLocalDirsDiskUtilizationPerc: 2,
>   GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
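[Editor's note] A hedged sketch of the recovery-time fix the description calls for: re-apply the metric increments for every recovered container during NM recovery, so that the decrements issued later (when those containers finish) cannot drive the gauges negative. NodeMetrics and recoverContainer below are illustrative stand-ins for NodeManagerMetrics and ContainerManagerImpl#recoverContainer, not the actual patch.

{code:java}
class NodeMetrics {
  int allocatedContainers;
  int allocatedGB;
  int allocatedVCores;

  void allocateContainer(int gb, int vcores) {
    allocatedContainers++;
    allocatedGB += gb;
    allocatedVCores += vcores;
  }

  void releaseContainer(int gb, int vcores) {
    allocatedContainers--;
    allocatedGB -= gb;
    allocatedVCores -= vcores;
  }
}

public class RecoveryDemo {
  public static void main(String[] args) {
    NodeMetrics metrics = new NodeMetrics();  // fresh metrics after NM restart

    // Without this recovery step, the release below would print -1:
    metrics.allocateContainer(4, 2);          // done per recovered container

    metrics.releaseContainer(4, 2);           // container finishes post-restart
    System.out.println(metrics.allocatedContainers);  // 0, not -1
  }
}
{code}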
[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548486#comment-16548486 ]

genericqa commented on YARN-8330:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 8s | Docker failed to build yetus/hadoop:abb62dd. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8330 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932145/YARN-8330.1.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21285/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> An extra container got launched by RM for yarn-service
> -------------------------------------------------------
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Critical
> Attachments: YARN-8330.1.patch
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of containers :5
> Container-Id    Start Time    Finish Time    State    Host    Node Http Address    LOG-URL
> container_e06_1525463491331_0006_01_02    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03    Fri May 04 22:34:26 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01    Fri May 04 22:34:15 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05    Fri May 04 22:34:56 + 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04    Fri May 04 22:34:56 + 2018    N/A    null    xxx:25454    http://xxx:8042    http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa
> {code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM is listing 5 containers.
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized containers 02, 03, 05 for components. There is no log available in NM & AM related to container 04. Only one line in the RM log is printed:
> {code}
> 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to RESERVED
> {code}
[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548484#comment-16548484 ]

genericqa commented on YARN-8548:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 8s | Docker failed to build yetus/hadoop:abb62dd. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8548 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932079/YARN-8548-001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21286/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> AllocationRespose proto setNMToken initBuilder not done
> --------------------------------------------------------
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-8548-001.patch
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
> at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
> at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
> at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
> at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy85.allocate(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}
[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548490#comment-16548490 ]

genericqa commented on YARN-8529:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 9s | Docker failed to build yetus/hadoop:abb62dd. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8529 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932013/YARN-8529.v2.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21288/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Add timeout to RouterWebServiceUtil#invokeRMWebService
> -------------------------------------------------------
>
> Key: YARN-8529
> URL: https://issues.apache.org/jira/browse/YARN-8529
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Íñigo Goiri
> Assignee: Giovanni Matteo Fumarola
> Priority: Major
> Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch
>
> {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. This should be configurable.
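[Editor's note] As background on what "making the timeout configurable" typically looks like, here is a hedged sketch: read the timeout from configuration with a default instead of hard-coding it. The property key and helper below are invented for illustration; the actual YARN-8529 patch may differ.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Properties;

class RouterWebUtilSketch {
  // Hypothetical property name, for illustration only.
  static final String TIMEOUT_KEY = "yarn.router.webapp.connect-timeout.ms";
  static final int DEFAULT_TIMEOUT_MS = 30_000;

  static HttpURLConnection open(URL url, Properties conf) throws Exception {
    int timeout = Integer.parseInt(
        conf.getProperty(TIMEOUT_KEY, String.valueOf(DEFAULT_TIMEOUT_MS)));
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(timeout);  // previously a fixed constant
    conn.setReadTimeout(timeout);
    return conn;
  }
}
{code}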
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548487#comment-16548487 ]

genericqa commented on YARN-8301:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 8s | Docker failed to build yetus/hadoop:abb62dd. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932153/YARN-8301.004.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21287/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Yarn Service Upgrade: Add documentation
> ---------------------------------------
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, YARN-8301.003.patch, YARN-8301.004.patch
>
> Add documentation for yarn service upgrade.
[jira] [Assigned] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha reassigned YARN-8429:
-------------------------------
    Assignee: Gour Saha

> Improve diagnostic message when artifact is not set properly
> -------------------------------------------------------------
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Yesha Vora
> Assignee: Gour Saha
> Priority: Major
>
> Steps:
> 1) Create a launch json file. Replace "artifact" with "artifacts".
> 2) Launch the yarn service app with the CLI.
> The application launch fails with the below error:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified incorrectly, the launch command should fail with a proper error. Here, the error message regarding Dest_file is misleading.
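[Editor's note] For contrast with the misspelled spec the report describes, here is a minimal sketch of a service definition with the artifact field spelled correctly. All values are placeholders; the field names follow the YARN service REST API, but treat the exact spec as illustrative rather than a verified example.

{code}
{
  "name": "test2-2",
  "version": "1.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 1,
      "artifact": { "id": "library/busybox:latest", "type": "DOCKER" },
      "launch_command": "sleep 3600",
      "resource": { "cpus": 1, "memory": "256" }
    }
  ]
}
{code}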
[jira] [Updated] (YARN-8551) Build Common module for MaWo application
[ https://issues.apache.org/jira/browse/YARN-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora updated YARN-8551:
-----------------------------
    Description: 
Build the Common module for the MaWo application.
This module should include the definition of a Task. A Task should contain:
* TaskID
* Task Command
* Task Environment
* Task Timeout
* Task Type
** Simple Task
*** It's a single task
** Composite Task
*** It's a composition of multiple simple tasks
** Teardown Task
*** It's the last task to be executed after a job is finished
** Null Task
*** It's a null task

  was:
Build the Common module for the MaWo application.
This module should include the definition of a Task. A Task should contain:
* TaskID
* Task Command
* Task Environment
* Task Timeout
* Task Type
** Simple Task
*** It's a single task
** Composite Task
*** It's a composition of multiple simple tasks
** Die Task
*** It's the last task to be executed after a job is finished
** Null Task
*** It's a null task

> Build Common module for MaWo application
> -----------------------------------------
>
> Key: YARN-8551
> URL: https://issues.apache.org/jira/browse/YARN-8551
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Yesha Vora
> Priority: Major
>
> Build the Common module for the MaWo application.
> This module should include the definition of a Task. A Task should contain:
> * TaskID
> * Task Command
> * Task Environment
> * Task Timeout
> * Task Type
> ** Simple Task
> *** It's a single task
> ** Composite Task
> *** It's a composition of multiple simple tasks
> ** Teardown Task
> *** It's the last task to be executed after a job is finished
> ** Null Task
> *** It's a null task
[jira] [Created] (YARN-8551) Build Common module for MaWo application
Yesha Vora created YARN-8551: Summary: Build Common module for MaWo application Key: YARN-8551 URL: https://issues.apache.org/jira/browse/YARN-8551 Project: Hadoop YARN Issue Type: Sub-task Reporter: Yesha Vora Build Common module for MaWo application. This module should include the definition of a Task. A Task should contain * TaskID * Task Command * Task Environment * Task Timeout * Task Type ** Simple Task *** It's a single task ** Composite Task *** It's a composition of multiple simple tasks ** Die Task *** It's the last task to be executed after a job is finished ** Null Task *** It's a null task -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8542: Description: GET app/v1/services/{{service-name}}/component-instances returns a list of containers with YARN-8299. {code:java} [ { "id": "container_1531508836237_0001_01_03", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509014497, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-1" }, { "id": "container_1531508836237_0001_01_02", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509013492, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-0" } ]{code} {{component_name}} is not part of container json, so it is hard to tell which component an instance belongs to. To fix this, will change the format of returned containers to: {code:java} [ { "name": "ping", "containers": [ { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-0", "hostname": "ping-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_02", "ip": "172.26.111.21", "launch_time": 1531767377301, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-1", "hostname": "ping-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_07", "ip": "172.26.111.21", "launch_time": 1531767410395, "state": "RUNNING_BUT_UNREADY" } ] }, { "name": "sleep", "containers": [ { "bare_host": "eyang-5.openstacklocal", "component_instance_name": "sleep-0", "hostname": "sleep-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_04", "ip": "172.26.111.20", "launch_time": 1531767377710, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "sleep-1", "hostname": "sleep-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_05", "ip": "172.26.111.21", "launch_time": 1531767378303, "state": "READY" } ] } ]{code} was: GET app/v1/services/{\{service-name}}/component-instances returns a list of containers with YARN-8299. {code:java} [ { "id": "container_1531508836237_0001_01_03", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509014497, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-1" }, { "id": "container_1531508836237_0001_01_02", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509013492, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-0" } ]{code} {{component_name}} is not part of container json, so it is hard to tell which component an instance belongs to. Change the list of containers return > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{{service-name}}/component-instances returns a list of > containers with YARN-8299. 
> {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. > To fix this, will change the format of returned containers to: > {code:java} > [ > { > "name": "ping", > "containers": [ > { > "bare_host": "eyang-4.openstacklocal", > "component_instance_name": "ping-0", > "hostname": "ping-0.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_02", > "ip": "172.26.111.21", > "launch_time": 1531767377301, > "state": "READY" > }, > { > "bare_host": "eyang-4.openstacklocal", > "component_instance_name": "ping-1", > "hostname": "ping-1.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_07", > "ip": "172.26.111.21", > "launch_time": 1531767410395, > "state": "RUNNING_BUT_UNREADY" > } > ] > }, > { >
[jira] [Commented] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548426#comment-16548426 ] Bibin A Chundatt commented on YARN-8548: [~BilwaST] As per the current patch, {{maybeInitBuilder()}} will be invoked only if the nmTokens are empty or null. Please make sure it is called for all cases. Move it to the start of the method. > AllocationResponse proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch > > > Distributed Scheduling allocate is failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
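For illustration, the change Bibin is asking for would look roughly like the sketch below. This follows the usual YARN PBImpl setter idiom and is a sketch of the review suggestion, not the attached YARN-8548-001.patch.

{code:java}
// Sketch of the suggested fix: invoke maybeInitBuilder() unconditionally at
// the top of the setter so the proto builder exists on every path, avoiding
// the NPE in AllocateResponsePBImpl.setNMTokens seen in the trace above.
@Override
public synchronized void setNMTokens(List<NMToken> nmTokens) {
  maybeInitBuilder();                    // moved to the start of the method
  if (nmTokens == null || nmTokens.isEmpty()) {
    if (this.nmTokens != null) {
      this.nmTokens.clear();
    }
    builder.clearNmTokens();
    return;
  }
  // Replace the cached list so later reads go through this PBImpl.
  this.nmTokens = new ArrayList<>(nmTokens);
}
{code}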
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548417#comment-16548417 ] Gour Saha commented on YARN-8301: - Great. Patch 4 looks good. Not sure why I see the trailing whitespaces when I apply the patch. The jenkins build should tell us. +1 for 004 pending jenkins. > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch, YARN-8301.004.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548405#comment-16548405 ] Chandni Singh commented on YARN-8301: - Addressed [~gsaha] comments in patch 4. I didn't find many trailing whitespaces. Let me know if you still see them. > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch, YARN-8301.004.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8301: Attachment: YARN-8301.004.patch > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch, YARN-8301.004.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Comment: was deleted (was: Attached patch which calls SystemMetricsPblisher.containerCreated in ContainerStartedTransition instead of the constructor.) > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. 
Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: YARN-8330.1.patch > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: (was: YARN-8330.1.patch) > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: (was: YARN-8330.1.patch) > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: YARN-8330.1.patch > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service
[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated YARN-8330: --- Attachment: YARN-8330.1.patch > An extra container got launched by RM for yarn-service > -- > > Key: YARN-8330 > URL: https://issues.apache.org/jira/browse/YARN-8330 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Yesha Vora >Assignee: Suma Shivaprasad >Priority: Critical > Attachments: YARN-8330.1.patch > > > Steps: > launch Hbase tarball app > list containers for hbase tarball app > {code} > /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list > appattempt_1525463491331_0006_01 > WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of > YARN_LOG_DIR. > WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of > YARN_LOGFILE. > WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of > YARN_PID_DIR. > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > Total number of containers :5 > Container-IdStart Time Finish Time > StateHost Node Http Address >LOG-URL > container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018 > N/A RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa > 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03 > Fri May 04 22:34:26 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01 > Fri May 04 22:34:15 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa > 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05 > Fri May 04 22:34:56 + 2018 N/A > RUNNINGxxx:25454 http://xxx:8042 > http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa > 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - > run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04 > Fri May 04 22:34:56 + 2018 N/A > nullxxx:25454 http://xxx:8042 > http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code} > Total expected containers = 4 ( 3 components container + 1 am). Instead, RM > is listing 5 containers. > container_e06_1525463491331_0006_01_04 is in null state. > Yarn service utilized container 02, 03, 05 for component. There is no log > available in NM & AM related to container 04. Only one line in RM log is > printed > {code} > 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl > (RMContainerImpl.java:handle(489)) - > container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to > RESERVED{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548384#comment-16548384 ] Chandni Singh commented on YARN-8301: - {quote} In line 148 do we need the line "name": "sleeper-service" in the JSON spec for version 1.0.1 of the service. {quote} No, will remove it. > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8542: Description: GET app/v1/services/{\{service-name}}/component-instances returns a list of containers with YARN-8299. {code:java} [ { "id": "container_1531508836237_0001_01_03", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509014497, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-1" }, { "id": "container_1531508836237_0001_01_02", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509013492, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-0" } ]{code} {{component_name}} is not part of container json, so it is hard to tell which component an instance belongs to. Change the list of containers return was: GET app/v1/services/{\{service-name}}/component-instances returns a list of containers with YARN-8299. {code:java} [ { "id": "container_1531508836237_0001_01_03", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509014497, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-1" }, { "id": "container_1531508836237_0001_01_02", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509013492, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-0" } ]{code} {{component_name}} is not part of container json, so it is hard to tell which component an instance belongs to. > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{\{service-name}}/component-instances returns a list of > containers with YARN-8299. > {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. > Change the list of containers return -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548379#comment-16548379 ] Chandni Singh commented on YARN-8542: - [~gsaha] Ok. That sounds reasonable. Will change it to the format you have proposed. > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{\{service-name}}/component-instances returns a list of > containers with YARN-8299. > {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548375#comment-16548375 ] Gour Saha commented on YARN-8542: - [~csingh] agreed that the API is to request for containers. However, the structure I proposed adheres to the current status API structure and the swagger definition. Note, service owners are already parsing through the component instances across multiple components in the status response payload if they need a single collection of all component instances. If you add a new attribute "component_name" now, you would need to modify the swagger definition and it would actually mean a change for the end-users since they would have to handle the containers API output differently from the status API output. Let me know what you think. > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{\{service-name}}/component-instances returns a list of > containers with YARN-8299. > {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548360#comment-16548360 ] Gour Saha commented on YARN-8301: - [~csingh], patch 2 looks good. Let's add "Experimental Feature - Tech Preview" to the top of this doc and create a reference to it from Overview.md (and also mention there that it is an Experimental Feature - Tech Preview). Thanks [~eyang] for pointing this out. A few minor comments - 1. In line 148 do we need the line "name": "sleeper-service" in the JSON spec for version 1.0.1 of the service. 2. Remove the trailing whitespaces from all the lines > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8301: Attachment: YARN-8301.003.patch > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548358#comment-16548358 ] Chandni Singh commented on YARN-8301: - Addressed offline comments in patch 3 > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8550) YARN root queue exceeds 100%
Prabhu Joseph created YARN-8550: --- Summary: YARN root queue exceeds 100% Key: YARN-8550 URL: https://issues.apache.org/jira/browse/YARN-8550 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.3 Reporter: Prabhu Joseph Attachments: Screen Shot 2018-07-13 at 1.42.41 PM.png YARN root queue usage is more than 100%, which is misleading (see the attached screenshot). This happens when a container is reserved, so used + reserved exceeds the total. The cluster is configured with CPU scheduling. {code} 2018-07-17 13:27:59,569 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=0.9713542 absoluteUsedCapacity=0.9713542 used= cluster= 2018-07-17 13:27:59,627 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e56_1531419441577_2045_01_03 Container Transitioned from NEW to RESERVED 2018-07-17 13:27:59,627 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(77)) - Reserved container application=application_1531419441577_2045 resource= queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@2a1563f4 cluster= 2018-07-17 13:27:59,627 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=1.0390625 absoluteUsedCapacity=1.0390625 used= cluster= {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
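To make the arithmetic concrete: the jump from usedCapacity=0.9713542 to 1.0390625 in the log is consistent with the reservation being counted into used capacity. The sketch below reproduces the numbers under an assumed cluster size (the real resource values were stripped from the log above), and is illustrative only, not RM code.

{code:java}
// Illustrative arithmetic only. Assumes a 96-vcore cluster, which is
// consistent with the two usedCapacity values in the log.
public class QueueUsageExample {
  public static void main(String[] args) {
    double clusterVcores = 96;
    double allocatedVcores = 93.25;   // 0.9713542 * 96
    double reservedVcores = 6.5;      // the newly reserved container

    double before = allocatedVcores / clusterVcores;
    double after = (allocatedVcores + reservedVcores) / clusterVcores;

    System.out.printf("before=%.7f after=%.7f%n", before, after);
    // prints before=0.9713542 after=1.0390625 -- past 100% once reserved
  }
}
{code}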
[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8301: Attachment: YARN-8301.002.patch > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated YARN-8542: Description: GET app/v1/services/{\{service-name}}/component-instances returns a list of containers with YARN-8299. {code:java} [ { "id": "container_1531508836237_0001_01_03", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509014497, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-1" }, { "id": "container_1531508836237_0001_01_02", "ip": "192.168.2.51", "hostname": "HW12119.local", "state": "READY", "launch_time": 1531509013492, "bare_host": "192.168.2.51", "component_instance_name": "sleeper-0" } ]{code} {{component_name}} is not part of container json, so it is hard to tell which component an instance belongs to. was: In YARN-8299, CLI for query container status is implemented to display containers in a flat list. It might be helpful to display component structure hierarchy like this: {code} [ { "name": "ping", "containers": [ { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-0", "hostname": "ping-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_02", "ip": "172.26.111.21", "launch_time": 1531767377301, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-1", "hostname": "ping-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_07", "ip": "172.26.111.21", "launch_time": 1531767410395, "state": "RUNNING_BUT_UNREADY" } ] }, { "name": "sleep", "containers": [ { "bare_host": "eyang-5.openstacklocal", "component_instance_name": "sleep-0", "hostname": "sleep-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_04", "ip": "172.26.111.20", "launch_time": 1531767377710, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "sleep-1", "hostname": "sleep-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_05", "ip": "172.26.111.21", "launch_time": 1531767378303, "state": "READY" } ] } ] {code} > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{\{service-name}}/component-instances returns a list of > containers with YARN-8299. > {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548248#comment-16548248 ] Chandni Singh commented on YARN-8542: - [~gsaha] I am not in favor of the below format: {code:java} { "name": "sleep", "containers": [ { "bare_host": "eyang-5.openstacklocal", "component_instance_name": "sleep-0", "hostname": "sleep-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_04", "ip": "172.26.111.20", "launch_time": 1531767377710, "state": "READY" } }{code} It doesn't follow the convention. The request is for containers, so it should return a list of containers. I prefer adding component_name to the container json. Also it is easy for users to further filter a flat list instead of a nested json. > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > In YARN-8299, CLI for query container status is implemented to display > containers in a flat list. It might be helpful to display component > structure hierarchy like this: > {code} > [ > { > "name": "ping", > "containers": [ > { > "bare_host": "eyang-4.openstacklocal", > "component_instance_name": "ping-0", > "hostname": "ping-0.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_02", > "ip": "172.26.111.21", > "launch_time": 1531767377301, > "state": "READY" > }, > { > "bare_host": "eyang-4.openstacklocal", > "component_instance_name": "ping-1", > "hostname": "ping-1.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_07", > "ip": "172.26.111.21", > "launch_time": 1531767410395, > "state": "RUNNING_BUT_UNREADY" > } > ] > }, > { > "name": "sleep", > "containers": [ > { > "bare_host": "eyang-5.openstacklocal", > "component_instance_name": "sleep-0", > "hostname": "sleep-0.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_04", > "ip": "172.26.111.20", > "launch_time": 1531767377710, > "state": "READY" > }, > { > "bare_host": "eyang-4.openstacklocal", > "component_instance_name": "sleep-1", > "hostname": "sleep-1.qqq.hbase.ycluster", > "id": "container_1531765479645_0002_01_05", > "ip": "172.26.111.21", > "launch_time": 1531767378303, > "state": "READY" > } > ] > } > ] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
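To illustrate the filtering point in the comment above: with a flat list whose entries carry a component name attribute, client-side grouping is a one-liner. The Container POJO below is a hypothetical stand-in for the REST model, not actual YARN client code.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the container JSON entries.
class Container {
  String id;
  String componentName;
}

class ContainerFilter {
  // Keep only the containers that belong to the given component.
  static List<Container> forComponent(List<Container> all, String component) {
    return all.stream()
        .filter(c -> component.equals(c.componentName))
        .collect(Collectors.toList());
  }
}
{code}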
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548234#comment-16548234 ] Zian Chen commented on YARN-8501: - [~snemeth] , sorry for the late review. Basically the builder is what I thought should be used to clean up the logic here. The latest patch LGTM. +1 > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch, > YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548196#comment-16548196 ] Zian Chen commented on YARN-8522: - The build infrastructure is broken by HADOOP-15610. Tests will be triggered when that issue is addressed. > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch > > > Launch multiple streaming apps simultaneously. Sometimes one of the > applications fails with the stack trace below. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. > 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Streaming Command Failed!{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration
[ https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548190#comment-16548190 ] Jonathan Hung commented on YARN-7974: - Seems related to HADOOP-15610:{noformat}Collecting typed_ast; python_version < "3.7" and implementation_name == "cpython" (from astroid>=2.0.0->pylint) Downloading https://files.pythonhosted.org/packages/52/cf/2ebc7d282f026e21eed4987e42e10964a077c13cfc168b42f3573a7f178c/typed-ast-1.1.0.tar.gz (200kB) Complete output from command python setup.py egg_info: Error: typed_ast only runs on Python 3.3 and above. Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-QpIUX5/typed-ast/ You are using pip version 8.1.1, however version 10.0.1 is available. You should consider upgrading via the 'pip install --upgrade pip' command. The command '/bin/sh -c pip2 install pylint' returned a non-zero code: 1 Total Elapsed time: 13m 3s ERROR: Docker failed to build image.{noformat} > Allow updating application tracking url after registration > -- > > Key: YARN-7974 > URL: https://issues.apache.org/jira/browse/YARN-7974 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-7974.001.patch, YARN-7974.002.patch, > YARN-7974.003.patch, YARN-7974.004.patch, YARN-7974.005.patch, > YARN-7974.006.patch > > > Normally an application's tracking url is set on AM registration. We have a > use case for updating the tracking url after registration (e.g. the UI is > hosted on one of the containers). > Approach is for AM to update tracking url on heartbeat to RM, and add related > API in AMRMClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
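For context on the proposed API: per the description, the AM pushes an updated tracking URL on its heartbeat to the RM, with a matching client-side call in AMRMClient. A hedged sketch of how an AM would use it, assuming the patch series adds an {{updateTrackingUrl(String)}} method; the exact name and signature are whatever the committed patch defines.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class TrackingUrlSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    // Register with a placeholder URL first, as today.
    rmClient.registerApplicationMaster("am-host", 0, "http://placeholder/");
    // ... later, once the UI container is up and its address is known ...
    // Assumed API from the YARN-7974 patches: the new URL rides on the next
    // AM -> RM heartbeat.
    rmClient.updateTrackingUrl("http://ui-container-host:8080/");
  }
}
{code}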
[jira] [Commented] (YARN-8547) rm may crash if nm registers with too many applications
[ https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548181#comment-16548181 ] Giovanni Matteo Fumarola commented on YARN-8547: Thanks [~sandflee] for working on this. Can you provide more details on the cause and the consequences? > rm may crash if nm registers with too many applications > -- > > Key: YARN-8547 > URL: https://issues.apache.org/jira/browse/YARN-8547 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee >Priority: Major > Attachments: YARN-8547.01.patch > > > 1, our cluster had n k+ nodes, and disabled log aggregation; one single nm > may keep 10,000+ apps > 2, when the rm fails over, a single nm registers with 10,000+ apps, causing the active rm > to GC constantly and lose its connection with zk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548174#comment-16548174 ] Giovanni Matteo Fumarola commented on YARN-8549: Thanks [~prabham]. Can you name the patches YARN-\{Jira number}.v\{incremental number}.patch? e.g. YARN-8549.v1.patch > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch > > > Stub implementation for TimeLineReader and TimeLineWriter classes. > These are useful for functional testing of writer and reader path for ATSv2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
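For context on what a no-operation plugin buys: a writer that accepts every call and drops the data lets the ATSv2 collector and reader paths be exercised without a real storage backend. The sketch below is deliberately generic and does not reproduce the actual {{TimelineWriter}} abstract class, whose method signatures differ across the branches listed above; the method names here are illustrative stand-ins.
{code:java}
import java.io.IOException;

/**
 * Illustrative no-op writer in the spirit of the attached stubs. The real
 * class would extend org.apache.hadoop.yarn.server.timelineservice.storage
 * .TimelineWriter; the interface shown here is a stand-in, not the actual API.
 */
public class NoOpTimelineWriterSketch {
  /** Accept a batch of entities and drop them on the floor. */
  public void write(String clusterId, String userId, Object entities)
      throws IOException {
    // Intentionally empty: nothing is persisted.
  }

  /** Nothing is buffered, so there is nothing to flush. */
  public void flush() throws IOException {
    // Intentionally empty.
  }
}
{code}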
[jira] [Commented] (YARN-8538) Fix valgrind leak check on container executor
[ https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548169#comment-16548169 ] Billie Rinaldi commented on YARN-8538: -- Thanks [~eyang] and [~bibinchundatt]! > Fix valgrind leak check on container executor > - > > Key: YARN-8538 > URL: https://issues.apache.org/jira/browse/YARN-8538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8538.1.patch, YARN-8538.2.patch > > > Running valgrind --leak-check=yes ./cetest gives us this: > {noformat} > ==14094== LEAK SUMMARY: > ==14094== definitely lost: 964,351 bytes in 1,154 blocks > ==14094== indirectly lost: 75,506 bytes in 3,777 blocks > ==14094== possibly lost: 0 bytes in 0 blocks > ==14094== still reachable: 554 bytes in 22 blocks > ==14094== suppressed: 0 bytes in 0 blocks > ==14094== Reachable blocks (those to which a pointer was found) are not shown. > ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all > ==14094== > ==14094== For counts of detected and suppressed errors, rerun with: -v > ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8538) Fix valgrind leak check on container executor
[ https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548167#comment-16548167 ] Eric Yang commented on YARN-8538: - [~billie.rinaldi] [~bibinchundatt], I cherry-picked this to branch-3.1. Thanks for the feedback. > Fix valgrind leak check on container executor > - > > Key: YARN-8538 > URL: https://issues.apache.org/jira/browse/YARN-8538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8538.1.patch, YARN-8538.2.patch > > > Running valgrind --leak-check=yes ./cetest gives us this: > {noformat} > ==14094== LEAK SUMMARY: > ==14094== definitely lost: 964,351 bytes in 1,154 blocks > ==14094== indirectly lost: 75,506 bytes in 3,777 blocks > ==14094== possibly lost: 0 bytes in 0 blocks > ==14094== still reachable: 554 bytes in 22 blocks > ==14094== suppressed: 0 bytes in 0 blocks > ==14094== Reachable blocks (those to which a pointer was found) are not shown. > ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all > ==14094== > ==14094== For counts of detected and suppressed errors, rerun with: -v > ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8538) Fix valgrind leak check on container executor
[ https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8538: Fix Version/s: 3.1.1 > Fix valgrind leak check on container executor > - > > Key: YARN-8538 > URL: https://issues.apache.org/jira/browse/YARN-8538 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8538.1.patch, YARN-8538.2.patch > > > Running valgrind --leak-check=yes ./cetest gives us this: > {noformat} > ==14094== LEAK SUMMARY: > ==14094== definitely lost: 964,351 bytes in 1,154 blocks > ==14094== indirectly lost: 75,506 bytes in 3,777 blocks > ==14094== possibly lost: 0 bytes in 0 blocks > ==14094== still reachable: 554 bytes in 22 blocks > ==14094== suppressed: 0 bytes in 0 blocks > ==14094== Reachable blocks (those to which a pointer was found) are not shown. > ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all > ==14094== > ==14094== For counts of detected and suppressed errors, rerun with: -v > ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548152#comment-16548152 ] genericqa commented on YARN-8549: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8549 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932091/TimeLineReaderAndWriterStubs.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21284/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch > > > Stub implementation for TimeLineReader and TimeLineWriter classes. > These are useful for functional testing of writer and reader path for ATSv2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8436) FSParentQueue: Comparison method violates its general contract
[ https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548144#comment-16548144 ] genericqa commented on YARN-8436: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8436 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932098/YARN-8436.003.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21283/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > FSParentQueue: Comparison method violates its general contract > -- > > Key: YARN-8436 > URL: https://issues.apache.org/jira/browse/YARN-8436 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-8436.001.patch, YARN-8436.002.patch, > YARN-8436.003.patch > > > The ResourceManager can fail while sorting queues if an update comes in: > {code:java} > FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > ... > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code} > The reason it breaks is a change in the sorted object itself. > This is why it fails: > * an update from a node comes in as a heartbeat. > * the update triggers a check to see if we can assign a container on the > node. > * walk over the queue hierarchy to find a queue to assign a container to: > top down. > * for each parent queue we sort the child queues in {{assignContainer}} to > decide which queue to descend into. > * we lock the parent queue while sorting to prevent changes, but we do not lock > the child queues that we are sorting. > If during this sorting a different node update changes a child queue then we > allow that. This means that the objects that we are trying to sort now might > be out of order. That causes the issue with the comparator. The comparator > itself is not broken. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
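For context on the failure mode: TimSort assumes the comparator gives consistent answers for the duration of the sort, and mutating the sorted objects from another thread breaks that assumption even though the comparator itself is correct. A minimal, self-contained demonstration (timing-dependent, so it may take several runs to trip):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TimSortContractDemo {
  static class Queue {
    volatile int usage; // mutated concurrently, like a child queue's used share
    Queue(int usage) { this.usage = usage; }
  }

  public static void main(String[] args) throws Exception {
    List<Queue> queues = new ArrayList<>();
    Random rnd = new Random();
    for (int i = 0; i < 100_000; i++) {
      queues.add(new Queue(rnd.nextInt(1000)));
    }
    // A "node update" thread mutating the queues mid-sort.
    Thread updater = new Thread(() -> {
      for (Queue q : queues) {
        q.usage = rnd.nextInt(1000);
      }
    });
    updater.start();
    // The comparator itself is fine; the inputs change under it, which is
    // exactly the FSParentQueue.assignContainer situation described above.
    // TimSort may throw IllegalArgumentException:
    // "Comparison method violates its general contract!"
    Collections.sort(queues, (a, b) -> Integer.compare(a.usage, b.usage));
    updater.join();
  }
}
{code}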
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548143#comment-16548143 ] genericqa commented on YARN-8501: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 9s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8501 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931991/YARN-8501.005.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21282/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch, > YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8547) rm may crash if nm registers with too many applications
[ https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548134#comment-16548134 ] genericqa commented on YARN-8547: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8547 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932057/YARN-8547.01.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21281/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > rm may crash if nm registers with too many applications > -- > > Key: YARN-8547 > URL: https://issues.apache.org/jira/browse/YARN-8547 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee >Priority: Major > Attachments: YARN-8547.01.patch > > > 1, our cluster had n k+ nodes, and disabled log aggregation; one single nm > may keep 10,000+ apps > 2, when the rm fails over, a single nm registers with 10,000+ apps, causing the active rm > to GC constantly and lose its connection with zk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548127#comment-16548127 ] genericqa commented on YARN-8517: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 6s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8517 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932088/YARN-8517.004.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21280/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548093#comment-16548093 ] Manikandan R commented on YARN-4606: Thanks [~eepayne] for your comments. {quote} I have a general concern that these tests are not testing the fix to the starvation problem outlined in the description of this JIRA. I'm trying to determine if there is a clean way to unit test that use case. {quote} OK. Since active-app starvation happens because of reduced resource allocation based on an incorrect active-user count, can we, in addition to checking the active-user count, also check the resources allocated to each user? Would that be good enough? Without this patch, the resource allocation (amount of memory, vcores) should be lower (half of the allocation with this patch, based on the example given in the JIRA description), whereas with this patch it should be higher. Alternatively, with this patch an app should complete faster than before because resources are allocated as expected. Can we simulate this in test cases and check the app completion time? Will take care of #2, #3 & #4. > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Manikandan R >Priority: Critical > Attachments: YARN-4606.001.patch, YARN-4606.002.patch, > YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, > YARN-4606.006.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, > YARN-4606.POC.3.patch, YARN-4606.POC.patch > > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers the user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to > user3)/app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
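To make the starvation arithmetic from the description concrete: the real CapacityScheduler user-limit formula has more terms, but the part this JIRA is about is the division by #active-users. A simplified sketch:
{code:java}
public class UserLimitSketch {
  public static void main(String[] args) {
    int queueResource = 100; // say, 100 GB available to the leaf queue
    // Before the fix: pending-only users (user3, user4) are counted as active.
    int activeUsersIncludingPending = 4;
    // After the fix: only users that can actually allocate (user1, user2).
    int trulyActiveUsers = 2;
    // Simplified user limit: divide the queue evenly among "active" users.
    System.out.println("limit before: " + queueResource / activeUsersIncludingPending); // 25
    System.out.println("limit after:  " + queueResource / trulyActiveUsers);            // 50
    // With the lower limit, user1/user2 are each capped at 25 and half the
    // queue sits idle -- the starvation this JIRA describes.
  }
}
{code}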
[jira] [Updated] (YARN-8436) FSParentQueue: Comparison method violates its general contract
[ https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-8436: Attachment: YARN-8436.003.patch > FSParentQueue: Comparison method violates its general contract > -- > > Key: YARN-8436 > URL: https://issues.apache.org/jira/browse/YARN-8436 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-8436.001.patch, YARN-8436.002.patch, > YARN-8436.003.patch > > > The ResourceManager can fail while sorting queues if an update comes in: > {code:java} > FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > ... > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code} > The reason it breaks is a change in the sorted object itself. > This is why it fails: > * an update from a node comes in as a heartbeat. > * the update triggers a check to see if we can assign a container on the > node. > * walk over the queue hierarchy to find a queue to assign a container to: > top down. > * for each parent queue we sort the child queues in {{assignContainer}} to > decide which queue to descend into. > * we lock the parent queue while sorting to prevent changes, but we do not lock > the child queues that we are sorting. > If during this sorting a different node update changes a child queue then we > allow that. This means that the objects that we are trying to sort now might > be out of order. That causes the issue with the comparator. The comparator > itself is not broken. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8436) FSParentQueue: Comparison method violates its general contract
[ https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548057#comment-16548057 ] Wilfred Spiegelenburg commented on YARN-8436: - 1) Fixed. 2) The delay is not to make sure that the comparator is called at least once but to make sure that the sorting has started. The failure will only occur if and when the sorting has progressed far enough that it is merging and/or inserting elements into a sorted run. The sleep is thus not for synchronisation but really a delay for the modifications. The countdown latch would synchronise the start, but that is not what I needed. Uploading a new patch with the fixed comment. > FSParentQueue: Comparison method violates its general contract > -- > > Key: YARN-8436 > URL: https://issues.apache.org/jira/browse/YARN-8436 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Attachments: YARN-8436.001.patch, YARN-8436.002.patch > > > The ResourceManager can fail while sorting queues if an update comes in: > {code:java} > FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in > handling event type NODE_UPDATE to the scheduler > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > ... > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code} > The reason it breaks is a change in the sorted object itself. > This is why it fails: > * an update from a node comes in as a heartbeat. > * the update triggers a check to see if we can assign a container on the > node. > * walk over the queue hierarchy to find a queue to assign a container to: > top down. > * for each parent queue we sort the child queues in {{assignContainer}} to > decide which queue to descend into. > * we lock the parent queue while sorting to prevent changes, but we do not lock > the child queues that we are sorting. > If during this sorting a different node update changes a child queue then we > allow that. This means that the objects that we are trying to sort now might > be out of order. That causes the issue with the comparator. The comparator > itself is not broken. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548043#comment-16548043 ] Eric Yang commented on YARN-8501: - [~snemeth] Build infrastructure is broken by HADOOP-15610. Tests will be triggered when that issue is addressed. Thank you for your patience. > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch, > YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Issue Type: Sub-task (was: Task) Parent: YARN-5355 > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch > > > Stub implementation for TimeLineReader and TimeLineWriter classes. > These are useful for functional testing of writer and reader path for ATSv2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Attachment: (was: TimeLineReaderAndWriterStubs.patch) > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch > > > Stub implementation for TimeLineReader and TimeLineWriter classes. > These are useful for functional testing of writer and reader path for ATSv2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Description: Stub implementation for TimeLineReader and TimeLineWriter classes. These are useful for functional testing of writer and reader path for ATSv2 > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch, > TimeLineReaderAndWriterStubs.patch > > > Stub implementation for TimeLineReader and TimeLineWriter classes. > These are useful for functional testing of writer and reader path for ATSv2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Attachment: TimeLineReaderAndWriterStubs.patch > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > Attachments: TimeLineReaderAndWriterStubs.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Attachment: 0001-Adding-stub-implementation-classes-for-TimeLineReade.patch > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
[ https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabha Manepalli updated YARN-8549: --- Attachment: (was: 0001-Adding-stub-implementation-classes-for-TimeLineReade.patch) > No operation timeline writer and reader plugin classes for ATSv2 > > > Key: YARN-8549 > URL: https://issues.apache.org/jira/browse/YARN-8549 > Project: Hadoop YARN > Issue Type: Task > Components: ATSv2, timelineclient, timelineserver >Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 >Reporter: Prabha Manepalli >Priority: Minor > Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2
Prabha Manepalli created YARN-8549: -- Summary: No operation timeline writer and reader plugin classes for ATSv2 Key: YARN-8549 URL: https://issues.apache.org/jira/browse/YARN-8549 Project: Hadoop YARN Issue Type: Task Components: ATSv2, timelineclient, timelineserver Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2 Reporter: Prabha Manepalli Fix For: YARN-2928, YARN-5355, YARN-5355_branch2 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547963#comment-16547963 ] Antal Bálint Steinbach commented on YARN-8517: -- Thanks [~snemeth] . Fixed. > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8517: - Attachment: YARN-8517.004.patch > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch, YARN-8517.004.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547925#comment-16547925 ] Szilard Nemeth commented on YARN-8517: -- Hi [~bsteinbach]! Thanks for the updated patch. I think one bullet point is still missing: I don't see the changes for #5. Apart from that, the patch looks good. > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, > YARN-8517.003.patch > > > Looking at the documentation here: > https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html > I cannot find documentation for 2 RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I suppose they are not intentionally undocumented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547784#comment-16547784 ] Bilwa S T commented on YARN-8548: - Thanks [~bibinchundatt] for reporting the issue. I have attached a patch. > AllocationResponse proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch > > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
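For context on the fix pattern: the Hadoop {{*PBImpl}} record classes follow a copy-on-write convention where every setter must call {{maybeInitBuilder()}} before touching the protobuf builder; if the record was created from a proto and that call is skipped, {{builder}} is still null, which is consistent with the NPE in {{setNMTokens}} above. A hedged sketch of the convention, with abbreviated stand-in names rather than the literal patch:
{code:java}
// Sketch of the standard Hadoop PBImpl copy-on-write pattern. Names are
// abbreviated stand-ins for AllocateResponsePBImpl / AllocateResponseProto.
public class ResponsePBImplSketch {
  private FakeProto proto = FakeProto.getDefaultInstance();
  private FakeProto.Builder builder = null;
  private boolean viaProto = false;

  /** Every mutator must call this first, or 'builder' may still be null. */
  private void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = FakeProto.newBuilder(proto);
    }
    viaProto = false;
  }

  public void setNMTokens(java.util.List<String> nmTokens) {
    maybeInitBuilder(); // the missing call this JIRA adds
    builder.clearNmToken();
    // ... convert and add the tokens to the builder ...
  }

  // Minimal stand-in for a generated protobuf message, just enough to compile.
  static class FakeProto {
    static FakeProto getDefaultInstance() { return new FakeProto(); }
    static Builder newBuilder(FakeProto p) { return new Builder(); }
    static class Builder { void clearNmToken() { } }
  }
}
{code}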
[jira] [Updated] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-8548: Attachment: YARN-8548-001.patch > AllocationResponse proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-8548-001.patch > > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T reassigned YARN-8548: --- Assignee: Bilwa S T > AllocationResponse proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
Bibin A Chundatt created YARN-8548: -- Summary: AllocationResponse proto setNMToken initBuilder not done Key: YARN-8548 URL: https://issues.apache.org/jira/browse/YARN-8548 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Distributed Scheduling allocate failing {code} Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) at org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) at org.apache.hadoop.ipc.Client.call(Client.java:1445) at org.apache.hadoop.ipc.Client.call(Client.java:1355) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy85.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done
[ https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8548: --- Target Version/s: 3.1.1 > AllocationResponse proto setNMToken initBuilder not done > --- > > Key: YARN-8548 > URL: https://issues.apache.org/jira/browse/YARN-8548 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Priority: Major > > Distributed Scheduling allocate failing > {code} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154) > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499) > at org.apache.hadoop.ipc.Client.call(Client.java:1445) > at org.apache.hadoop.ipc.Client.call(Client.java:1355) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy85.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.9.0.gpu-port.patch > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0,1} could be faster than one running on GPUs > {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' unavailable in the corresponding bit position. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
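To illustrate the bitmap encoding from the description: with one bit per GPU ('1' = free), preferring GPUs under a single PCI-E switch reduces to masking groups of bits. The sketch below assumes 8 GPUs per switch purely for illustration; the actual topology grouping is not specified in this JIRA.
{code:java}
public class GpuBitmapSketch {
  /** Assumed group size for the example: 8 GPUs per PCI-E switch. */
  static final int GROUP_SIZE = 8;

  /**
   * Pick 'count' free GPUs, preferring a single PCI-E switch group.
   * 'available' uses the YARN-7481 encoding: bit i == 1 means GPU i is free.
   */
  public static long pickGpus(long available, int count) {
    for (int g = 0; g < 64 / GROUP_SIZE; g++) {
      long groupMask = ((1L << GROUP_SIZE) - 1) << (g * GROUP_SIZE);
      long freeInGroup = available & groupMask;
      if (Long.bitCount(freeInGroup) >= count) {
        // Take the 'count' lowest free bits within this switch group.
        long picked = 0;
        for (int i = 0; i < count; i++) {
          long lowest = Long.lowestOneBit(freeInGroup);
          picked |= lowest;
          freeInGroup &= ~lowest;
        }
        return picked;
      }
    }
    return 0L; // no single switch can satisfy the request
  }

  public static void main(String[] args) {
    long available = 0b0000_0011L | (1L << 56); // GPUs 0,1 free, plus GPU 56
    // Prints "11": GPUs 0 and 1 are chosen because they share a switch.
    System.out.println(Long.toBinaryString(pickGpus(available, 2)));
  }
}
{code}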
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.9.0.gpu-port.patch) > Gpu locality support for Better AI scheduling > - > > Key: YARN-7481 > URL: https://issues.apache.org/jira/browse/YARN-7481 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, RM, yarn >Affects Versions: 2.7.2 >Reporter: Chen Qingcha >Priority: Major > Attachments: GPU locality support for Job scheduling.pdf, > hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, > hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > We enhance Hadoop with GPU support for better AI job scheduling. > Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a > countable resource. > However, GPU placement is also very important to deep learning jobs for better > efficiency. > For example, a 2-GPU job running on GPUs {0,1} could be faster than one running on GPUs > {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. > We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which > supports fine-grained GPU placement. > A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage > and locality information in a node (up to 64 GPUs per node). '1' means > available and '0' unavailable in the corresponding bit position. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8546) A reserved container might be released multiple times under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8546: -- Issue Type: Sub-task (was: Bug) Parent: YARN-5139 > A reserved container might be released multiple times under async scheduling > > > Key: YARN-8546 > URL: https://issues.apache.org/jira/browse/YARN-8546 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Weiwei Yang >Assignee: Tao Yang >Priority: Major > Labels: global-scheduling > > I was able to reproduce this issue by starting a job that keeps > requesting containers until it uses up the cluster's available resources. My cluster > has 70200 vcores, and each task applies for 100 vcores, so I was expecting a total of 702 > containers to be allocated, but eventually there were only 701. The > last container could not be allocated because the queue's used resource was updated > to more than 100%. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8547) rm may crash if nm register with too many applications
[ https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-8547: --- Attachment: YARN-8547.01.patch > rm may crash if nm register with too many applications > -- > > Key: YARN-8547 > URL: https://issues.apache.org/jira/browse/YARN-8547 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: sandflee >Priority: Major > Attachments: YARN-8547.01.patch > > > 1. Our cluster has several thousand ("n k+") nodes and log aggregation is disabled, so a single NM may keep 10,000+ ("1w+") apps. > 2. When the RM fails over, a single NM registers with 10,000+ apps, causing the active RM to GC constantly and lose its connection with ZK.
[jira] [Updated] (YARN-8547) rm may crash if nm register with too many applications
[ https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sandflee updated YARN-8547: --- Description: 1, our cluster had n k+ nodes, and disabled log aggregation, one single nm may keeps 1w+ apps 2, when rm failover, single nm register with 1w+ apps, causing active rm always gc and lost connection with zk. was: 1, our cluster had n k+ nodes, and we disable log aggregation, single nm may keeps 1w+ apps 2, when rm failover, nm register with 1w+ apps, causing active rm always gc and lost connection with zk.
[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547591#comment-16547591 ] Antal Bálint Steinbach commented on YARN-8517: -- Hi [~snemeth], Thanks for the review. I fixed the mentioned issues. I called the APIs from a browser and used the results for the examples. There was no field named "diagnosticsInfo". > getContainer and getContainers ResourceManager REST API methods are not > documented > -- > > Key: YARN-8517 > URL: https://issues.apache.org/jira/browse/YARN-8517 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Szilard Nemeth >Assignee: Antal Bálint Steinbach >Priority: Major > Labels: newbie, newbie++ > Attachments: YARN-8517.001.patch, YARN-8517.002.patch, YARN-8517.003.patch > > > Looking at the documentation here: https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html I cannot find documentation for two RM REST endpoints: > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers > - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\} > I assume they are not intentionally undocumented.
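For context while reviewing the docs patch, a sketch of calling one of the endpoints being documented. The /ws/v1/cluster prefix is the standard RM REST root; the host, port and IDs below are placeholders:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fetch the containers of an app attempt from the RM REST API.
// rm-host:8088 and the application/attempt IDs are placeholders.
public class GetContainers {
  public static void main(String[] args) throws IOException {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/"
        + "application_1531900000000_0001/appattempts/"
        + "appattempt_1531900000000_0001_000001/containers");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (InputStream in = conn.getInputStream()) {
      System.out.println(new String(in.readAllBytes()));
    }
  }
}
{code}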
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8517: - Attachment: YARN-8517.003.patch
[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented
[ https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antal Bálint Steinbach updated YARN-8517: - Attachment: YARN-8517.002.patch
[jira] [Created] (YARN-8547) rm may crash if nm register with too many applications
sandflee created YARN-8547: -- Summary: rm may crash if nm register with too many applications Key: YARN-8547 URL: https://issues.apache.org/jira/browse/YARN-8547 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee Assignee: sandflee 1, our cluster had n k+ nodes, and we disable log aggregation, single nm may keeps 1w+ apps 2, when rm failover, nm register with 1w+ apps, causing active rm always gc and lost connection with zk.
[jira] [Commented] (YARN-7590) Improve container-executor validation check
[ https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547568#comment-16547568 ] Aljoscha Krettek commented on YARN-7590: Thanks a lot [~ebadger]! This was indeed the problem. I thought it might have been a problem with the setuid/permissions setup, which is why I didn't check. FYI, this is not a production cluster but a little testing project for setting up a distributed kerberized cluster on Docker: https://github.com/aljoscha/docker-hadoop-secure-cluster. > Improve container-executor validation check > --- > > Key: YARN-7590 > URL: https://issues.apache.org/jira/browse/YARN-7590 > Project: Hadoop YARN > Issue Type: Improvement > Components: security, yarn >Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.8.1, 3.0.0-beta1 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 2.6.6, 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6 > > Attachments: YARN-7590.001.patch, YARN-7590.002.patch, YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, YARN-7590.006.patch, YARN-7590.007.patch, YARN-7590.008.patch, YARN-7590.009.patch, YARN-7590.010.patch, YARN-7590.branch-2.000.patch, YARN-7590.branch-2.6.000.patch, YARN-7590.branch-2.7.000.patch, YARN-7590.branch-2.8.000.patch, YARN-7590.branch-2.9.000.patch > > > There is only minimal validation of the prefix path in container-executor. If YARN is compromised, an attacker can use container-executor to change the ownership of system files: > {code} > /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens /home/spark / ls > {code} > This changes /etc to be owned by the spark user: > {code} > # ls -ld /etc > drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc > {code} > The spark user can then rewrite files under /etc to gain more access. We can improve this with additional checks in container-executor: > # Make sure the prefix path is owned by the same user as the caller of container-executor. > # Make sure the log directory prefix is owned by the same user as the caller.
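Both proposed checks reduce to comparing a directory's owner against the invoking user before operating on it. The actual fix lives in the native container-executor, which is written in C; the Java sketch below only illustrates the rule, with hypothetical names:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration of "the prefix path must be owned by the caller"; the real
// YARN-7590 fix is implemented in the native container-executor, not Java.
public class PrefixOwnerCheck {
  public static void requireOwnedBy(String dir, String expectedUser)
      throws IOException {
    Path path = Paths.get(dir);
    String owner = Files.getOwner(path).getName();
    if (!owner.equals(expectedUser)) {
      throw new SecurityException("refusing to use " + dir + ": owned by "
          + owner + ", expected " + expectedUser);
    }
  }
}
{code}

Applied to the exploit above, such a check rejects the invocation because /etc is owned by root rather than by the spark user passed as the caller.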
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.9.0.gpu-port.patch)
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.9.0.gpu-port.patch
[jira] [Commented] (YARN-8482) [Router] Add cache service for fast answers to getApps
[ https://issues.apache.org/jira/browse/YARN-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547484#comment-16547484 ] Dillon Zhang commented on YARN-8482: [~giovanni.fumarola] ok ~ > [Router] Add cache service for fast answers to getApps > -- > > Key: YARN-8482 > URL: https://issues.apache.org/jira/browse/YARN-8482 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major >
[jira] [Commented] (YARN-7300) DiskValidator is not used in LocalDirAllocator
[ https://issues.apache.org/jira/browse/YARN-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547482#comment-16547482 ] Szilard Nemeth commented on YARN-7300: -- Hi [~haibochen]! Looks like we had some general build infrastructure issues. Could you please retrigger the build? Thanks! > DiskValidator is not used in LocalDirAllocator > -- > > Key: YARN-7300 > URL: https://issues.apache.org/jira/browse/YARN-7300 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Haibo Chen >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-7300.001.patch, YARN-7300.002.patch > > > HADOOP-13254 introduced a pluggable disk validator to replace DiskChecker.checkDir(). However, LocalDirAllocator still references the old DiskChecker.checkDir(). It'd be nice to use the plugin uniformly so that user configurations take effect in all places.
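As a reference for reviewers, this is roughly what the pluggable HADOOP-13254 API looks like in use. The "basic" factory key is, to my recollection, the default validator name, so treat it as an assumption:

{code:java}
import java.io.File;

import org.apache.hadoop.util.DiskChecker.DiskErrorException;
import org.apache.hadoop.util.DiskValidator;
import org.apache.hadoop.util.DiskValidatorFactory;

// Sketch: probe a local dir through the pluggable DiskValidator instead of
// the static DiskChecker.checkDir() that LocalDirAllocator still calls.
public class DirProbe {
  public static void main(String[] args) throws DiskErrorException {
    DiskValidator validator = DiskValidatorFactory.getInstance("basic");
    validator.checkStatus(new File(args[0])); // throws if the dir is unusable
  }
}
{code}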
[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart
[ https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547480#comment-16547480 ] Szilard Nemeth commented on YARN-6966: -- Hi [~rkanter]! builds.apache.org is up now. Could you please retrigger the build? Thanks! > NodeManager metrics may return wrong negative values when NM restart > > > Key: YARN-6966 > URL: https://issues.apache.org/jira/browse/YARN-6966 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-6966.001.patch, YARN-6966.002.patch, YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch > > > Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933. > The primary cause of the negative values is that metrics are not recovered properly when the NM restarts. > AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores and AvailableVCores in the metrics also need to be recovered when the NM restarts. > This should be done in ContainerManagerImpl#recoverContainer. > The scenario can be reproduced with the following steps: > # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and YarnConfiguration.NM_RECOVERY_SUPERVISED=true on the NM > # Submit an application and keep it running > # Restart the NM > # Stop the application > # Now you get the negative values: > {code} > /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics > {code} > {code} > { > name: "Hadoop:service=NodeManager,name=NodeManagerMetrics", > modelerType: "NodeManagerMetrics", > tag.Context: "yarn", > tag.Hostname: "hadoop.com", > ContainersLaunched: 0, > ContainersCompleted: 0, > ContainersFailed: 2, > ContainersKilled: 0, > ContainersIniting: 0, > ContainersRunning: 0, > AllocatedGB: 0, > AllocatedContainers: -2, > AvailableGB: 160, > AllocatedVCores: -11, > AvailableVCores: 3611, > ContainerLaunchDurationNumOps: 2, > ContainerLaunchDurationAvgTime: 6, > BadLocalDirs: 0, > BadLogDirs: 0, > GoodLocalDirsDiskUtilizationPerc: 2, > GoodLogDirsDiskUtilizationPerc: 2 > } > {code}
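A minimal sketch of what re-applying the counters during recovery could look like. The NodeManagerMetrics method names below match the Hadoop source as I recall it, but treat the snippet as an assumption about the shape of the fix, not the actual patch:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;

// Sketch: during ContainerManagerImpl#recoverContainer, re-apply each
// recovered container to the metrics so that its later completion does not
// drive AllocatedContainers/AllocatedGB/AllocatedVCores negative.
class MetricsRecoverySketch {
  private final NodeManagerMetrics metrics;

  MetricsRecoverySketch(NodeManagerMetrics metrics) {
    this.metrics = metrics;
  }

  void onRecoveredContainer(Resource resource) {
    metrics.launchedContainer();         // count the recovered launch once
    metrics.allocateContainer(resource); // move resource back to "allocated"
  }
}
{code}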
[jira] [Comment Edited] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547477#comment-16547477 ] Szilard Nemeth edited comment on YARN-8501 at 7/18/18 7:23 AM: --- Hi [~eyang]! Looks like we had (or still have) some build infrastructure issues. Could you please retrigger the build? I'm not sure what kind of build issues we had, or whether we still have them. was (Author: snemeth): Hi [~eyang]! Could you please retrigger the build? I'm not sure what kind of build issues we had or we still have that. > Reduce complexity of RMWebServices' getApps method > -- > > Key: YARN-8501 > URL: https://issues.apache.org/jira/browse/YARN-8501 > Project: Hadoop YARN > Issue Type: Improvement > Components: restapi >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-8501.001.patch, YARN-8501.002.patch, YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch > >
[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method
[ https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547477#comment-16547477 ] Szilard Nemeth commented on YARN-8501: -- Hi [~eyang]! Could you please retrigger the build? I'm not sure what kind of build issues we had or we still have that.
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: hadoop-2.9.0.gpu-port.patch
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Qingcha updated YARN-7481: --- Attachment: (was: hadoop-2.9.0.gpu-port.patch)
[jira] [Assigned] (YARN-8546) A reserved container might be released multiple times under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reassigned YARN-8546: - Assignee: Tao Yang
[jira] [Created] (YARN-8546) A reserved container might be released multiple times under async scheduling
Weiwei Yang created YARN-8546: - Summary: A reserved container might be released multiple times under async scheduling Key: YARN-8546 URL: https://issues.apache.org/jira/browse/YARN-8546 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 3.1.0 Reporter: Weiwei Yang
[jira] [Comment Edited] (YARN-8544) [DS] AM registration fails when hadoop authorization is enabled
[ https://issues.apache.org/jira/browse/YARN-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546774#comment-16546774 ] Bibin A Chundatt edited comment on YARN-8544 at 7/18/18 6:03 AM: - [~subru] / [~cheersyang] Could you please help review this? was (Author: bibinchundatt): [~subru] Could you please help to review. > [DS] AM registration fails when hadoop authorization is enabled > --- > > Key: YARN-8544 > URL: https://issues.apache.org/jira/browse/YARN-8544 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Blocker > Attachments: YARN-8544.001.patch > > > The application master fails to register when hadoop authorization is enabled: the DistributedSchedulingAMProtocol connection authorization fails on the RM side. > Issue credits: [~BilwaST]