[jira] [Commented] (YARN-8830) SLS tool not working in trunk
[ https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631390#comment-16631390 ] Hadoop QA commented on YARN-8830: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} hadoop-tools/hadoop-sls: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 15s{color} | {color:green} hadoop-sls in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8830 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941569/YARN-8830.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux fc050ee2317c 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5c8d907 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21998/artifact/out/diff-checkstyle-hadoop-tools_hadoop-sls.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21998/testReport/ | | Max. process+thread count | 448 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output |
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631387#comment-16631387 ] Bibin A Chundatt commented on YARN-8829: Committing patch shortly. > Cluster metrics can fail with IndexOutOfBound Exception > > > Key: YARN-8829 > URL: https://issues.apache.org/jira/browse/YARN-8829 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akshay Agarwal >Assignee: Akshay Agarwal >Priority: Minor > Attachments: YARN-8829.001.patch > > > If no sub clusters are available in a Router Based Federation setup, cluster > metrics can throw an "IndexOutOfBoundsException". > An additional check is required for the case where the sub cluster list is empty. > *Exception details:* > {noformat} > Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, > Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional
commands, e-mail: yarn-issues-h...@hadoop.apache.org
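For illustration of the guard YARN-8829 asks for, a minimal self-contained sketch (class and method names here are hypothetical, not the actual FederationClientInterceptor code): the merge step checks for an empty sub-cluster list before indexing into it, instead of reaching get(0) and throwing.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the empty-sub-cluster guard suggested in YARN-8829.
// ClusterMetrics and mergeMetrics are illustrative names only.
public class SubClusterGuard {

  static class ClusterMetrics {
    final int numNodes;
    ClusterMetrics(int numNodes) { this.numNodes = numNodes; }
  }

  // Without the guard, perSubCluster.get(0) throws IndexOutOfBoundsException
  // when no sub-clusters responded; with it, an empty result is returned.
  static ClusterMetrics mergeMetrics(List<ClusterMetrics> perSubCluster) {
    if (perSubCluster == null || perSubCluster.isEmpty()) {
      return new ClusterMetrics(0); // empty federation: report zero nodes
    }
    int nodes = 0;
    for (ClusterMetrics m : perSubCluster) {
      nodes += m.numNodes;
    }
    return new ClusterMetrics(nodes);
  }

  public static void main(String[] args) {
    System.out.println(mergeMetrics(Collections.emptyList()).numNodes);
  }
}
```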
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631377#comment-16631377 ] Hadoop QA commented on YARN-8829: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 34s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8829 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941526/YARN-8829.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 572be2144271 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5c8d907 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21997/testReport/ | | Max. process+thread count | 706 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21997/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Cluster metrics can fail with IndexOutOfBound Exception >
[jira] [Assigned] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang reassigned YARN-8792: - Assignee: Shuai Zhang > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > > Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # Still need more useful rules: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
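The first two points above can be sketched as follows. This is an illustrative toy (none of these names are the actual FairScheduler API): each rule returns a result that itself decides whether the chain terminates, and a rejection carries a human-readable reason.

```java
import java.util.Arrays;
import java.util.List;

// Toy chain-of-responsibility where termination is bound to the assigning
// result, not the rule type, and rejections carry a reason (YARN-8792's idea).
public class PlacementChain {

  enum Kind { PLACED, REJECTED, CONTINUE }

  static final class Result {
    final Kind kind;
    final String queueOrReason; // queue name if PLACED, reason if REJECTED
    Result(Kind kind, String queueOrReason) {
      this.kind = kind;
      this.queueOrReason = queueOrReason;
    }
  }

  interface Rule {
    Result apply(String user, String requestedQueue);
  }

  // A RejectDefaultQueue-style rule: terminate with a reason for "default".
  static final Rule rejectDefaultQueue = (user, queue) ->
      "default".equals(queue)
          ? new Result(Kind.REJECTED, "default queue is not allowed")
          : new Result(Kind.CONTINUE, null);

  // A Specified-style rule: place into the requested queue if one was given.
  static final Rule specified = (user, queue) ->
      queue != null
          ? new Result(Kind.PLACED, queue)
          : new Result(Kind.CONTINUE, null);

  static Result place(List<Rule> rules, String user, String queue) {
    for (Rule r : rules) {
      Result res = r.apply(user, queue);
      if (res.kind != Kind.CONTINUE) {
        return res; // the result, not the rule, terminates the chain
      }
    }
    return new Result(Kind.REJECTED, "no rule matched");
  }

  public static void main(String[] args) {
    List<Rule> rules = Arrays.asList(rejectDefaultQueue, specified);
    System.out.println(place(rules, "alice", "default").queueOrReason);
  }
}
```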
[jira] [Assigned] (YARN-8794) QueuePlacementPolicy add more rules
[ https://issues.apache.org/jira/browse/YARN-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang reassigned YARN-8794: - Assignee: Shuai Zhang > QueuePlacementPolicy add more rules > --- > > Key: YARN-8794 > URL: https://issues.apache.org/jira/browse/YARN-8794 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8794.001.patch, YARN-8794.002.patch > > > Still need more useful rules: > # RejectNonLeafQueue > # RejectDefaultQueue > # RejectUsers > # RejectQueues > # DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8795) QueuePlacementRule move to separate files
[ https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang reassigned YARN-8795: - Assignee: Shuai Zhang > QueuePlacementRule move to separate files > - > > Key: YARN-8795 > URL: https://issues.apache.org/jira/browse/YARN-8795 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8795.002.patch, YARN-8795.003.patch, > YARN-8795.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Zhang reassigned YARN-8793: - Assignee: Shuai Zhang > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Assignee: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer
[ https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631364#comment-16631364 ] Hadoop QA commented on YARN-8760: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 41s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-8760 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941607/YARN-8760.v1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 23dee196230c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5c8d907 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Comment Edited] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631359#comment-16631359 ] Arun Suresh edited comment on YARN-8808 at 9/28/18 5:15 AM: bq. containersUtilization and nodeUtilization in SchedulerNode are always instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so getNodeUtilization() / getAggregatedContainersUtilization() should never return null, unless I am missing something Again, I noticed the NPE in a testcase setup and I had to put the check. It is possible it might not happen in a real cluster setup (and also my branch was a bit stale). Maybe a good idea to just put the check in there as a safety measure. I am +1 on the patch otherwise was (Author: asuresh): bq. containersUtilization and nodeUtilization in SchedulerNode are always instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so getNodeUtilization() / getAggregatedContainersUtilization() should never return null, unless I am missing something Again, I noticed the NPE in a testcase setup and I had to put the check. It is possible it might not happen in a real cluster. > Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631359#comment-16631359 ] Arun Suresh commented on YARN-8808: --- bq. containersUtilization and nodeUtilization in SchedulerNode are always instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so getNodeUtilization() / getAggregatedContainersUtilization() should never return null, unless I am missing something Again, I noticed the NPE in a testcase setup and I had to put the check. It is possible it might not happen in a real cluster. > Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
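The "safety" null check being discussed can be sketched in a couple of lines. This is a simplified illustration, not the SchedulerNode API: a null utilization (as seen in some test setups where mocks return null) is treated as zero instead of propagating an NPE.

```java
// Simplified sketch of a defensive null check for a utilization reading
// (types and names are illustrative, not the actual SchedulerNode code).
public class UtilizationGuard {

  static final class Utilization {
    final int physicalMemoryMb;
    Utilization(int physicalMemoryMb) { this.physicalMemoryMb = physicalMemoryMb; }
  }

  // Treat a missing reading as zero utilization rather than risking an NPE.
  static Utilization orZero(Utilization u) {
    return u != null ? u : new Utilization(0);
  }

  public static void main(String[] args) {
    System.out.println(orZero(null).physicalMemoryMb);
  }
}
```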
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631358#comment-16631358 ] Arun Suresh commented on YARN-8808: --- bq. The scheduler would see this 'Node' has 10GBs to allocate because that's what NM tells RM. I believe in this case, YARN should try to fully utilize just 10GBs instead of the whole node (100 GBs), because YARN is entitled to use only 10GBs. If 10GBs is indeed fully utilized, the aggregate container utilization is 100%, but the nodeUtilization is 10% (Again, node utilization by default is detected by some plugin on NM side that reads from /proc and sees the remaining system-wide 90GBs as available). Don't think we shall check if nodeUtilization is low. Makes sense. To be honest, in my testing I had also changed it from nodeUtilization to aggregateContainerUtilization :) I was just wondering if there is still a case where we might need to factor in nodeUtilization. > Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
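The 10 GB vs. 100 GB scenario above can be worked through numerically. This is an illustrative sketch with made-up numbers and names (not the YARN-1011 code): oversubscription headroom is computed from what the NM advertises to the RM minus what containers already use, so an idle 90 GB outside YARN's entitlement contributes nothing.

```java
// Illustrative headroom arithmetic for oversubscription (YARN-8808's point):
// bound opportunistic allocation by aggregate container utilization against
// the advertised capacity, not by whole-machine node utilization.
public class OversubscriptionHeadroom {

  // advertisedMb: memory the NM advertises to the RM (e.g. 10 GB of a
  // 100 GB machine). containersUsedMb: aggregate container usage.
  // overallocationThreshold: fraction of advertised capacity that
  // opportunistic containers may push usage up to.
  static long headroomMb(long advertisedMb,
                         long containersUsedMb,
                         double overallocationThreshold) {
    long allowed = (long) (advertisedMb * overallocationThreshold);
    return Math.max(0, allowed - containersUsedMb);
  }

  public static void main(String[] args) {
    // Containers already use the full 10 GB YARN is entitled to, so no
    // headroom remains even though the other 90 GB of the machine is idle.
    System.out.println(headroomMb(10_240, 10_240, 1.0));
  }
}
```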
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631355#comment-16631355 ] Rohith Sharma K S commented on YARN-8270: - I backported this to branch-3.1 also and updated the Fix Version. > Adding JMX Metrics for Timeline Collector and Reader > > > Key: YARN-8270 > URL: https://issues.apache.org/jira/browse/YARN-8270 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, timelineserver >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8270.001.patch, YARN-8270.002.patch, > YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch > > > This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and > Timeline Reader. For the Timeline Collector it captures success and failure > latencies for *putEntities* and *putEntitiesAsync* from > *TimelineCollectorWebService*; similarly, it captures success and failure > latencies for all the APIs fetching TimelineEntities from > *TimelineReaderWebServices*. This would help in monitoring and measuring > performance for ATSv2 at scale. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-8270: Fix Version/s: 3.1.2 > Adding JMX Metrics for Timeline Collector and Reader > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631353#comment-16631353 ] Rohith Sharma K S commented on YARN-8270: - [~vrushalic] I updated the fix version to 3.2.0. The patch is committed to trunk only, whose corresponding version is 3.2.0. > Adding JMX Metrics for Timeline Collector and Reader > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-8270: Fix Version/s: (was: 3.1.2) 3.2.0 > Adding JMX Metrics for Timeline Collector and Reader > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8832) Review of RMCommunicator Class
[ https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8832: - Assignee: BELUGA BEHR > Review of RMCommunicator Class > -- > > Key: YARN-8832 > URL: https://issues.apache.org/jira/browse/YARN-8832 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-88321.patch > > > Various improvements to the {{RMCommunicator}} class. > > * Use SLF4J parameterized logging > * Use a switch statement instead of {{if}}-{{else}} statements > * Remove the anti-pattern of "log and throw" (just throw) > * Use a flag to stop the thread instead of an interrupt (it may be interrupting > the heartbeat code and not the thread loop) > * The main thread repeatedly loops on the heartbeat callback queue until > the queue is empty. It's technically possible that other threads could > constantly put new callbacks into the queue and therefore the main thread > never progresses past the callbacks. Put a cap on the number of callbacks > that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8832) Review of RMCommunicator Class
[ https://issues.apache.org/jira/browse/YARN-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8832: -- Attachment: YARN-88321.patch > Review of RMCommunicator Class > -- > > Key: YARN-8832 > URL: https://issues.apache.org/jira/browse/YARN-8832 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: YARN-88321.patch > > > Various improvements to the {{RMCommunicator}} class. > > * Use SLF4J parameterized logging > * Use a switch statement instead of {{if}}-{{else}} statements > * Remove the anti-pattern of "log and throw" (just throw) > * Use a flag to stop the thread instead of an interrupt (it may be interrupting > the heartbeat code and not the thread loop) > * The main thread repeatedly loops on the heartbeat callback queue until > the queue is empty. It's technically possible that other threads could > constantly put new callbacks into the queue and therefore the main thread > never progresses past the callbacks. Put a cap on the number of callbacks > that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8832) Review of RMCommunicator Class
BELUGA BEHR created YARN-8832: - Summary: Review of RMCommunicator Class Key: YARN-8832 URL: https://issues.apache.org/jira/browse/YARN-8832 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR Various improvements to the {{RMCommunicator}} class. * Use SLF4J parameterized logging * Use a switch statement instead of {{if}}-{{else}} statements * Remove the anti-pattern of "log and throw" (just throw) * Use a flag to stop the thread instead of an interrupt (it may be interrupting the heartbeat code and not the thread loop) * The main thread repeatedly loops on the heartbeat callback queue until the queue is empty. It's technically possible that other threads could constantly put new callbacks into the queue and therefore the main thread never progresses past the callbacks. Put a cap on the number of callbacks that will be processed in any iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
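The flag-and-cap suggestions above can be sketched roughly as follows. This is an illustrative stand-in, not the actual RMCommunicator code: the class, the constant, and the 10-callback cap are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a stop flag instead of Thread.interrupt(), plus a per-iteration cap
// on how many queued heartbeat callbacks are drained, so a constantly refilled
// queue cannot keep the main loop from ever progressing past the callbacks.
class HeartbeatLoop {
  static final int MAX_CALLBACKS_PER_ITERATION = 10; // hypothetical cap

  private final BlockingQueue<Runnable> callbacks = new LinkedBlockingQueue<>();
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  void addCallback(Runnable r) {
    callbacks.add(r);
  }

  void stop() {
    stopped.set(true); // cooperative shutdown: no interrupt needed
  }

  /** Drains at most MAX_CALLBACKS_PER_ITERATION callbacks; returns how many ran. */
  int runOneIteration() {
    int processed = 0;
    Runnable cb;
    while (processed < MAX_CALLBACKS_PER_ITERATION
        && !stopped.get()
        && (cb = callbacks.poll()) != null) {
      cb.run();
      processed++;
    }
    return processed;
  }
}
```

With this shape, even if producers keep adding callbacks, each iteration processes a bounded batch and then returns to the rest of the loop (e.g. the heartbeat itself).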
[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631207#comment-16631207 ] Yufei Gu commented on YARN-8792: [~HCOONa], I've added you as a contributor, so that you can assign these jiras to yourself. > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > > The Fair Scheduler uses `QueuePlacementPolicy` to map a request to a queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # More useful rules are still needed: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
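The first two points in the description (termination bound to the result, and a rejection reason) can be sketched like this. All names here (PlacementResult, PlacementChain, the lambda-based rules) are illustrative, not the FairScheduler API:

```java
import java.util.List;
import java.util.function.Function;

// A result object that carries both the decision and why it was made, so the
// chain of responsibility terminates based on the result, not per-rule config.
class PlacementResult {
  final String queue;     // assigned queue, or null when rejected/unmatched
  final boolean terminal; // stop evaluating further rules
  final String reason;    // populated when a request is rejected

  PlacementResult(String queue, boolean terminal, String reason) {
    this.queue = queue;
    this.terminal = terminal;
    this.reason = reason;
  }
}

class PlacementChain {
  /** Walks the rules; a rule returning null means "no opinion, try the next one". */
  static PlacementResult place(String user,
      List<Function<String, PlacementResult>> rules) {
    for (Function<String, PlacementResult> rule : rules) {
      PlacementResult r = rule.apply(user);
      if (r != null) {
        return r; // termination is decided by the result, not hard-coded per rule
      }
    }
    return new PlacementResult(null, true, "no placement rule matched the request");
  }
}
```

A rule like the proposed RejectUsers would then return a terminal result with a human-readable reason, while a default rule returns a queue assignment.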
[jira] [Created] (YARN-8831) Review of LocalContainerAllocator
BELUGA BEHR created YARN-8831: - Summary: Review of LocalContainerAllocator Key: YARN-8831 URL: https://issues.apache.org/jira/browse/YARN-8831 Project: Hadoop YARN Issue Type: Improvement Components: applications Affects Versions: 3.2.0 Reporter: BELUGA BEHR Attachments: YARN-8831.1.patch Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8831: -- Priority: Trivial (was: Minor) > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR reassigned YARN-8831: - Assignee: BELUGA BEHR > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8831) Review of LocalContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8831: -- Attachment: YARN-8831.1.patch > Review of LocalContainerAllocator > - > > Key: YARN-8831 > URL: https://issues.apache.org/jira/browse/YARN-8831 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Priority: Trivial > Attachments: YARN-8831.1.patch > > > Some trivial cleanup of class {{LocalContainerAllocator}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader
[ https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631181#comment-16631181 ] Hudson commented on YARN-8270: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15069 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15069/]) YARN-8270 Adding JMX Metrics for Timeline Collector and Reader. (vrushali: rev 90e2e493b3dc8be54f655b957b98a4bc0e003684) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/reader/TimelineReaderWebServices.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/package-info.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/TimelineReaderMetrics.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/reader/TestTimelineReaderMetrics.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/metrics/PerNodeAggTimelineCollectorMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestPerNodeAggTimelineCollectorMetrics.java > Adding JMX Metrics for Timeline Collector and Reader > > > Key: YARN-8270 > URL: https://issues.apache.org/jira/browse/YARN-8270 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2, 
timelineserver >Reporter: Sushil Ks >Assignee: Sushil Ks >Priority: Major > Fix For: 3.1.2 > > Attachments: YARN-8270.001.patch, YARN-8270.002.patch, > YARN-8270.003.patch, YARN-8270.004.patch, YARN-8270.005.patch > > > This Jira is for emitting JMX Metrics for ATS v2 Timeline Collector and > Timeline Reader, basically for Timeline Collector it tries to capture success > and failure latencies for *putEntities* and *putEntitiesAsync* from > *TimelineCollectorWebService* , similarly all the API's success and failure > latencies for fetching TimelineEntities from *TimelineReaderWebServices*. > This would actually help in monitoring and measuring performance for ATSv2 at > scale. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3525) Rename fair scheduler properties increment-allocation-mb and increment-allocation-vcores
[ https://issues.apache.org/jira/browse/YARN-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631174#comment-16631174 ] Haibo Chen commented on YARN-3525: -- I'm not sure if we still need to address this. The two properties have now been deprecated by yarn.resource-types.memory-mb.increment-allocation and yarn.resource-types.vcores.increment-allocation after resource types were merged. Because memory and vcores are two built-in resource types, people should use the resource-types-prefixed configuration properties, which are scheduler agnostic. > Rename fair scheduler properties increment-allocation-mb and > increment-allocation-vcores > > > Key: YARN-3525 > URL: https://issues.apache.org/jira/browse/YARN-3525 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Zoltan Siegl >Priority: Minor > Attachments: YARN-3525.001.patch, YARN-3525.003.patch > > > Rename the two properties below since they are only used by the fair scheduler > {color:blue}yarn.scheduler.increment-allocation-mb{color} to > {color:red}yarn.scheduler.fair.increment-allocation-mb{color} > {color:blue}yarn.scheduler.increment-allocation-vcores{color} to > {color:red}yarn.scheduler.fair.increment-allocation-vcores{color} > All other fair-scheduler-only properties use the {color:red} > yarn.scheduler.fair{color} prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
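As a quick illustration of the scheduler-agnostic names mentioned in the comment, the settings would go in yarn-site.xml roughly like this. The property names are the ones quoted above; the values are arbitrary examples, not recommendations:

```xml
<!-- Illustrative yarn-site.xml fragment: round container requests up to the
     nearest 512 MB of memory and 1 vcore, regardless of which scheduler runs. -->
<property>
  <name>yarn.resource-types.memory-mb.increment-allocation</name>
  <value>512</value>
</property>
<property>
  <name>yarn.resource-types.vcores.increment-allocation</name>
  <value>1</value>
</property>
```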
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631145#comment-16631145 ] Arun Suresh commented on YARN-8808: --- bq. Node has capacity for 4 1GB containers, but is currently running 2 containers each using more than 1.9GB - in this case, overallocation should be allowed. sorry, I meant 2 containers each allocated 1 GB but using 0.9 GB. This would result in a container utilization of 90% (1.8 / 2.0) but a node utilization of 45% (1.8 / 4.0). Here container utilization is high but node utilization is low. > Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
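The arithmetic in the comment above can be made concrete with a tiny worked example. This is only an illustration of the two ratios being discussed; it is not YARN code:

```java
// Container utilization = used / allocated-to-containers
// Node utilization      = used / whole-node capacity
// With 2 containers each allocated 1 GB but using 0.9 GB, on a 4 GB node:
class UtilizationExample {
  static double ratio(double usedGB, double totalGB) {
    return usedGB / totalGB;
  }

  public static void main(String[] args) {
    double nodeCapacityGB = 4.0;  // room for 4 x 1 GB containers
    double allocatedGB = 2.0;     // 2 containers, 1 GB each
    double usedGB = 2 * 0.9;      // each actually using 0.9 GB

    // container utilization is high (~0.9) while node utilization is low (~0.45)
    System.out.println(ratio(usedGB, allocatedGB));
    System.out.println(ratio(usedGB, nodeCapacityGB));
  }
}
```

The gap between the two numbers is exactly why the JIRA argues the oversubscription threshold should be computed against the aggregate container utilization rather than the whole node.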
[jira] [Updated] (YARN-8732) Add unit tests of min/max allocation for custom resource types in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8732: - Summary: Add unit tests of min/max allocation for custom resource types in FairScheduler (was: Add unit tests of min/max allocation for FairScheduler) > Add unit tests of min/max allocation for custom resource types in > FairScheduler > --- > > Key: YARN-8732 > URL: https://issues.apache.org/jira/browse/YARN-8732 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8732.001.patch > > > Create testcase like this, but for FS: > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService#testValidateRequestCapacityAgainstMinMaxAllocationFor3rdResourceTypes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer
[ https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8760: --- Attachment: YARN-8760.v1.patch > [AMRMProxy] Fix concurrent re-register due to YarnRM failover in > AMRMClientRelayer > -- > > Key: YARN-8760 > URL: https://issues.apache.org/jira/browse/YARN-8760 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8760.v1.patch > > > When the home YarnRM is failing over, the FinishApplicationMaster call from the AM can > have multiple retry threads outstanding in FederationInterceptor. When the new > YarnRM comes back up, all retry threads will re-register to the YarnRM. The first > one will succeed but the rest will get an "Application Master is already > registered" exception. We should catch and swallow this exception and move > on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
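The catch-and-swallow pattern proposed in the description looks roughly like this. The exception type and class names are simplified stand-ins, not the actual YARN classes or messages guaranteed by the API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: several retry threads re-register after an RM failover; only the
// first succeeds, and the rest see an "already registered" failure that is
// safe to treat as success.
class ReRegisterExample {
  private final AtomicBoolean registered = new AtomicBoolean(false);

  private void registerOnce() {
    if (!registered.compareAndSet(false, true)) {
      throw new IllegalStateException("Application Master is already registered");
    }
  }

  /** Returns true if this thread either registered or found it already done. */
  boolean reRegister() {
    try {
      registerOnce();
      return true;
    } catch (IllegalStateException e) {
      if (e.getMessage() != null && e.getMessage().contains("already registered")) {
        return true; // swallow and move on, as the patch proposes
      }
      throw e; // anything else is a real failure
    }
  }
}
```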
[jira] [Updated] (YARN-8732) Add unit tests of min/max allocation for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-8732: - Summary: Add unit tests of min/max allocation for FairScheduler (was: Create new testcase in TestApplicationMasterService that tests min/max allocation but for FairScheduler) > Add unit tests of min/max allocation for FairScheduler > -- > > Key: YARN-8732 > URL: https://issues.apache.org/jira/browse/YARN-8732 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-8732.001.patch > > > Create testcase like this, but for FS: > org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService#testValidateRequestCapacityAgainstMinMaxAllocationFor3rdResourceTypes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
[ https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631136#comment-16631136 ] Vrushali C commented on YARN-8826: -- None of the pre-commit builds are running. All are failing with "could not apply patch to trunk". > Fix lingering timeline collector after serviceStop in TimelineCollectorManager > -- > > Key: YARN-8826 > URL: https://issues.apache.org/jira/browse/YARN-8826 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Trivial > Attachments: YARN-8826.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8826) Fix lingering timeline collector after serviceStop in TimelineCollectorManager
[ https://issues.apache.org/jira/browse/YARN-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631133#comment-16631133 ] Vrushali C commented on YARN-8826: -- Hi [~prabham] Here is the precommit build link https://builds.apache.org/job/PreCommit-YARN-Build/ I have retriggered the build: https://builds.apache.org/job/PreCommit-YARN-Build/21991/ > Fix lingering timeline collector after serviceStop in TimelineCollectorManager > -- > > Key: YARN-8826 > URL: https://issues.apache.org/jira/browse/YARN-8826 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2 >Reporter: Prabha Manepalli >Assignee: Prabha Manepalli >Priority: Trivial > Attachments: YARN-8826.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
[ https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631131#comment-16631131 ] Haibo Chen commented on YARN-8775: -- Thanks [~bsteinbach] for the patch. I think we can reduce the leak of LocalDirsHandlerService implementation details in TestDiskFailures by disabling the periodic health check in LocalDirsHandlerService and calling LocalDirsHandlerService.checkDirs() every time before we verify disk health. checkDirs() is currently private, so we'll need to make it public (make sure to add '@VisibleForTesting'). One question I have is: why do we need to retry inside prepareDirToFail()? > TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File > modifications > -- > > Key: YARN-8775 > URL: https://issues.apache.org/jira/browse/YARN-8775 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 3.0.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Major > Attachments: YARN-8775.001.patch, YARN-8775.002.patch > > > The test can sometimes fail when file operations are performed during the check > done by the thread in _LocalDirsHandlerService_. > {code:java} > java.lang.AssertionError: NodeManager could not identify disk failure. 
> at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202) > at > org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99) > Stderr > 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > to fail. > 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures > (TestDiskFailures.java:prepareDirToFail(277)) - Prepared > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > to fail. 
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] > nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - > Directory > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1 > error, Not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1, > removing from list of valid directories > 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] > localizer.ResourceLocalizationService > (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not > initialize log dir > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > java.io.FileNotFoundException: Destination exists and is not a directory: > /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3 > at > org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515) > at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496) > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at 
org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324) > at >
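The review suggestion above (disable the periodic health-check timer and let the test drive the check deterministically) can be sketched in miniature. This is a toy model, not the actual LocalDirsHandlerService; the class and the predicate-based health check are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch: instead of racing a background DiskHealthMonitor timer, the test
// calls a (hypothetically public, @VisibleForTesting) checkDirs() itself
// right before each verification, making the outcome deterministic.
class DirHealthChecker {
  private final List<String> goodDirs;
  private final Predicate<String> isHealthy;

  DirHealthChecker(List<String> dirs, Predicate<String> isHealthy) {
    this.goodDirs = new ArrayList<>(dirs);
    this.isHealthy = isHealthy;
  }

  /** Deterministic re-scan, driven by the test instead of a background timer. */
  void checkDirs() {
    goodDirs.removeIf(d -> !isHealthy.test(d));
  }

  List<String> getGoodDirs() {
    return new ArrayList<>(goodDirs);
  }
}
```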
[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated YARN-8789: -- Attachment: YARN-8789.10.patch > Add BoundedQueue to AsyncDispatcher > --- > > Key: YARN-8789 > URL: https://issues.apache.org/jira/browse/YARN-8789 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications >Affects Versions: 3.2.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Major > Attachments: YARN-8789.1.patch, YARN-8789.10.patch, > YARN-8789.2.patch, YARN-8789.3.patch, YARN-8789.4.patch, YARN-8789.5.patch, > YARN-8789.6.patch, YARN-8789.7.patch, YARN-8789.7.patch, YARN-8789.8.patch, > YARN-8789.9.patch > > > I recently came across a scenario where an MR ApplicationMaster was failing > with an OOM exception. It had many thousands of Mappers and thousands of > Reducers. It was noted in the logging that the event-queue of > {{AsyncDispatcher}} had a very large number of items in it and was seemingly > never decreasing. > I started looking at the code and thought it could use some clean up, > simplification, and the ability to specify a bounded queue so that any > incoming events are throttled until they can be processed. This will protect > the ApplicationMaster from a flood of events. > Logging Message: > Size of event-queue is xxx -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
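The bounded-queue idea in the description can be sketched with a capacity-limited BlockingQueue: when the dispatcher falls behind, producers block (or are refused) instead of the queue growing until the AM OOMs. The class and method names below are illustrative, not the actual AsyncDispatcher API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a fixed-capacity event queue that throttles event producers.
class BoundedDispatcher {
  private final BlockingQueue<Object> eventQueue;

  BoundedDispatcher(int capacity) {
    this.eventQueue = new ArrayBlockingQueue<>(capacity);
  }

  /** Blocks the producer when the queue is full - this is the throttling. */
  void dispatch(Object event) throws InterruptedException {
    eventQueue.put(event);
  }

  /** Non-blocking variant: returns false instead of blocking when full. */
  boolean tryDispatch(Object event) {
    return eventQueue.offer(event);
  }

  Object take() throws InterruptedException {
    return eventQueue.take(); // consumer side: the dispatch thread
  }

  int size() {
    return eventQueue.size();
  }
}
```

The design trade-off is that back-pressure moves the cost onto event producers; a flood of events slows the producers down rather than exhausting the heap.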
[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assigning result
[ https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8793: - Summary: QueuePlacementPolicy bind more information to assigning result (was: QueuePlacementPolicy bind more information to assgining result) > QueuePlacementPolicy bind more information to assigning result > -- > > Key: YARN-8793 > URL: https://issues.apache.org/jira/browse/YARN-8793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > Attachments: YARN-8793.001.patch, YARN-8793.002.patch, > YARN-8793.003.patch, YARN-8793.004.patch, YARN-8793.005.patch, > YARN-8793.006.patch, YARN-8793.007.patch, YARN-8793.008.patch > > > Fair scheduler's QueuePlacementPolicy should bind more information to > assigning result: > # Whether to terminate the chain of responsibility > # The reason to reject a request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8792: - Summary: Revisit FairScheduler QueuePlacementPolicy (was: Revisit FiarScheduler QueuePlacementPolicy ) > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > > The Fair Scheduler uses `QueuePlacementPolicy` to map a request to a queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # More useful rules are still needed: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631083#comment-16631083 ] Eric Yang commented on YARN-8734: - Precommit build failed due to an unrelated Jenkins problem. Triggered the test job again. > Readiness check for remote service belongs to the same user > --- > > Key: YARN-8734 > URL: https://issues.apache.org/jira/browse/YARN-8734 > Project: Hadoop YARN > Issue Type: New Feature > Components: yarn-native-services >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: Component_dependencies.png, Dependency check vs.pdf, > Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, > YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch, > YARN-8734.006.patch > > > When a service is deploying, there can be remote service dependency. It > would be nice to describe ZooKeeper as a dependent service, and the service > has reached a stable state, then deploy HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631043#comment-16631043 ] Haibo Chen commented on YARN-8808: -- containersUtilization and nodeUtilization in SchedulerNode are always instantiated (to ResourceUtilization.newInstance(0, 0, 0f)), so getNodeUtilization() / getAggregatedContainersUtilization() should never return null, unless I am missing something. > Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription
[ https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631039#comment-16631039 ] Haibo Chen commented on YARN-8808: -- Ah.. Got you. {quote}I was just saying.. we need an additional check to see if either one of them (you are proposing to use the former in this JIRA) is {{0}} {quote} Not sure checking nodeUtilization makes sense to me. Let's take a more extreme case for example: 1) A node (hardware) has 100 GB capacity, and we're sharing the node with other YARN stuff, so we'd configure NMs to limit aggregate container allocation on each node to 10 GBs. 2) The scheduler would see this 'Node' has 10GBs to allocate because that's what NM tells RM. I believe in this case, YARN should try to fully utilize just 10GBs instead of the whole node (100 GBs), because YARN is entitled to use only 10GBs. If 10GBs is indeed fully utilized, the aggregate container utilization is 100%, but the nodeUtilization is 10% (Again, node utilization by default is detected by some plugin on NM side that reads from /proc and sees the remaining system-wide 90GBs as available). Don't think we shall check if nodeUtilization is low. {quote}- Node has capacity for 4 1GB containers, but is currently running 2 containers each using more than 1.9GB - in this case, overallocation should be allowed. {quote} I am not following here. Node has a capacity of 4GBs, 2 containers each using 1.9GB, so the aggregate container utilization and node utilization are both high, no? Node capacity and utilization don't have anything to do with # of containers, do they? 
> Use aggregate container utilization instead of node utilization to determine > resources available for oversubscription > - > > Key: YARN-8808 > URL: https://issues.apache.org/jira/browse/YARN-8808 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-1011 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-8088-YARN-1011.01.patch, > YARN-8808-YARN-1011.00.patch > > > Resource oversubscription should be bound to the amount of the resources that > can be allocated to containers, hence the allocation threshold should be with > respect to aggregate container utilization. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
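The 10 GB / 100 GB scenario in the comment above can be made concrete with a small sketch. The method name and the flat memory-only model are assumptions for illustration, not the SchedulerNode API:

```java
// Headroom for oversubscription should be measured against the capacity the
// NM advertises to the RM (what YARN is entitled to), using aggregate
// container utilization -- not against whole-machine /proc utilization.
public class OversubscriptionExample {
    static long headroomMb(long nmAdvertisedCapacityMb, long aggregateContainerUsedMb) {
        return Math.max(0, nmAdvertisedCapacityMb - aggregateContainerUsedMb);
    }

    public static void main(String[] args) {
        // Hardware has 100 GB, but the NM is configured to offer only 10 GB.
        long nmCapacityMb = 10 * 1024;
        // Containers use all 10 GB: aggregate container utilization is 100%
        // (no headroom), even though node utilization would read ~10%.
        System.out.println(headroomMb(nmCapacityMb, 10 * 1024)); // prints 0
    }
}
```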
[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630977#comment-16630977 ] Hudson commented on YARN-6456: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15068 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15068/]) YARN-6456. Added config to set default container runtimes. (eyang: rev b237a0dd44ab285941983648d7ef26b99b30d624) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/TestGpuResourceHandler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/gpu/GpuResourceHandlerImpl.java * (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java > Allow administrators to control available container runtimes and set defaults > for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Fix For: 3.2.0 > > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote}
[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630930#comment-16630930 ] Eric Yang commented on YARN-6456: - +1 looks good to me for patch 005, will commit shortly. > Allow administrators to control available container runtimes and set defaults > for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote}
[jira] [Commented] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630892#comment-16630892 ] Craig Condit commented on YARN-6456: [~eyang], title updated. > Allow administrators to control available container runtimes and set defaults > for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote}
[jira] [Updated] (YARN-6456) Allow administrators to control available container runtimes and set defaults for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Condit updated YARN-6456: --- Summary: Allow administrators to control available container runtimes and set defaults for all containers (was: Allow administrators to set a single ContainerRuntime for all containers) > Allow administrators to control available container runtimes and set defaults > for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote}
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630857#comment-16630857 ] Wangda Tan commented on YARN-8800: -- Found the images are problematic because of the size. Removed all images, verified latest patch, should work now. [~sunilg] mind reviewing again? > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch, YARN-8800.002.patch, > YARN-8800.003.patch, YARN-8800.004.patch, YARN-8800.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8800: - Attachment: YARN-8800.005.patch > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch, YARN-8800.002.patch, > YARN-8800.003.patch, YARN-8800.004.patch, YARN-8800.005.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630838#comment-16630838 ] Eric Yang commented on YARN-6456: - The title doesn't match the implementation though. The implementation is allowing more than one runtime, and set one, if it is not explicitly defined. Where the title says to enforce all container to run with one runtime. The feature requested by the title can be accomplished by using existing yarn.nodemanager.runtime.linux.allowed-runtimes setting and set it to one runtime without code change. Do we want to change the title to reflect the implementation for correctness? > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. 
The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630814#comment-16630814 ] Giovanni Matteo Fumarola commented on YARN-8829: +1. LGTM. > Cluster metrics can fail with IndexOutOfBound Exception > > > Key: YARN-8829 > URL: https://issues.apache.org/jira/browse/YARN-8829 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akshay Agarwal >Assignee: Akshay Agarwal >Priority: Minor > Attachments: YARN-8829.001.patch > > > If no sub clusters are available in a Router Based Federation Setup, cluster > metrics can throw "IndexOutOfBoundException". > An additional check is required for when the sub cluster list is empty. > *Exception details:* > {noformat} > Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, > Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands,
e-mail: yarn-issues-h...@hadoop.apache.org
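The stack trace above fails at ArrayList.get(0) because the per-sub-cluster result list is empty; the "additional check" the reporter describes amounts to an empty-list guard before indexing. A minimal sketch with hypothetical names, not the actual FederationClientInterceptor code:

```java
import java.util.Collections;
import java.util.List;

public class EmptyClusterGuard {
    // Guard before indexing into the per-sub-cluster result list, so a
    // federation setup with no registered sub-clusters no longer triggers
    // IndexOutOfBoundsException: Index: 0, Size: 0.
    static String firstClusterIdOrNull(List<String> subClusterResults) {
        if (subClusterResults == null || subClusterResults.isEmpty()) {
            return null;
        }
        return subClusterResults.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstClusterIdOrNull(Collections.emptyList())); // prints null
    }
}
```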
[jira] [Comment Edited] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630787#comment-16630787 ] Eric Yang edited comment on YARN-6456 at 9/27/18 5:38 PM: -- [~ccondit-target] Thank you for the explanation. In YARN service, user does not require to specify the runtime, and it will run the DefaultLinuxContainerRuntime because it is the default. I understand more clearly that this JIRA is to focus on setting a default other than DefaultLinuxContainerRuntime, when user does not specify the default. User who is using YARN service LLAP application which did not specify the default, and system administrator override the default. It might break existing users because the behavior has changed. The responsibility of the compatibility breakage depends on the system administrator rather than code change. Hence, this change is backward compatible by default, and give system admin more control to steer the developer to work with specific runtime. The documentation can explain this more clearly by showing yarn.nodemanager.runtime.linux.type and yarn.nodemanager.runtime.linux.allowed-runtimes side by side for clarity, or find this JIRA for the explanation. I think the patch is ready. was (Author: eyang): [~ccondit-target] Thank you for the explanation. In YARN service, user does not require to specify the runtime, and it will run the DefaultLinuxContainerRuntime because it is the default. I understand more clearly that this JIRA is to focus on setting a default other than DefaultLinuxContainerRuntime, when user does not specify the default. User who is using YARN service LLAP application which did not specify the default, and system administrator override the default. It might break existing users because the behavior has changed. The responsibility of the compatibility breakage depends on the system administrator rather than code change. 
Hence, this change is backward compatible by default, and give system admin more control to steer the developer to work with specific runtime. The documentation can explain this more clearly by showing yarn.nodemanager.runtime.linux.type and yarn.nodemanager.runtime.linux.allowed-runtimes side by side for clarity, or find this JIRA for the explanation. I think the patch is ready. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. > Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. 
> {quote}
[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630787#comment-16630787 ] Eric Yang commented on YARN-6456: - [~ccondit-target] Thank you for the explanation. In YARN service, user does not require to specify the runtime, and it will run the DefaultLinuxContainerRuntime because it is the default. I understand more clearly that this JIRA is to focus on setting a default other than DefaultLinuxContainerRuntime, when user does not specify the default. User who is using YARN service LLAP application which did not specify the default, and system administrator override the default. It might break existing users because the behavior has changed. The responsibility of the compatibility breakage depends on the system administrator rather than code change. Hence, this change is backward compatible by default, and give system admin more control to steer the developer to work with specific runtime. The documentation can explain this more clearly by showing yarn.nodemanager.runtime.linux.type and yarn.nodemanager.runtime.linux.allowed-runtimes side by side for clarity, or find this JIRA for the explanation. I think the patch is ready. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. 
> Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8830) SLS tool not working in trunk
[ https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt reassigned YARN-8830: -- Assignee: Bibin A Chundatt > SLS tool not working in trunk > - > > Key: YARN-8830 > URL: https://issues.apache.org/jira/browse/YARN-8830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8830.001.patch > > > Seems NodeDetails hashCode() and equals() are causing too many node > registrations for large data sets -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8830) SLS tool not working in trunk
[ https://issues.apache.org/jira/browse/YARN-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8830: --- Attachment: YARN-8830.001.patch > SLS tool not working in trunk > - > > Key: YARN-8830 > URL: https://issues.apache.org/jira/browse/YARN-8830 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Priority: Major > Attachments: YARN-8830.001.patch > > > Seems NodeDetails hashCode() and equals() are causing too many node > registrations for large data sets -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
[ https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630739#comment-16630739 ] Botong Huang commented on YARN-8696: Thanks [~giovanni.fumarola] for the review and commit! > [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async > --- > > Key: YARN-8696 > URL: https://issues.apache.org/jira/browse/YARN-8696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, > YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, > YARN-8696.v5.patch, YARN-8696.v6.patch > > > Today in _FederationInterceptor_, the heartbeat to home sub-cluster is > synchronous. After the heartbeat is sent out to home sub-cluster, it waits > for the home response to come back before merging and returning the (merged) > heartbeat result back to the AM. If home sub-cluster is suffering from connection > issues, or down during a YarnRM master-slave switch, all heartbeat threads > in _FederationInterceptor_ will be blocked waiting for home response. As a > result, the successful UAM heartbeats from secondary sub-clusters will not be > returned to AM at all. Additionally, because of the fact that we kept the > same heartbeat responseId between AM and home RM, lots of tricky handling is > needed regarding the responseId resync when it comes to > _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart > (YARN-6127, YARN-1336), home RM master-slave switch etc. > In this patch, we change the heartbeat to home sub-cluster to asynchronous, > same as the way we handle UAM heartbeats in secondaries. So that any > sub-cluster down or connection issues won't impact AM getting responses from > other sub-clusters. The responseId is also managed separately for home > sub-cluster and AM, and they increment independently. 
The resync logic > becomes much cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8819) Fix findbugs warnings in YarnServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630708#comment-16630708 ] Vidura Bhathiya Mudalige commented on YARN-8819: [~ajisakaa], can you please assign this Jira to me? > Fix findbugs warnings in YarnServiceUtils > - > > Key: YARN-8819 > URL: https://issues.apache.org/jira/browse/YARN-8819 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akira Ajisaka >Priority: Minor > Labels: newbie > > {noformat} > module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine > > org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String, > int, String) concatenates strings using + in a loop At > YarnServiceUtils.java:using + in a loop At YarnServiceUtils.java:[line 123] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8819) Fix findbugs warnings in YarnServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630675#comment-16630675 ] ASF GitHub Bot commented on YARN-8819: -- GitHub user vbmudalige opened a pull request: https://github.com/apache/hadoop/pull/419 YARN-8819. Fix findbugs warnings in YarnServiceUtils You can merge this pull request into a Git repository by running: $ git pull https://github.com/vbmudalige/hadoop YARN-8819 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/419.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #419 commit 57c872eb8bb7dd69599e899876417cc541018f0d Author: Vidura Mudalige Date: 2018-09-27T16:10:13Z YARN-8819. Fix findbugs warnings in YarnServiceUtils > Fix findbugs warnings in YarnServiceUtils > - > > Key: YARN-8819 > URL: https://issues.apache.org/jira/browse/YARN-8819 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akira Ajisaka >Priority: Minor > Labels: newbie > > {noformat} > module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine > > org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String, > int, String) concatenates strings using + in a loop At > YarnServiceUtils.java:using + in a loop At YarnServiceUtils.java:[line 123] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
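The findbugs warning quoted above flags string concatenation with + inside a loop, which re-copies the accumulated string on every iteration; the standard remedy is a StringBuilder. A generic sketch of that pattern follows — the method and data below are illustrative, not the actual YarnServiceUtils.getComponentArrayJson code:

```java
public class StringBuilderLoopExample {
    // Appending to a StringBuilder is amortized linear overall, whereas
    // s += piece in a loop is quadratic because each += copies everything
    // accumulated so far. This is what the findbugs warning is about.
    static String joinJsonArray(String[] elements) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(elements[i]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(joinJsonArray(new String[] {"{\"cpu\":1}", "{\"cpu\":2}"}));
        // prints [{"cpu":1},{"cpu":2}]
    }
}
```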
[jira] [Updated] (YARN-8303) YarnClient should contact TimelineReader for application/attempt/container report
[ https://issues.apache.org/jira/browse/YARN-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8303: Attachment: YARN-8303.001.patch > YarnClient should contact TimelineReader for application/attempt/container > report > - > > Key: YARN-8303 > URL: https://issues.apache.org/jira/browse/YARN-8303 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Critical > Attachments: YARN-8303.001.patch, YARN-8303.poc.patch > > > YarnClient get app/attempt/container information from RM. If RM doesn't have > then queried to ahsClient. When ATSv2 is only enabled, yarnClient will result > empty. > YarnClient is used by many users which result in empty information for > app/attempt/container report. > Proposal is to have adapter from yarn client so that app/attempt/container > reports can be generated from AHSv2Client which does REST API to > TimelineReader and get the entity and convert it into app/attempt/container > report. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8830) SLS tool not working in trunk
Bibin A Chundatt created YARN-8830: -- Summary: SLS tool not working in trunk Key: YARN-8830 URL: https://issues.apache.org/jira/browse/YARN-8830 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt It seems the NodeDetails hashCode() and equals() implementations cause too many node registrations for large data sets
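A broken {{hashCode()}}/{{equals()}} pair would explain duplicate registrations: if the two methods are not defined over the same identity fields, HashSet/HashMap deduplication fails and the same node is treated as new each time. A hedged sketch of the contract — the class and field names are illustrative, not the actual SLS {{NodeDetails}}:

```java
import java.util.Objects;

// Illustrative NodeDetails-like class. equals() and hashCode() must agree on
// the same identity fields, otherwise HashSet deduplication breaks and one
// node can be "registered" many times.
public class NodeDetailsSketch {
  private final String hostName; // hypothetical identity field
  private final int port;        // hypothetical identity field

  public NodeDetailsSketch(String hostName, int port) {
    this.hostName = hostName;
    this.port = port;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeDetailsSketch)) {
      return false;
    }
    NodeDetailsSketch other = (NodeDetailsSketch) o;
    return port == other.port && Objects.equals(hostName, other.hostName);
  }

  @Override
  public int hashCode() {
    // Must use exactly the fields compared in equals().
    return Objects.hash(hostName, port);
  }
}
```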
[jira] [Commented] (YARN-8827) Plumb per app, per user and per queue resource utilization from the NM to RM
[ https://issues.apache.org/jira/browse/YARN-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630610#comment-16630610 ] Arun Suresh commented on YARN-8827: --- I guess we just need per app utilization.. since the queue and user etc can be derived at the RM. > Plumb per app, per user and per queue resource utilization from the NM to RM > > > Key: YARN-8827 > URL: https://issues.apache.org/jira/browse/YARN-8827 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun Suresh >Assignee: Arun Suresh >Priority: Major > > Opportunistic Containers for OverAllocation need to be allocated to pending > applications in some fair manner. Rather than evaluating queue and user > resource usage (allocated resource usage) and comparing against queue and > user limits to decide the allocation, it might make more sense to use a > snapshot of actual resource utilization of the queue and user. > To facilitate this, this JIRA proposes to aggregate per user, per app (and > maybe per queue) resource utilization in addition to aggregated Container and > Node Utilization and send it along with the NM heartbeat. It should be fairly > inexpensive to aggregate - since it can be performed in the same loop of the > {{ContainersMonitorImpl}}'s Monitoring thread. > A snapshot aggregate can be made every couple of seconds in the RM. This > instantaneous resource utilization should be used to decide if Opportunistic > containers can be allocated to an App, Queue or User. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630553#comment-16630553 ] Sunil Govindan commented on YARN-8800: -- Yes. I was also seeing that. Images were not coming. contents are all good. > Updated documentation of Submarine with latest examples. > > > Key: YARN-8800 > URL: https://issues.apache.org/jira/browse/YARN-8800 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-8800.001.patch, YARN-8800.002.patch, > YARN-8800.003.patch, YARN-8800.004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630553#comment-16630553 ] Sunil Govindan edited comment on YARN-8800 at 9/27/18 2:52 PM: --- Yes. I was also seeing that. Images were not coming, i thought some pblm is my setup. :) contents are all good. was (Author: sunilg): Yes. I was also seeing that. Images were not coming. contents are all good.
[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.
[ https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630534#comment-16630534 ] Wangda Tan commented on YARN-8800: -- Please hold on, there're a few issues of the images, let me fix it and upload the patch by today.
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630442#comment-16630442 ] Weiwei Yang commented on YARN-8468: --- Hi [~bsteinbach] I've read the patch again, it almost looks fine to me. Just one more thing: I think we can remove {{TestAllocationFileLoaderService#initResourceTypes}}, and replace that with {code} TestResourceUtils.addNewTypesToResources(A_CUSTOM_RESOURCE); {code} Some other minor issues: TestRMServerUtils: line 66, 105, 107 exceeds 80 char limit, and some checkstyle issues in {{TestApplicationMasterServiceWithFS}} too, see more in the jenkins report. You can try fix as much as you can. Thanks > Enable the use of queue based maximum container allocation limit and > implement it in FairScheduler > -- > > Key: YARN-8468 > URL: https://issues.apache.org/jira/browse/YARN-8468 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Antal Bálint Steinbach >Priority: Critical > Attachments: YARN-8468.000.patch, YARN-8468.001.patch, > YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, > YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, > YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, > YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, > YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, > YARN-8468.017.patch > > > When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" > to limit the overall size of a container. This applies globally to all > containers and cannot be limited by queue or and is not scheduler dependent. > The goal of this ticket is to allow this value to be set on a per queue basis. > The use case: User has two pools, one for ad hoc jobs and one for enterprise > apps. User wants to limit ad hoc jobs to small containers but allow > enterprise apps to request as many resources as needed. 
Setting > yarn.scheduler.maximum-allocation-mb sets a default value for maximum > container size for all queues and setting maximum resources per queue with > “maxContainerResources” queue config value. > Suggested solution: > All the infrastructure is already in the code. We need to do the following: > * add the setting to the queue properties for all queue types (parent and > leaf), this will cover dynamically created queues. > * if we set it on the root we override the scheduler setting and we should > not allow that. > * make sure that queue resource cap can not be larger than scheduler max > resource cap in the config. > * implement getMaximumResourceCapability(String queueName) in the > FairScheduler > * implement getMaximumResourceCapability(String queueName) in both > FSParentQueue and FSLeafQueue as follows > * expose the setting in the queue information in the RM web UI. > * expose the setting in the metrics etc for the queue. > * Enforce the use of queue based maximum allocation limit if it is > available, if not use the general scheduler level setting > ** Use it during validation and normalization of requests in > scheduler.allocate, app submit and resource request -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
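The per-queue cap described in the ticket could look roughly like this in a fair-scheduler allocation file. This is a hypothetical sketch: {{maxContainerResources}} is the element name proposed in the issue description, and the committed patch may use a different name or value format.

```xml
<!-- Hypothetical fair-scheduler allocations sketch based on the ticket
     description; element name and value format may differ in the final
     patch. -->
<allocations>
  <queue name="adhoc">
    <!-- ad hoc jobs limited to small containers -->
    <maxContainerResources>2048 mb,2 vcores</maxContainerResources>
  </queue>
  <queue name="enterprise">
    <!-- no per-queue cap: falls back to the scheduler-wide
         yarn.scheduler.maximum-allocation-mb / -vcores limits -->
  </queue>
</allocations>
```

This matches the stated use case: the ad hoc pool gets a small per-queue maximum, while the enterprise pool inherits the global scheduler limit.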
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630368#comment-16630368 ] Weiwei Yang commented on YARN-8468: --- Hi [~bsteinbach] Thanks for taking the effort to investigate this approach; I appreciate it. I agree with most of your analysis, but some unavoidable changes were due to historic reasons. It looks like what I suggested earlier is difficult to achieve at this point and would not make the change any simpler or less risky. Given these facts, I agree to follow the approach in your patch, and I will review the latest patch again. Thanks!
[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers
[ https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630305#comment-16630305 ] Craig Condit commented on YARN-6456: [~eyang], your assessment of how the properties interact is correct. The {{yarn.nodemanager.runtime.linux.allowed-runtimes}} property dictates the set of runtimes which may be selected from, while {{yarn.nodemanager.runtime.linux.type}} sets the default. Without this, all application submissions would need to specify a runtime type or they would fail. This is also why a default docker image can be specified. The idea is that an administrator can allow jobs to run under any runtime without user-visible configuration. The mapping between runtime type names and classes uses the same logic as the {{YARN_CONTAINER_RUNTIME_TYPE}} environment variable (and in fact uses the same code). The value of {{yarn.nodeanager.runtime.linux.type}} is used as a default for {{YARN_CONTAINER_RUNTIME_TYPE}} if it is not provided by the user. Similarly, {{yarn.nodemanager.runtime.linux.docker.image-name}} is used as a default for {{YARN_CONTAINER_RUNTIME_DOCKER_IMAGE}}. > Allow administrators to set a single ContainerRuntime for all containers > > > Key: YARN-6456 > URL: https://issues.apache.org/jira/browse/YARN-6456 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Craig Condit >Priority: Major > Labels: Docker > Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch, > YARN-6456.001.patch, YARN-6456.002.patch, YARN-6456.003.patch, > YARN-6456.004.patch, YARN-6456.005.patch > > > > With LCE, there are multiple ContainerRuntimes available for handling > different types of containers; default, docker, java sandbox. Admins should > have the ability to override the user decision and set a single global > ContainerRuntime to be used for all containers. 
> Original Description: > {quote}One reason to use Docker containers is to be able to isolate different > workloads, even, if they run as the same user. > I have noticed some issues in the current design: > 1. DockerLinuxContainerRuntime mounts containerLocalDirs > {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and > userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see > and modify the files of another container. I think the application file cache > directory should be enough for the container to run in most of the cases. > 2. The whole cgroups directory is mounted. Would the container directory be > enough? > 3. There is no way to enforce exclusive use of Docker for all containers. > There should be an option that it is not the user but the admin that requires > to use Docker. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
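The interaction Craig describes above — an allowed set plus a default type and default image — would be configured in {{yarn-site.xml}} roughly as follows. The property names come from the comment; the values are illustrative only.

```xml
<!-- Sketch of the configuration interaction described above.
     Property names are from the discussion; values are illustrative. -->
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <!-- the set of runtimes a submitted job may select from -->
  <value>default,docker</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.type</name>
  <!-- default when YARN_CONTAINER_RUNTIME_TYPE is not set by the user -->
  <value>docker</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.docker.image-name</name>
  <!-- default when YARN_CONTAINER_RUNTIME_DOCKER_IMAGE is not set -->
  <value>library/centos:7</value>
</property>
```

With these defaults in place, a job submitted with no runtime-related environment variables would run under the Docker runtime with the default image, while a job may still opt into any runtime in the allowed set.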
[jira] [Commented] (YARN-8788) mvn package -Pyarn-ui fails on JDK9
[ https://issues.apache.org/jira/browse/YARN-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630290#comment-16630290 ] Vidura Bhathiya Mudalige commented on YARN-8788: Hi [~ajisakaa], I have recreated the issue. But it seems we have to wait until wro4j 1.8.1 is released. right? > mvn package -Pyarn-ui fails on JDK9 > --- > > Key: YARN-8788 > URL: https://issues.apache.org/jira/browse/YARN-8788 > Project: Hadoop YARN > Issue Type: Bug > Environment: Java 9.0.4, CentOS 7.5 >Reporter: Akira Ajisaka >Priority: Major > Labels: newbie > > {{mvn package -Pdist,native,yarn-ui -Dtar -DskipTests}} failed on trunk. > {noformat} > [ERROR] Failed to execute goal ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run > (default) on project hadoop-yarn-ui: Execution default of goal > ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run failed: An API incompatibility was > encountered while executing ro.isdc.wro4j:wro4j-maven-plugin:1.7.9:run: > java.lang.ExceptionInInitializerError: null > [ERROR] - > [ERROR] realm =plugin>ro.isdc.wro4j:wro4j-maven-plugin:1.7.9 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/home/aajisaka/.m2/repository/ro/isdc/wro4j/wro4j-maven-plugin/1.7.9/wro4j-maven-plugin-1.7.9.jar > [ERROR] urls[1] = > file:/home/aajisaka/.m2/repository/ro/isdc/wro4j/wro4j-core/1.7.9/wro4j-core-1.7.9.jar > [ERROR] urls[2] = > file:/home/aajisaka/.m2/repository/org/apache/commons/commons-lang3/3.4/commons-lang3-3.4.jar > (snip) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630246#comment-16630246 ] Bibin A Chundatt commented on YARN-8829: [~akki261001] This is also possible if the subclusters have not been started; when the set of active subclusters is empty, the exception can occur. > Cluster metrics can fail with IndexOutOfBound Exception > > > Key: YARN-8829 > URL: https://issues.apache.org/jira/browse/YARN-8829 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akshay Agarwal >Assignee: Akshay Agarwal >Priority: Minor > Attachments: YARN-8829.001.patch > > > If no sub clusters are available in a Router Based Federation Setup, cluster > metrics can throw "IndexOutOfBoundException". > An additional check is required for when the sub-cluster list is empty. > *Exception details:* > {noformat} > Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, > Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654) > at > org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604) > at > org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > {noformat}
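The stack trace shows {{FederationClientInterceptor.invokeConcurrent}} indexing into the list of results from active subclusters; with no registered subcluster the list is empty and {{get(0)}} throws. A minimal sketch of the guard the patch presumably adds — method and class names are hypothetical, not the actual Router code:

```java
import java.util.List;

public class SubClusterGuardSketch {

  // Hypothetical stand-in for the metrics-merging step: return a fallback
  // instead of indexing into a possibly empty result list.
  public static int firstOrDefault(List<Integer> results, int fallback) {
    if (results == null || results.isEmpty()) {
      // previously: results.get(0) -> IndexOutOfBoundsException: Index: 0, Size: 0
      return fallback;
    }
    return results.get(0);
  }
}
```

The same shape of check (empty-collection guard before {{get(0)}}) is all that is needed to turn the exception into a well-defined "no subclusters" response.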
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630234#comment-16630234 ] Akshay Agarwal commented on YARN-8829: -- [~sunilg] After setting up the cluster, I shut down all the sub-clusters; in that case it shows this behaviour.
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630209#comment-16630209 ] Sunil Govindan commented on YARN-8829: -- [~akki261001] In which case is "no sub clusters are available in Router Based Federation Setup" possible? Are we missing some validation while configuring? It's interesting to see it breaking here.
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630199#comment-16630199 ] Bibin A Chundatt commented on YARN-8829: +1 lgtm . Looks direct to me. Will commit once jenkins completes.
[jira] [Updated] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay Agarwal updated YARN-8829: - Attachment: YARN-8829.001.patch
[jira] [Commented] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630195#comment-16630195 ] Akshay Agarwal commented on YARN-8829: -- Attached the patch! Please review.
[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630188#comment-16630188 ] Antal Bálint Steinbach commented on YARN-8468:
--
Hi [~cheersyang], [~haibochen], I spent some hours digging deeper and trying to make this change. I would say it is only partly possible. Let me describe my findings so we can decide based on them. Sorry for the long post. There are two things here.

h3. *Normalization on 2 levels*, as Weiwei called it

(*#1*) RMAppManager ({{ApplicationClientProtocol}}) and DefaultAMSProcessor, and (*#2*) {{FS/CS#allocate}}.

It turns out that it makes sense to keep both, because they do slightly different things. I found this after I removed #2 and some tests failed that were unknowingly relying on this behavior.

#1 _RMServerUtils.normalizeAndValidateRequests(...)_ throws an exception if the requested allocation is higher than the allowed one.

#2 _ask.setCapability(SchedulerUtils.getNormalizedResource(...))_ caps the requested allocation at the allowed maximum if it is higher than that maximum, or raises it to the minimum allocation if it is lower than that minimum.

The solution could be to merge them into level #1 as you suggested, but then we have to decide whether we want to keep the exception-throwing behavior, the fixing behavior, or both (throw when too high, fix when too low).

h3. *Protect the YarnScheduler API*

I would say we cannot do this.
* We have to calculate maxResourceAllocation for _SchedulerUtils.getNormalizedResource(...)_, and for that we need the queueName. There is no workaround for that.
* We would like to calculate maxResourceAllocation once per allocate(), so we have to compute it before iterating over the asks and do the normalization like this (RMAppManager):
{code:java}
Resource maxAllocation = scheduler.getMaximumResourceCapability(queue);
for (ResourceRequest amReq : amReqs) {
  SchedulerUtils.normalizeAndValidateRequest(amReq, maxAllocation, queue,
      scheduler, isRecovery, rmContext, null);
  amReq.setCapability(scheduler.getNormalizedResource(
      amReq.getCapability(), maxAllocation));
}
{code}
If I understand correctly, your suggestion was to skip _scheduler.getNormalizedResource_ and call the static _SchedulerUtils.getNormalizedResource(...)_ method directly. That way we would not have to change the YarnScheduler API, because we would not have to pass maxResourceAllocation around. The problem with this is that scheduler.getNormalizedResource is overridden in FairScheduler.

FS:
{code:java}
@Override
public Resource getNormalizedResource(Resource requestedResource,
    Resource maxResourceCapability) {
  return SchedulerUtils.getNormalizedResource(requestedResource,
      DOMINANT_RESOURCE_CALCULATOR, minimumAllocation,
      maxResourceCapability, incrAllocation);
}
{code}
AbstractYarnScheduler:
{code:java}
@Override
public Resource getNormalizedResource(Resource requestedResource,
    Resource maxResourceCapability) {
  return SchedulerUtils.getNormalizedResource(requestedResource,
      getResourceCalculator(), getMinimumResourceCapability(),
      maxResourceCapability, getMinimumResourceCapability());
}
{code}
This means that if I want to call _SchedulerUtils.getNormalizedResource(...)_ from, for example, RMAppManager, I still need the scheduler to obtain the parameters, like this:
{code:java}
Resource maxAllocation = scheduler.getMaximumResourceCapability(queue);
for (ResourceRequest amReq : amReqs) {
  SchedulerUtils.normalizeAndValidateRequest(amReq, maxAllocation, queue,
      scheduler, isRecovery, rmContext, null);
  Resource normalizedCapability = SchedulerUtils.getNormalizedResource(
      amReq.getCapability(), scheduler.getResourceCalculator(),
      scheduler.getMinimumResourceCapability(), maxAllocation,
      scheduler.getIncrementAllocation());
  amReq.setCapability(normalizedCapability);
}
{code}
This means I would have to introduce a new method in the YarnScheduler API called getIncrementAllocation, which is against our starting point. We could also delete the _getNormalizedResource_ method because it would never be used, which is another change to the API. Furthermore, _scheduler.getNormalizedResource_ is mocked in several tests; replacing that call with a static method call would break those tests as well.
> Enable the use of queue based maximum container allocation limit and
> implement it in FairScheduler
> --------------------------------------------------------------------
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Critical
>
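To make the merge trade-off discussed above concrete, the combined level-#1 behavior (throw when the ask exceeds the queue maximum, silently fix when it is below the minimum or off the increment) could be sketched over a single memory-like dimension as follows. This is a simplified illustration, not the actual YARN code; the method and exception names are assumptions:

```java
public class NormalizationSketch {

    /** Illustrative stand-in for YARN's InvalidResourceRequestException. */
    static class InvalidRequestException extends RuntimeException {
        InvalidRequestException(String msg) { super(msg); }
    }

    /**
     * Merged normalization over one resource dimension:
     * - #1 behavior: reject asks above the queue maximum.
     * - #2 behavior: raise asks to at least the minimum, rounded up
     *   to the allocation increment, capped at the maximum.
     */
    static long normalize(long asked, long min, long max, long increment) {
        if (asked > max) {
            // Validation failure on oversized asks (throw-on-higher).
            throw new InvalidRequestException(
                "Requested " + asked + " exceeds queue maximum " + max);
        }
        // Fix-on-lower: round up to at least min, aligned to the increment.
        long normalized = Math.max(asked, min);
        long remainder = normalized % increment;
        if (remainder != 0) {
            normalized += increment - remainder;
        }
        return Math.min(normalized, max);
    }

    public static void main(String[] args) {
        // min=1024, max=8192, increment=512
        System.out.println(normalize(100, 1024, 8192, 512));  // raised to min: 1024
        System.out.println(normalize(1500, 1024, 8192, 512)); // rounded up: 1536
        try {
            normalize(16384, 1024, 8192, 512);
        } catch (InvalidRequestException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design question the comment raises is exactly the asymmetry visible here: oversized asks fail loudly, undersized or misaligned asks are repaired silently.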
[jira] [Updated] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay Agarwal updated YARN-8829:
-
Description:
If no sub-clusters are available in a Router-based Federation setup, cluster metrics can throw an "IndexOutOfBoundsException". An additional check is required for when the sub-cluster list is empty.
*Exception details:*
{noformat}
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
{noformat}
was: the same description with the exception details not wrapped in a {noformat} block.
[jira] [Created] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
Akshay Agarwal created YARN-8829:
Summary: Cluster metrics can fail with IndexOutOfBound Exception
Key: YARN-8829
URL: https://issues.apache.org/jira/browse/YARN-8829
Project: Hadoop YARN
Issue Type: Bug
Reporter: Akshay Agarwal

If no sub-clusters are available in a Router-based Federation setup, cluster metrics can throw an "IndexOutOfBoundsException". An additional check is required for when the sub-cluster list is empty.
*Exception details:*
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.invokeConcurrent(FederationClientInterceptor.java:654)
at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:604)
at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:232)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:253)
at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:585)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
[jira] [Assigned] (YARN-8829) Cluster metrics can fail with IndexOutOfBound Exception
[ https://issues.apache.org/jira/browse/YARN-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay Agarwal reassigned YARN-8829:
Assignee: Akshay Agarwal
> Cluster metrics can fail with IndexOutOfBound Exception
> -------------------------------------------------------
>
> Key: YARN-8829
> URL: https://issues.apache.org/jira/browse/YARN-8829
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Akshay Agarwal
> Assignee: Akshay Agarwal
> Priority: Minor
>
> If no sub-clusters are available in a Router-based Federation setup, cluster
> metrics can throw an "IndexOutOfBoundsException". An additional check is
> required for when the sub-cluster list is empty.
[jira] [Updated] (YARN-8828) When ReservationSystem is configured, the RM fails to start
[ https://issues.apache.org/jira/browse/YARN-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yimeng updated YARN-8828:
-
Attachment: capacity-scheduler.xml
            yarn-site.xml
> When ReservationSystem is configured, the RM fails to start
> -----------------------------------------------------------
>
> Key: YARN-8828
> URL: https://issues.apache.org/jira/browse/YARN-8828
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: yimeng
> Priority: Major
> Labels: usability
> Attachments: capacity-scheduler.xml, yarn-site.xml
>
> I tested the ReservationSystem in Hadoop 3.0, but it seems to have a problem.
> 1. Configure yarn.resourcemanager.reservation-system.enable = true in the RM's yarn-site.xml.
> 2. Select a leaf queue "bbb" and configure yarn.scheduler.capacity.root.bbb.reservable = true in capacity-scheduler.xml, as follows:
> <property>
>   <name>yarn.scheduler.capacity.root.bbb.reservable</name>
>   <value>true</value>
> </property>
> 3. Then restart the RM; the RM fails to start. The error stack log is as follows:
> 2018-09-27 11:30:15,691 | FATAL | main | Error starting ResourceManager | ResourceManager.java:1517
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping contains invalid or non-leaf queue : bbb
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
> at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:813)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1214)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:315)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1510)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue : bbb
> at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetQueueMapping(UserGroupMappingPlacementRule.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.get(UserGroupMappingPlacementRule.java:280)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:668)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:689)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:716)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:360)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:425)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> ... 7 more
>
> I am sure the queue "bbb" is a leaf queue.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
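The stack trace shows the failure coming from the leaf-queue validation in the user-group queue mapping, not from the ReservationSystem itself. The shape of that check can be illustrated with a simplified standalone sketch; this is not the actual UserGroupMappingPlacementRule code, the queue registry is invented for illustration, and the premise that a reservable queue stops being treated as a plain leaf queue is an assumption about the reported behavior:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class QueueMappingCheck {

    // Illustrative queue registry: queue name -> whether the scheduler
    // currently treats it as a leaf queue. In the reported scenario,
    // marking "bbb" reservable apparently makes it fail the leaf check
    // even though the user configured it as a leaf.
    static final Map<String, Boolean> QUEUES = new HashMap<>();
    static {
        QUEUES.put("root", false);
        QUEUES.put("bbb", false); // reservable: no longer seen as a leaf
        QUEUES.put("ccc", true);  // a plain leaf queue
    }

    /** Simplified version of the mapping's non-leaf validation. */
    static String validateMapping(String queueName) throws IOException {
        Boolean isLeaf = QUEUES.get(queueName);
        if (isLeaf == null || !isLeaf) {
            // Same message as in the reported RM startup failure.
            throw new IOException(
                "mapping contains invalid or non-leaf queue : " + queueName);
        }
        return queueName;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(validateMapping("ccc")); // plain leaf passes
        try {
            validateMapping("bbb"); // reservable queue is rejected
        } catch (IOException e) {
            System.out.println("RM init fails: " + e.getMessage());
        }
    }
}
```

This framing suggests why the reporter's "I am sure the queue is a leaf queue" and the RM's rejection can both be true: the validation runs against the scheduler's internal view of the queue, which the reservable flag may have changed.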