[jira] [Commented] (YARN-6924) Metrics for Federation AMRMProxy
[ https://issues.apache.org/jira/browse/YARN-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053882#comment-17053882 ] Hudson commented on YARN-6924: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18032 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18032/]) YARN-6924. Metrics for Federation AMRMProxy. Contributed by Young Chen (bibinchundatt: rev 3859fa76d0b5202abaf6e02fc9743684f5ab1cb2) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyMetrics.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyMetrics.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java > Metrics for Federation AMRMProxy > > > Key: YARN-6924 > URL: https://issues.apache.org/jira/browse/YARN-6924 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Young Chen >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-6924.01.patch, YARN-6924.01.patch, > YARN-6924.02.patch, YARN-6924.02.patch, YARN-6924.03.patch, > YARN-6924.04.patch, YARN-6924.05.patch > > > This JIRA proposes addition of metrics for Federation AMRMProxy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9856) Remove log-aggregation related duplicate function
[ https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053691#comment-17053691 ] Hadoop QA commented on YARN-9856: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 32s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 20 new + 12 unchanged - 1 fixed = 32 total (was 13) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 58s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 0s{color} | {color:red} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 2s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 44s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}117m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | | Unused public or protected field:org.apache.hadoop.yarn.logaggregation.ContainerLogReaderBase.logTypes In ContainerLogReaderBase.java | | Failed junit tests | hadoop.yarn.logaggregation.filecontroller.ifile.TestLogAggregationIndexedFileController | | | hadoop.yarn.server.nodemanager.webapp.TestNMWebServices | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-9856 | | JIRA Patch URL |
[jira] [Commented] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053615#comment-17053615 ] Hadoop QA commented on YARN-10003: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 95m 4s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}158m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10003 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995876/YARN-10003.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e56fd8603b29 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25653/testReport/ | | Max. process+thread count | 835 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25653/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > YarnConfigurationStore#checkVersion throws
[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053613#comment-17053613 ] Hadoop QA commented on YARN-10168: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 54s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 0 unchanged - 2 fixed = 0 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 56s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}158m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.activities.TestActivitiesManager | | | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10168 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995875/YARN-10168-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 8d975f6b8831 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Commented] (YARN-9427) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers fails sporadically
[ https://issues.apache.org/jira/browse/YARN-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053600#comment-17053600 ] Jim Brennan commented on YARN-9427: --- [~ebadger], [~epayne] we see this failure intermittently in our internal branch-2.10 builds. Would be good to commit this if the fix looks good to you. > TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers > fails sporadically > > > Key: YARN-9427 > URL: https://issues.apache.org/jira/browse/YARN-9427 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler, test >Affects Versions: 2.10.0, 3.2.0 >Reporter: Prabhu Joseph >Assignee: Ahmed Hussein >Priority: Major > Attachments: > TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers, > YARN-9427-branch-2.10.001.patch, YARN-9427-branch-2.10.002.patch, > YARN-9427.001.patch, YARN-9427.002.patch > > > Failed > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers > {code} > java.lang.AssertionError: expected:<2> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers(TestContainerSchedulerQueuing.java:1027) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code}
[jira] [Commented] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053593#comment-17053593 ] Szilard Nemeth commented on YARN-10003: --- Thanks [~bteke] for this patch, committed to trunk. Thanks [~adam.antal] for the review. [~bteke]: Could you please check whether this patch can feasibly be backported to branch-3.2? Basically, you need to check whether it can easily be cherry-picked to the branch called "branch-3.2". First, please check whether the CS Logmutation feature is present on branch-3.2. Thanks. > YarnConfigurationStore#checkVersion throws exception that belongs to > RMStateStore > - > > Key: YARN-10003 > URL: https://issues.apache.org/jira/browse/YARN-10003 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Attachments: YARN-10003.001.patch, YARN-10003.002.patch, > YARN-10003.003.patch, YARN-10003.004.patch, YARN-10003.005.patch > > > RMStateVersionIncompatibleException is thrown from method "checkVersion". > Moreover, there's a TODO here saying this method is copied from RMStateStore. > We should revise this method a bit.
[jira] [Updated] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10003: -- Fix Version/s: 3.3.0 > YarnConfigurationStore#checkVersion throws exception that belongs to > RMStateStore > - > > Key: YARN-10003 > URL: https://issues.apache.org/jira/browse/YARN-10003 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-10003.001.patch, YARN-10003.002.patch, > YARN-10003.003.patch, YARN-10003.004.patch, YARN-10003.005.patch > > > RMStateVersionIncompatibleException is thrown from method "checkVersion". > Moreover, there's a TODO here saying this method is copied from RMStateStore. > We should revise this method a bit.
[jira] [Updated] (YARN-9856) Remove log-aggregation related duplicate function
[ https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9856: - Attachment: YARN-9856.002.patch > Remove log-aggregation related duplicate function > - > > Key: YARN-9856 > URL: https://issues.apache.org/jira/browse/YARN-9856 > Project: Hadoop YARN > Issue Type: Task > Components: log-aggregation, yarn >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Szilard Nemeth >Priority: Trivial > Attachments: YARN-9856.001.patch, YARN-9856.002.patch > > > [~snemeth] has noticed a duplication in two of the log-aggregation related > functions. > {quote}I noticed duplicated code in > org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, > duplicated in > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs. > [...] > {quote} > We should remove the duplication.
[jira] [Commented] (YARN-2710) RM HA tests failed intermittently on trunk
[ https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053506#comment-17053506 ] Jim Brennan commented on YARN-2710: --- [~kihwal] it would be good to get this committed. We are still seeing these intermittent failures on internal builds. > RM HA tests failed intermittently on trunk > -- > > Key: YARN-2710 > URL: https://issues.apache.org/jira/browse/YARN-2710 > Project: Hadoop YARN > Issue Type: Bug > Components: client > Environment: Java 8, jenkins >Reporter: Wangda Tan >Assignee: Ahmed Hussein >Priority: Major > Attachments: TestResourceTrackerOnHA-output.2.txt, > YARN-2710-branch-2.10.001.patch, YARN-2710-branch-2.10.002.patch, > YARN-2710.001.patch, YARN-2710.002.patch, > org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt > > > Failures like the following can happen in TestApplicationClientProtocolOnHA, > TestResourceTrackerOnHA, etc. > {code} > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA > testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA) > Time elapsed: 9.491 sec <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 > to asf905.gq1.ygridcore.net:28032 failed on connection exception: > java.net.ConnectException: Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) > at > org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) > at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705) > at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) > at org.apache.hadoop.ipc.Client.call(Client.java:1438) > at org.apache.hadoop.ipc.Client.call(Client.java:1399) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) > at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source) > at > 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583) > at > org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137) > {code}
[jira] [Commented] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053491#comment-17053491 ] Benjamin Teke commented on YARN-10003: --
Hi [~snemeth], Thanks for looking at the patch!
# As per our offline discussion, I guarded the code against a null currentVersion. This ensures that there is no need to override checkVersion when getCurrentVersion is overridden with a simple _return null_.
# Because of point 1, the comment is unnecessary, so I removed it. It was only there to state that whenever getCurrentVersion was overridden with a _return null_, checkVersion was also overridden to do nothing.
# I added the messages to the asserts.
# Also done.
> YarnConfigurationStore#checkVersion throws exception that belongs to > RMStateStore > - > > Key: YARN-10003 > URL: https://issues.apache.org/jira/browse/YARN-10003 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Attachments: YARN-10003.001.patch, YARN-10003.002.patch, > YARN-10003.003.patch, YARN-10003.004.patch, YARN-10003.005.patch > > > RMStateVersionIncompatibleException is thrown from method "checkVersion". > Moreover, there's a TODO here saying this method is copied from RMStateStore. > We should revise this method a bit.
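The null guard described in point 1 of the comment above could look roughly like the following. This is only a minimal sketch, not the code from the patch: `SketchVersion`, `ConfStoreSketch`, the string return values, and the use of `IllegalStateException` are all hypothetical stand-ins chosen to keep the example self-contained.

```java
// Hypothetical stand-in for a version type; not Hadoop's actual class.
final class SketchVersion {
  final int major;
  final int minor;

  SketchVersion(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  // Assumption for this sketch: versions are compatible when the major numbers match.
  boolean isCompatibleTo(SketchVersion other) {
    return major == other.major;
  }

  @Override
  public String toString() {
    return major + "." + minor;
  }
}

// Hypothetical store base class illustrating the guard; names are illustrative only.
abstract class ConfStoreSketch {
  // Stores without versioning may simply return null here.
  abstract SketchVersion getCurrentVersion();

  abstract SketchVersion getStoredVersion();

  // Returns a short outcome label; throws when the versions are incompatible.
  String checkVersion() {
    SketchVersion current = getCurrentVersion();
    if (current == null) {
      // Null guard: nothing to compare against, so the check becomes a no-op
      // and subclasses no longer need to override checkVersion themselves.
      return "skipped";
    }
    SketchVersion stored = getStoredVersion();
    if (stored == null || stored.isCompatibleTo(current)) {
      return "ok";
    }
    throw new IllegalStateException(
        "Stored version " + stored + " is incompatible with current version " + current);
  }
}
```

With a guard like this, a store that returns null from `getCurrentVersion` inherits a harmless `checkVersion`, which is the reason given in the comment for dropping the explanatory code comment.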
[jira] [Updated] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Teke updated YARN-10003: - Attachment: YARN-10003.005.patch > YarnConfigurationStore#checkVersion throws exception that belongs to > RMStateStore > - > > Key: YARN-10003 > URL: https://issues.apache.org/jira/browse/YARN-10003 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Attachments: YARN-10003.001.patch, YARN-10003.002.patch, > YARN-10003.003.patch, YARN-10003.004.patch, YARN-10003.005.patch > > > RMStateVersionIncompatibleException is thrown from method "checkVersion". > Moreover, there's a TODO here saying this method is copied from RMStateStore. > We should revise this method a bit.
[jira] [Created] (YARN-10183) Auto Created Leaf Queues does not start
Prabhu Joseph created YARN-10183:

Summary: Auto Created Leaf Queues does not start
Key: YARN-10183
URL: https://issues.apache.org/jira/browse/YARN-10183
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph

Auto-created leaf queues do not start. Take a parent queue with auto-creation of leaf queues enabled.
{code}
yarn.scheduler.capacity.root.batch.auto-create-child-queue.enabled true
yarn.scheduler.capacity.root.batch.leaf-queue-template.capacity 30
{code}
Stopping the parent queue stops the auto-created leaf queues. But starting the parent queue or the auto-created leaf queue does not start the leaf queue, which causes subsequently submitted jobs to fail.
{code}
yarn.scheduler.capacity.root.batch.state RUNNING
yarn.scheduler.capacity.root.batch.hive.state RUNNING
{code}
A subsequent job fails:
{code}
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1583503947651_0002 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.batch.hive is STOPPED. Cannot accept submission of application: application_1583503947651_0002
 at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:327)
 at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:303)
 at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:330)
{code}
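The behaviour the report expects can be illustrated with a toy model. This is a deliberately simplified sketch and not the CapacityScheduler implementation: `QueueSketch`, `QueueStateSketch`, and the rule that a state change on the parent is pushed to every descendant are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; these types are hypothetical and unrelated to the real scheduler classes.
enum QueueStateSketch { RUNNING, STOPPED }

class QueueSketch {
  final String path;
  final List<QueueSketch> children = new ArrayList<>();
  QueueStateSketch state = QueueStateSketch.RUNNING;

  QueueSketch(String path) {
    this.path = path;
  }

  // Creates a child whose path extends this queue's path, e.g. root.batch -> root.batch.hive.
  QueueSketch addChild(String name) {
    QueueSketch child = new QueueSketch(path + "." + name);
    children.add(child);
    return child;
  }

  // Applies a state to this queue and pushes it down to every descendant (sketch assumption).
  void setStateRecursively(QueueStateSketch newState) {
    state = newState;
    for (QueueSketch child : children) {
      child.setStateRecursively(newState);
    }
  }

  // Mirrors the check visible in the quoted exception: a STOPPED queue rejects submissions.
  boolean acceptsSubmission() {
    return state == QueueStateSketch.RUNNING;
  }
}
```

In this model, stopping `root.batch` and then setting it back to RUNNING leaves `root.batch.hive` able to accept submissions again. The report describes the opposite outcome: the leaf stays STOPPED after the parent is restarted.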
[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10168: Attachment: YARN-10168-003.patch > FS-CS Converter: tool doesn't handle min/max resource conversion correctly > -- > > Key: YARN-10168 > URL: https://issues.apache.org/jira/browse/YARN-10168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Peter Bacsko >Priority: Blocker > Labels: fs2cs > Attachments: YARN-10168-001.patch, YARN-10168-002.patch, > YARN-10168-003.patch > > > Trying to understand logics of convert min and max resource from FS to CS, > and found some issues: > 1) > In FSQueueConverter#emitMaximumCapacity > Existing logic in FS is to either specify a maximum percentage for queues > against cluster resources. Or, specify an absolute valued maximum resource. > In the existing FS2CS converter, when a percentage-based maximum resource is > specified, the converter takes a global resource from fs2cs CLI, and applies > percentages to that. It is not correct since the percentage-based value will > get lost, and in the future when cluster resources go up and down, the > maximum resource cannot be changed. > 2) > The logic to deal with min/weight resource is also questionable: > The existing fs2cs tool, it takes precedence of percentage over > absoluteResource, and could set both to a queue config. See > FSQueueConverter.Capacity#toString > However, in CS, comparing to FS, the weights/min resource is quite different: > CS use the same queue.capacity to specify both percentage-based or > absolute-resource-based configs (Similar to how FS deal with maximum > Resource). > The capacity defines guaranteed resource, which also impact fairshare of the > queue. (The more guaranteed resource a queue has, the larger "pie" the queue > can get if there's any additional available resource). > In FS, minResource defined the guaranteed resource, and weight defined how > much the pie can grow to. 
> So to me, in FS, we should pick and choose either weight or minResource to > generate CS. > 3) > In FS, mix-use of absolute-resource configs (like min/maxResource), and > percentage-based (like weight) is allowed. But in CS, it is not allowed. The > reason is discussed on YARN-5881, and find [a]Should we support specifying a > mix of percentage ... > The existing fs2cs doesn't handle the issue, which could set mixed absolute > resource and percentage-based resources.
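Point 1 of the YARN-10168 description (a percentage-based maximum being resolved once against a global resource) can be illustrated with a small arithmetic sketch. Everything here is hypothetical: the class name, the method and the memory figures are made up for illustration and are not taken from FSQueueConverter or the fs2cs tool.

```java
public class MaxResourceConversionSketch {

    /** Resolves a percentage-based maximum against a given cluster memory size (in MB). */
    static long resolvePercentage(double percent, long clusterMemoryMb) {
        return Math.round(clusterMemoryMb * percent / 100.0);
    }

    public static void main(String[] args) {
        double maxPercent = 50.0;            // queue maximum, expressed as a percentage
        long clusterAtConversionMb = 102_400; // assumed cluster size passed to the converter

        // Resolving the percentage at conversion time freezes it into an absolute value.
        long frozenMaxMb = resolvePercentage(maxPercent, clusterAtConversionMb);

        // Later the cluster doubles; a percentage would follow it, the frozen value does not.
        long clusterLaterMb = 204_800;
        long percentageBasedMaxMb = resolvePercentage(maxPercent, clusterLaterMb);

        System.out.println("frozen absolute maximum (MB): " + frozenMaxMb);
        System.out.println("percentage-based maximum after growth (MB): " + percentageBasedMaxMb);
    }
}
```

With these assumed numbers the frozen value stays at half of the original cluster size while a percentage-based maximum would track the larger cluster, which is the loss of information the description objects to.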
[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053464#comment-17053464 ] Hadoop QA commented on YARN-10168: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 40s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}141m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10168 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995847/YARN-10168-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 01ba02ad4197 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053446#comment-17053446 ] Hadoop QA commented on YARN-10168: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 0s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}147m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10168 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995843/YARN-10168-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux cb87b6cdefb7 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25650/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | |
[jira] [Commented] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053422#comment-17053422 ]

Adam Antal commented on YARN-9419:

[~gandras], thanks for the fixed items. I agree with what you've said, let's keep this patch limited to {{GpuResourcePlugin}}. One last thing: in L78 could you please use
{code:java}
if (executorClass.equals(DefaultContainerExecutor.class)) { ...
{code}
instead of
{code:java}
if (executorClass == DefaultContainerExecutor.class) { ...
{code}

> Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
>
> Key: YARN-9419
> URL: https://issues.apache.org/jira/browse/YARN-9419
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Szilard Nemeth
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-9419.001.patch, YARN-9419.002.patch
>
> At a minimum, a WARN log should be added (logged once on startup) that notifies the user about a potentially offending configuration: GPU isolation is enabled but LCE is disabled.
> I think this is a dangerous, yet valid configuration: as LCE is the only container executor that utilizes cgroups, no real HW isolation happens if LCE is disabled.
> Let's suppose we have 2 GPU devices in 1 node:
> # NM reports 2 devices (as a Resource) to RM
> # RM assigns GPU#1 to container#1 that requests 1 GPU device
> # When container#2 also requests 1 GPU device, RM is going to assign either GPU#1 or GPU#2, so there's no guarantee that GPU#2 will be assigned.
> If GPU#1 is assigned to a second container, nasty things could happen.
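The two comparison styles requested in the review comment above can be tried out with a minimal sketch. The nested classes below are hypothetical stand-ins, not the real NodeManager executor classes. One fact worth keeping in mind: java.lang.Class does not override equals, so for non-null operands both forms give the same answer, and the choice is mainly a style preference.

```java
public class ClassComparisonSketch {

    // Hypothetical stand-ins for the NodeManager container executor hierarchy.
    static class ContainerExecutor {}
    static class DefaultContainerExecutor extends ContainerExecutor {}
    static class LinuxContainerExecutor extends ContainerExecutor {}

    /** Comparison via equals(); throws NullPointerException if executorClass is null. */
    static boolean isDefaultByEquals(Class<?> executorClass) {
        return executorClass.equals(DefaultContainerExecutor.class);
    }

    /** Comparison via reference identity; simply returns false for a null argument. */
    static boolean isDefaultByIdentity(Class<?> executorClass) {
        return executorClass == DefaultContainerExecutor.class;
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {
            DefaultContainerExecutor.class, LinuxContainerExecutor.class
        };
        for (Class<?> c : candidates) {
            System.out.println(c.getSimpleName()
                + " equals=" + isDefaultByEquals(c)
                + " identity=" + isDefaultByIdentity(c));
        }
    }
}
```

Neither form matches subclasses; if a subclass of the default executor should also trigger the warning, a check based on isAssignableFrom would be needed instead, which is outside what the comment asks for.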
[jira] [Commented] (YARN-9999) TestFSSchedulerConfigurationStore: Extend from ConfigurationStoreBaseTest, general code cleanup
[ https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053388#comment-17053388 ] Hadoop QA commented on YARN-: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 1 unchanged - 4 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}106m 7s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}168m 22s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN- | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995763/YARN-.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 953b609a241f 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25648/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25648/testReport/ | | Max. process+thread count | 839 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-10002) Code cleanup and improvements ConfigurationStoreBaseTest
[ https://issues.apache.org/jira/browse/YARN-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053356#comment-17053356 ] Hadoop QA commented on YARN-10002: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 7m 23s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 1 unchanged - 4 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 15m 39s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 54s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | YARN-10002 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995751/YARN-10002.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a0162e089724 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 004e955 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/25649/artifact/out/branch-mvninstall-root.txt | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25649/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit |
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053345#comment-17053345 ] Peter Bacsko commented on YARN-9879: [~prabhujoseph] thanks, I think it's likely that the piece of code I mentioned here is missing: https://issues.apache.org/jira/browse/YARN-10108?focusedCommentId=17025143=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17025143 > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done.
[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10168: Attachment: YARN-10168-002.patch > FS-CS Converter: tool doesn't handle min/max resource conversion correctly > -- > > Key: YARN-10168 > URL: https://issues.apache.org/jira/browse/YARN-10168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Peter Bacsko >Priority: Blocker > Labels: fs2cs > Attachments: YARN-10168-001.patch, YARN-10168-002.patch > > > Trying to understand logics of convert min and max resource from FS to CS, > and found some issues: > 1) > In FSQueueConverter#emitMaximumCapacity > Existing logic in FS is to either specify a maximum percentage for queues > against cluster resources. Or, specify an absolute valued maximum resource. > In the existing FS2CS converter, when a percentage-based maximum resource is > specified, the converter takes a global resource from fs2cs CLI, and applies > percentages to that. It is not correct since the percentage-based value will > get lost, and in the future when cluster resources go up and down, the > maximum resource cannot be changed. > 2) > The logic to deal with min/weight resource is also questionable: > The existing fs2cs tool, it takes precedence of percentage over > absoluteResource, and could set both to a queue config. See > FSQueueConverter.Capacity#toString > However, in CS, comparing to FS, the weights/min resource is quite different: > CS use the same queue.capacity to specify both percentage-based or > absolute-resource-based configs (Similar to how FS deal with maximum > Resource). > The capacity defines guaranteed resource, which also impact fairshare of the > queue. (The more guaranteed resource a queue has, the larger "pie" the queue > can get if there's any additional available resource). > In FS, minResource defined the guaranteed resource, and weight defined how > much the pie can grow to. 
> So to me, in FS, we should pick and choose either weight or minResource to > generate CS. > 3) > In FS, mix-use of absolute-resource configs (like min/maxResource), and > percentage-based (like weight) is allowed. But in CS, it is not allowed. The > reason is discussed on YARN-5881, and find [a]Should we support specifying a > mix of percentage ... > The existing fs2cs doesn't handle the issue, which could set mixed absolute > resource and percentage-based resources.
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053337#comment-17053337 ]

Prabhu Joseph commented on YARN-9879:

[~pbacsko] I have tested with the full queue name in the mapping, and it failed with a different error.
{code}
yarn.scheduler.capacity.queue-mappings u:%user:root.batch.%user

Caused by: java.io.IOException: mapping contains invalid or non-leaf queue [%user] and invalid parent queue which does not match existing leaf queue's parent : [root.batch] does not match [ batch]
  at org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:64)
  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:298)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:674)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:709)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:750)
{code}
Previously this worked by setting just the parent queue name, as below:
yarn.scheduler.capacity.queue-mappings u:%user:batch.%user

*Reference:* https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
{code}
user-group queue mapping(s) listed in yarn.scheduler.capacity.queue-mappings need to specify an additional parent queue parameter to identify which parent queue the auto-created leaf queues need to be created under. Refer above Queue Mapping based on User or Group section for more details. Please note that such parent queues also need to enable auto-creation of child queues as mentioned in Parent queue configuration for dynamic leaf queue creation and management section below

Example:
yarn.scheduler.capacity.queue-mappings u:user1:queue1,g:group1:queue2,u:user2:%primary_group,u:%user:parent1.%user

Here, u:%user:parent1.%user mapping allows any user other than user1, user2 to be mapped to its own user specific leaf queue which will be auto-created under parent1.
{code}

> Allow multiple leaf queues with the same name in CS
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, YARN-9879.POC010.patch, YARN-9879.POC011.patch
>
> Currently the leaf queue's name must be unique regardless of its position in the queue hierarchy.
> Design doc and first proposal is being made, I'll attach it as soon as it's done.
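The mapping strings discussed in the YARN-9879 comment above follow a type:source:queue shape, with an optional parent path in front of the leaf queue. The following is a minimal sketch of how such a value could be split, written only to show why root.batch.%user and batch.%user yield different parent strings. It is not the CapacityScheduler parser; the class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QueueMappingSketch {

    /** One parsed mapping entry; parent is null when only a queue name is given. */
    static final class Mapping {
        final String type;    // "u" for user, "g" for group
        final String source;  // user or group name, or a placeholder such as %user
        final String parent;  // everything before the last dot of the target, or null
        final String queue;   // leaf queue name or placeholder

        Mapping(String type, String source, String parent, String queue) {
            this.type = type;
            this.source = source;
            this.parent = parent;
            this.queue = queue;
        }
    }

    /** Splits a single "type:source:target" entry; the target may contain a parent path. */
    static Mapping parseOne(String entry) {
        String[] parts = entry.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected type:source:queue but got: " + entry);
        }
        String target = parts[2];
        int lastDot = target.lastIndexOf('.');
        String parent = lastDot < 0 ? null : target.substring(0, lastDot);
        String queue = lastDot < 0 ? target : target.substring(lastDot + 1);
        return new Mapping(parts[0], parts[1], parent, queue);
    }

    /** Splits a comma-separated list of mapping entries. */
    static List<Mapping> parseAll(String value) {
        List<Mapping> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            result.add(parseOne(entry));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"u:%user:root.batch.%user", "u:%user:batch.%user"}) {
            Mapping m = parseOne(value);
            System.out.println(value + " -> parent=" + m.parent + ", queue=" + m.queue);
        }
    }
}
```

Under this simplified split, the full-path form produces the parent string root.batch while the short form produces batch, which lines up with the two bracketed values in the reported error message. Whether the real validation compares them this way is an assumption.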
[jira] [Commented] (YARN-10110) In Federation Secure cluster Application submission fails when authorization is enabled
[ https://issues.apache.org/jira/browse/YARN-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053331#comment-17053331 ]

Prabhu Joseph commented on YARN-10110:
--------------------------------------

[~brahmareddy] Having a different policy provider for Router-related services is fine. The patch looks good except for the point below.

1. RouterClientRMService and RouterRMAdminService are not honoring the ACLs defined in hadoop-policy.xml. Other services, like ClientRMService, read explicitly from hadoop-policy.xml:

{code}
if (conf.getBoolean(
    CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
    false)) {
  InputStream inputStream =
      this.rmContext.getConfigurationProvider()
          .getConfigurationInputStream(conf,
              YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
  if (inputStream != null) {
    conf.addResource(inputStream);
  }
  refreshServiceAcls(conf, RMPolicyProvider.getInstance());
}
{code}

> In Federation Secure cluster Application submission fails when authorization is enabled
> ---------------------------------------------------------------------------------------
>
> Key: YARN-10110
> URL: https://issues.apache.org/jira/browse/YARN-10110
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation
> Reporter: Sushanta Sen
> Assignee: Bilwa S T
> Priority: Blocker
> Attachments: YARN-10110.001.patch, YARN-10110.002.patch, YARN-10110.003.patch, YARN-10110.004.patch
>
> 【Precondition】:
> 1. A secure federated cluster is available
> 2. Add the configuration below to the Router and client core-site.xml:
> hadoop.security.authorization= true
> 3. Restart the router service
> 【Test step】:
> 1. Go to the router client bin path and submit an MR PI job
> 2. Observe the client console screen
> 【Expected Output】:
> No error should be thrown and the job should be successful
> 【Actual Output】:
> Job failed prompting "Protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is not known.,"
> 【Additional Note】:
> When the parameter is set to false, the job is submitted and succeeds.
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053319#comment-17053319 ] Peter Bacsko commented on YARN-9879: [~prabhujoseph] could you try it with this mapping rule? {{u:%user:root.batch.%user}} That is, you give the full path, not just the leaf queue name. Although I believe {{QueueManager.get()}} should be able to retrieve both. > Allow multiple leaf queues with the same name in CS > --- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Labels: fs2cs > Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, > YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, > YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, > YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, > YARN-9879.POC010.patch, YARN-9879.POC011.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
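The lookup behaviour Peter describes, where either a full path or a bare leaf name resolves to the same queue, might be sketched roughly as follows. This is a hypothetical illustration: QueueLookupSketch, addQueue and resolve are made-up names and do not reflect the patch's actual queue manager; the choice to index only leaf queues by short name is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical queue index; not the queue manager from the patch. */
class QueueLookupSketch {
  private final Set<String> fullPaths = new HashSet<>();
  // Assumption: only leaf queues can be addressed by their short name.
  private final Map<String, List<String>> leavesByShortName = new HashMap<>();

  void addQueue(String fullPath, boolean isLeaf) {
    fullPaths.add(fullPath);
    if (isLeaf) {
      String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
      leavesByShortName.computeIfAbsent(shortName, k -> new ArrayList<>()).add(fullPath);
    }
  }

  /**
   * Returns the full path for a full path or a leaf short name,
   * or null if nothing matches. An ambiguous short name is rejected.
   */
  String resolve(String name) {
    if (fullPaths.contains(name)) {
      return name;
    }
    List<String> matches = leavesByShortName.get(name);
    if (matches == null) {
      return null;
    }
    if (matches.size() > 1) {
      throw new IllegalArgumentException(
          "Ambiguous queue name " + name + ": " + matches);
    }
    return matches.get(0);
  }

  public static void main(String[] args) {
    QueueLookupSketch queues = new QueueLookupSketch();
    queues.addQueue("root", false);
    queues.addQueue("root.A", false);
    queues.addQueue("root.A.B", true);
    queues.addQueue("root.B", false);
    queues.addQueue("root.B.A", true);
    System.out.println(queues.resolve("A"));
    System.out.println(queues.resolve("root.B.A"));
  }
}
```

Under these assumptions a short name is accepted only while exactly one leaf carries it; once two leaves share a name, callers have to supply the full path.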
[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10168:
--------------------------------
    Labels: fs2cs  (was: )

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --------------------------------------------------------------------------
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Peter Bacsko
> Priority: Blocker
> Labels: fs2cs
> Attachments: YARN-10168-001.patch
>
> While trying to understand the logic for converting min and max resources from FS to CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to specify either a maximum percentage of cluster resources for a queue, or an absolute maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is specified, the converter takes a global resource from the fs2cs CLI and applies the percentage to it. This is not correct: the percentage-based value gets lost, so when cluster resources go up or down in the future, the maximum resource cannot follow.
> 2)
> The logic that deals with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and could set both in a queue config. See FSQueueConverter.Capacity#toString
> However, compared to FS, CS treats weights/min resources quite differently:
> CS uses the same queue.capacity to specify either percentage-based or absolute-resource-based configs (similar to how FS deals with the maximum resource).
> The capacity defines the guaranteed resource, which also impacts the fair share of the queue. (The more guaranteed resource a queue has, the larger the "pie" the queue can get if there's any additional available resource.)
> In FS, minResource defines the guaranteed resource, and weight defines how much the pie can grow to.
> So to me, in FS, we should pick either weight or minResource to generate the CS config.
> 3)
> In FS, mixing absolute-resource configs (like min/maxResource) and percentage-based ones (like weight) is allowed. But in CS, it is not allowed. The reason is discussed on YARN-5881, and find [a]Should we support specifying a mix of percentage ...
> The existing fs2cs doesn't handle this issue, so it could set a mix of absolute and percentage-based resources.
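Point 1 above can be illustrated with a small sketch of a conversion that keeps a percentage as a percentage instead of multiplying it by a cluster size supplied on the command line. This is not the converter's code and not the attached patch: FsToCsCapacitySketch and toCsMaximumCapacity are hypothetical names, and the input handling assumes only the simple "N%" and "memory-mb=X, vcores=Y" forms of an FS maximum resource.

```java
/** Hypothetical FS-to-CS maximum-capacity mapping; not the fs2cs code. */
class FsToCsCapacitySketch {

  /**
   * Maps an FS maximum-resource string to a CS maximum-capacity value.
   * A percentage stays a percentage, so it keeps tracking the cluster
   * size; an absolute value becomes a bracketed absolute resource.
   */
  static String toCsMaximumCapacity(String fsMaxResources) {
    String value = fsMaxResources.trim();
    if (value.endsWith("%")) {
      double percent =
          Double.parseDouble(value.substring(0, value.length() - 1).trim());
      return String.valueOf(percent);
    }
    long memory = -1;
    long vcores = -1;
    for (String part : value.split(",")) {
      String[] keyValue = part.trim().split("=");
      if (keyValue.length != 2) {
        throw new IllegalArgumentException("Unsupported format: " + fsMaxResources);
      }
      String key = keyValue[0].trim();
      long amount = Long.parseLong(keyValue[1].trim());
      if (key.equals("memory-mb")) {
        memory = amount;
      } else if (key.equals("vcores")) {
        vcores = amount;
      } else {
        throw new IllegalArgumentException("Unknown resource: " + key);
      }
    }
    if (memory < 0 || vcores < 0) {
      throw new IllegalArgumentException("Both memory-mb and vcores are required");
    }
    return "[memory=" + memory + ",vcores=" + vcores + "]";
  }

  public static void main(String[] args) {
    System.out.println(toCsMaximumCapacity("50%"));
    System.out.println(toCsMaximumCapacity("memory-mb=8192, vcores=4"));
  }
}
```

The point of the sketch is only that no cluster-wide resource figure enters the percentage branch, so the resulting setting stays valid when nodes are added or removed.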
[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly
[ https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-10168:
--------------------------------
    Attachment: YARN-10168-001.patch

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --------------------------------------------------------------------------
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Peter Bacsko
> Priority: Blocker
> Attachments: YARN-10168-001.patch
>
> While trying to understand the logic for converting min and max resources from FS to CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to specify either a maximum percentage of cluster resources for a queue, or an absolute maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is specified, the converter takes a global resource from the fs2cs CLI and applies the percentage to it. This is not correct: the percentage-based value gets lost, so when cluster resources go up or down in the future, the maximum resource cannot follow.
> 2)
> The logic that deals with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and could set both in a queue config. See FSQueueConverter.Capacity#toString
> However, compared to FS, CS treats weights/min resources quite differently:
> CS uses the same queue.capacity to specify either percentage-based or absolute-resource-based configs (similar to how FS deals with the maximum resource).
> The capacity defines the guaranteed resource, which also impacts the fair share of the queue. (The more guaranteed resource a queue has, the larger the "pie" the queue can get if there's any additional available resource.)
> In FS, minResource defines the guaranteed resource, and weight defines how much the pie can grow to.
> So to me, in FS, we should pick either weight or minResource to generate the CS config.
> 3)
> In FS, mixing absolute-resource configs (like min/maxResource) and percentage-based ones (like weight) is allowed. But in CS, it is not allowed. The reason is discussed on YARN-5881, and find [a]Should we support specifying a mix of percentage ...
> The existing fs2cs doesn't handle this issue, so it could set a mix of absolute and percentage-based resources.
[jira] [Commented] (YARN-9997) Code cleanup in ZKConfigurationStore
[ https://issues.apache.org/jira/browse/YARN-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053310#comment-17053310 ]

Hadoop QA commented on YARN-9997:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 0s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 32s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | checkstyle | 0m 33s | trunk passed |
| +1 | mvnsite | 0m 45s | trunk passed |
| +1 | shadedclient | 15m 25s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 30s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| -0 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) |
| +1 | mvnsite | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 39s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 90m 3s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 151m 59s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-9997 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995822/YARN-9997.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d96efc4eee50 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 004e955 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25647/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25647/testReport/ |
| Max. process+thread count | 819 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS
[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053309#comment-17053309 ]

Prabhu Joseph commented on YARN-9879:
-------------------------------------

Thanks [~shuzirra] for the patch. I have tested the scenarios below with the patch, and it works fine except for two issues.

1. Job submission with leaf queue name and full queue path.
2. Queue placement
3. Auto creation of leaf queue.
4. RM UI
5. RMWebService Scheduler response.
6. RMAdminService RefreshQueues
7. Scheduler Configuration Mutation API - add / remove / update queue.
8. Recovery
9. RM JMX Metrics - YARN-9772

*Issue 1: RM fails to start when a dynamic parent queue "batch" (auto-create-child-queue.enabled=true) and another leaf queue "batch" exist.*

CS Config:

root.batch -> (auto-create-child-queue.enabled=true)
root.default
root.A.batch
yarn.scheduler.capacity.queue-mappings = u:%user:batch.%user

{code:java}
2020-03-06 00:54:59,239 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
  at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
  at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:876)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1288)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1576)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:342)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:418)
  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
  ... 7 more
Caused by: java.io.IOException: mapping contains invalid or non-leaf queue [%user] and invalid parent queue [batch]
  at org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:50)
  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:298)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:674)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:709)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:750)
{code}

*Complete CS Config to repro above issue:*

{code:java}
yarn.scheduler.capacity.root.batch.leaf-queue-template.capacity = 40
yarn.scheduler.capacity.queue-mappings = u:%user:batch.%user
yarn.scheduler.capacity.root.batch.auto-create-child-queue.enabled = true
yarn.scheduler.capacity.root.queues = default,batch,A
yarn.scheduler.capacity.queue-mappings-override.enable = false
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.capacity = 40
yarn.scheduler.capacity.root.batch.capacity = 40
yarn.scheduler.capacity.root.A.capacity = 20
yarn.scheduler.capacity.root.A.queues = batch
yarn.scheduler.capacity.root.A.batch.capacity = 100
{code}

*Issue 2:*

*RM starts fine with the queue config below, but submitting a job with queue name "A" fails. Job submission works fine when specifying the full queue name root.B.A. There is only one leaf queue with the name "A", so the placement should be able to find it, right?*

root.A.B
root.B.A

{code:java}
yarn jar /HADOOP/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-SNAPSHOT-tests.jar sleep -Dmapreduce.job.queuename=A -m 1 -mt 1

Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit
[jira] [Commented] (YARN-10110) In Federation Secure cluster Application submission fails when authorization is enabled
[ https://issues.apache.org/jira/browse/YARN-10110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053210#comment-17053210 ]

Brahma Reddy Battula commented on YARN-10110:
---------------------------------------------

{quote}2. ClientRMService, ApplicationMasterService, ResourceTrackerService and AdminService all reuse the RMPolicyProvider. I think RouterClientRMService can also use the same RMPolicyProvider.
{quote}
I think not all of the services in RMPolicyProvider need to be handled by the Router. Having a different policy provider should be fine, I feel.

[~BilwaST] The latest approach looks good to me.

[~prabhujoseph] Please let me know your thoughts on this.

> In Federation Secure cluster Application submission fails when authorization is enabled
> ---------------------------------------------------------------------------------------
>
> Key: YARN-10110
> URL: https://issues.apache.org/jira/browse/YARN-10110
> Project: Hadoop YARN
> Issue Type: Bug
> Components: federation
> Reporter: Sushanta Sen
> Assignee: Bilwa S T
> Priority: Blocker
> Attachments: YARN-10110.001.patch, YARN-10110.002.patch, YARN-10110.003.patch, YARN-10110.004.patch
>
> 【Precondition】:
> 1. A secure federated cluster is available
> 2. Add the configuration below to the Router and client core-site.xml:
> hadoop.security.authorization= true
> 3. Restart the router service
> 【Test step】:
> 1. Go to the router client bin path and submit an MR PI job
> 2. Observe the client console screen
> 【Expected Output】:
> No error should be thrown and the job should be successful
> 【Actual Output】:
> Job failed prompting "Protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is not known.,"
> 【Additional Note】:
> When the parameter is set to false, the job is submitted and succeeds.
[jira] [Commented] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053206#comment-17053206 ]

Hadoop QA commented on YARN-9419:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 22s | trunk passed |
| +1 | compile | 1m 6s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 15m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 14s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 21m 42s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 80m 39s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.7 Server=19.03.7 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-9419 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12995820/YARN-9419.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f8a47643248c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 004e955 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25646/testReport/ |
| Max. process+thread count | 308 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25646/console |
| Powered by | Apache Yetus 0.8.0
[jira] [Commented] (YARN-10003) YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
[ https://issues.apache.org/jira/browse/YARN-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053197#comment-17053197 ]

Szilard Nemeth commented on YARN-10003:
---------------------------------------

Hi [~bteke],
Thanks for working on this patch.
Some comments:

1. Regarding the changes in YarnConfigurationStore#checkVersion: If I'm understanding the code correctly, it worked like this before: getConfStoreVersion() was invoked to get the loaded version. If it was null, getCurrentVersion() was invoked to get the version instead. If this was compatible with currentVersion, storeVersion() was called.
Now your code is different. I'm a bit concerned about this piece of code:
{code}
// when currentVersion is null, the checkVersion method is overridden
if (currentVersion.equals(loadedVersion)) {
  return;
}
{code}
There's no guard against currentVersion being null. If it becomes null under any circumstance, an NPE would be thrown. Can you please guard this case?

2. In the same method, and in relation to point 1, I don't really get what you wanted to say with this comment:
{code}
// when currentVersion is null, the checkVersion method is overridden
{code}
Could you clarify this a bit, please?

3. In TestLeveldbConfigurationStore#testIncompatibleVersion: Can you please add a message to the assertEquals? The same applies to TestZKConfigurationStore#testIncompatibleVersion.

4. Nit: In TestInMemoryConfigurationStore#checkVersion: The message should be "checkVersion threw an exception" or "...has thrown an exception".

> YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore
> ---------------------------------------------------------------------------------
>
> Key: YARN-10003
> URL: https://issues.apache.org/jira/browse/YARN-10003
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10003.001.patch, YARN-10003.002.patch, YARN-10003.003.patch, YARN-10003.004.patch
>
> RMStateVersionIncompatibleException is thrown from method "checkVersion".
> Moreover, there's a TODO here saying this method is copied from RMStateStore.
> We should revise this method a bit.
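The null guard requested in point 1 of the review could look something like the sketch below. Everything here is an assumption made for illustration: VersionCheckSketch, its nested Version type, the Outcome enum and the use of IllegalStateException are stand-ins, not the YarnConfigurationStore API or the exception type the patch ends up using. Compatibility is assumed to mean "same major version".

```java
import java.util.Objects;

/** Hypothetical, simplified version check; not YarnConfigurationStore. */
class VersionCheckSketch {

  /** Minimal stand-in for a store version. */
  static final class Version {
    final int major;
    final int minor;

    Version(int major, int minor) {
      this.major = major;
      this.minor = minor;
    }

    /** Assumption: versions are compatible when the major number matches. */
    boolean isCompatibleTo(Version other) {
      return major == other.major;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Version)) {
        return false;
      }
      Version v = (Version) o;
      return major == v.major && minor == v.minor;
    }

    @Override
    public int hashCode() {
      return Objects.hash(major, minor);
    }

    @Override
    public String toString() {
      return major + "." + minor;
    }
  }

  enum Outcome { UP_TO_DATE, STORE_CURRENT }

  static Outcome checkVersion(Version currentVersion, Version loadedVersion) {
    // Guard first, so a missing current version fails with a clear message
    // instead of an NPE from currentVersion.equals(...).
    if (currentVersion == null) {
      throw new IllegalStateException("Current version is not set");
    }
    if (currentVersion.equals(loadedVersion)) {
      return Outcome.UP_TO_DATE;
    }
    // Nothing stored yet: fall back to the current version.
    Version effective = loadedVersion != null ? loadedVersion : currentVersion;
    if (effective.isCompatibleTo(currentVersion)) {
      return Outcome.STORE_CURRENT;
    }
    throw new IllegalStateException(
        "Expecting version " + currentVersion + ", but loaded " + loadedVersion);
  }

  public static void main(String[] args) {
    System.out.println(checkVersion(new Version(1, 0), null));
    System.out.println(checkVersion(new Version(1, 1), new Version(1, 0)));
  }
}
```

Returning an outcome instead of calling a store method keeps the sketch free of I/O; a real store would persist the current version where STORE_CURRENT is returned.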
[jira] [Commented] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053129#comment-17053129 ] Andras Gyori commented on YARN-9419: I have taken the class-based equality comparison approach, because the other solution would lead to the alteration of different parts of the codebase, which might not belong to this issue in its entirety. > Log a warning if GPU isolation is enabled but LinuxContainerExecutor is > disabled > > > Key: YARN-9419 > URL: https://issues.apache.org/jira/browse/YARN-9419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-9419.001.patch, YARN-9419.002.patch > > > A WARN log should be added at least (logged once on startup) that notifies > the user about a potentially offending configuration: GPU isolation is > enabled but LCE is disabled. > I think this is a dangerous, yet valid configuration: As LCE is the only > container executor that utilizes cgroups, no real HW-isolation happens if LCE > is disabled. > Let's suppose we have 2 GPU devices in 1 node: > # NM reports 2 devices (as a Resource) to RM > # RM assigns GPU#1 to container#2 that requests 1 GPU device > # When container#2 is also requesting 1 GPU device, RM is going to assign > either GPU#1 or GPU#2, so there's no guarantee that GPU#2 will be assigned. > If GPU#1 is assigned to a second container, nasty things could happen. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
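For readers unfamiliar with the "class-based equality comparison" mentioned in this comment, the idea can be sketched as below. The stub classes stand in for the NodeManager's container executor types and are purely hypothetical, as are the names GpuIsolationCheckSketch and shouldWarn; the patch itself may structure the check differently.

```java
/** Hypothetical sketch of the startup check; the stubs are not YARN classes. */
class GpuIsolationCheckSketch {

  /** Stand-ins for the executor hierarchy. */
  static class ContainerExecutorStub { }
  static class DefaultExecutorStub extends ContainerExecutorStub { }
  static class LinuxExecutorStub extends ContainerExecutorStub { }

  /**
   * Returns true when GPU isolation is enabled but the configured executor
   * is not exactly the Linux executor class. Exact class equality is used
   * rather than instanceof, so subclasses are not treated as a match.
   */
  static boolean shouldWarn(boolean gpuIsolationEnabled,
      ContainerExecutorStub executor) {
    boolean isLinuxExecutor = executor != null
        && executor.getClass().equals(LinuxExecutorStub.class);
    return gpuIsolationEnabled && !isLinuxExecutor;
  }

  public static void main(String[] args) {
    if (shouldWarn(true, new DefaultExecutorStub())) {
      System.err.println("WARN: GPU isolation is enabled but the Linux "
          + "container executor is not in use; devices will not be isolated.");
    }
  }
}
```

Such a check would run once at startup, which matches the "logged once on startup" wording in the issue description.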
[jira] [Updated] (YARN-9419) Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled
[ https://issues.apache.org/jira/browse/YARN-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-9419: --- Attachment: YARN-9419.002.patch > Log a warning if GPU isolation is enabled but LinuxContainerExecutor is > disabled > > > Key: YARN-9419 > URL: https://issues.apache.org/jira/browse/YARN-9419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-9419.001.patch, YARN-9419.002.patch > > > A WARN log should be added at least (logged once on startup) that notifies > the user about a potentially offending configuration: GPU isolation is > enabled but LCE is disabled. > I think this is a dangerous, yet valid configuration: As LCE is the only > container executor that utilizes cgroups, no real HW-isolation happens if LCE > is disabled. > Let's suppose we have 2 GPU devices in 1 node: > # NM reports 2 devices (as a Resource) to RM > # RM assigns GPU#1 to container#2 that requests 1 GPU device > # When container#2 is also requesting 1 GPU device, RM is going to assign > either GPU#1 or GPU#2, so there's no guarantee that GPU#2 will be assigned. > If GPU#1 is assigned to a second container, nasty things could happen. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org