[jira] [Commented] (YARN-9830) Improve ContainerAllocationExpirer it blocks scheduling
[ https://issues.apache.org/jira/browse/YARN-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949904#comment-16949904 ]

Bibin Chundatt commented on YARN-9830:
--

[~sunil.gov...@gmail.com] Could you take a look

> Improve ContainerAllocationExpirer it blocks scheduling
> ---
>
> Key: YARN-9830
> URL: https://issues.apache.org/jira/browse/YARN-9830
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bibin Chundatt
> Assignee: Bibin Chundatt
> Priority: Critical
> Labels: perfomance
> Attachments: YARN-9830.001.patch
>
>
> {quote}
> java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.hadoop.yarn.util.AbstractLivelinessMonitor.register(AbstractLivelinessMonitor.java:106)
>   - waiting to lock <0x7fa348749550> (a org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:601)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:592)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   - locked <0x7fc8852f8200> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:474)
>   at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> {quote}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
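
The YARN-9830 trace above shows a state-machine transition blocked inside `register()` while waiting for the monitor of the `ContainerAllocationExpirer` object. As a minimal sketch of one way to reduce that kind of contention, the class below keeps the last-seen timestamps in a `ConcurrentHashMap`, so registration does not need the lock of the whole monitor object. `LivenessSketch` and all of its methods are hypothetical names chosen for illustration. This is neither Hadoop's `AbstractLivelinessMonitor` nor the attached YARN-9830.001.patch, whose approach may well differ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; not Hadoop code and not the YARN-9830 patch. */
class LivenessSketch<K> {
    // Last time each key was registered or pinged, in milliseconds.
    private final Map<K, Long> lastSeen = new ConcurrentHashMap<>();
    private final long expireIntervalMs;

    LivenessSketch(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    // No synchronized keyword: callers never wait for the monitor of `this`.
    void register(K key, long nowMs) {
        lastSeen.put(key, nowMs);
    }

    void unregister(K key) {
        lastSeen.remove(key);
    }

    // One expiry pass. Returns how many entries were removed as expired.
    int expire(long nowMs) {
        int expired = 0;
        for (Map.Entry<K, Long> e : lastSeen.entrySet()) {
            boolean tooOld = nowMs - e.getValue() > expireIntervalMs;
            // remove(key, value) only succeeds if the entry was not refreshed
            // by a concurrent register() since we read it.
            if (tooOld && lastSeen.remove(e.getKey(), e.getValue())) {
                expired++;
            }
        }
        return expired;
    }

    int size() {
        return lastSeen.size();
    }
}
```

With this layout a long expiry pass and a burst of `register()` calls do not serialize on a single lock, at the cost of the expiry pass seeing a weakly consistent view of the map.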
[jira] [Comment Edited] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949890#comment-16949890 ]

liusheng edited comment on YARN-9511 at 10/12/19 3:51 AM:
--

Hi [~snemeth] [~adam.antal],

Thank you both for looking into this issue. I have spent some time trying to find its cause.

The problem affects the tests in *TestAuxServices* and causes *2 Errors and 9 Failures*, see:
{code:java}
Failures:
TestAuxServices.testAuxServiceRecoverySetup:717 expected:<2> but was:<0>
TestAuxServices.testAuxServicesManifestPermissions:874 expected:<2> but was:<0>
TestAuxServices.testAuxServicesMeta:638 Invalid mix of services expected:<6> but was:<1>
TestAuxServices.testAuxServices:610 Invalid mix of services expected:<6> but was:<1>
TestAuxServices.testCustomizedAuxServiceClassPath:416
TestAuxServices.testManualReload:919 expected:<2> but was:<0>
TestAuxServices.testRemoteAuxServiceClassPath:313 The permission of the jar is wrong.Should throw out exception.
TestAuxServices.testRemoveManifest:897 expected:<2> but was:<0>
TestAuxServices.testValidAuxServiceName:698 Should receive the exception.
Errors:
TestAuxServices.testAuxUnexpectedStop:664 » NoSuchElement
TestAuxServices.testRemoteAuxServiceClassPath:334 » YarnRuntime The remote jar...
{code}
After debugging, I found that all of these failures are directly or indirectly related to file permissions. There are two situations when running these tests:

# *useManifest* enabled: the tests check and use the manifest file:
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/manifest.txt
{code}
This file and all of its parent directories must not be writable by group or others, see:
{code:java}
private boolean checkManifestPermissions(FileStatus status) throws
    IOException {
  if ((status.getPermission().toShort() & 0022) != 0) {
    LOG.error("Manifest file and parents must not be writable by group or " +
        "others. The current Permission of " + status.getPath() + " is " +
        status.getPermission());
    return false;
  }
  Path parent = status.getPath().getParent();
  if (parent == null) {
    return true;
  }
  return checkManifestPermissions(manifestFS.getFileStatus(parent));
}
{code}
# *useManifest not* enabled: the tests use a *test-runjar.jar* file:
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/test-runjar.jar
{code}
The related code that checks its permissions:
{code:java}
private Path maybeDownloadJars(String sName, String className, String
    remoteFile, AuxServiceFile.TypeEnum type, Configuration conf)
    throws IOException {
  // load AuxiliaryService from remote classpath
  FileContext localLFS = getLocalFileContext(conf);
  // create NM aux-service dir in NM localdir if it does not exist.
  Path nmAuxDir = dirsHandler.getLocalPathForWrite("."
      + Path.SEPARATOR + NM_AUX_SERVICE_DIR);
  if (!localLFS.util().exists(nmAuxDir)) {
    try {
      localLFS.mkdir(nmAuxDir, NM_AUX_SERVICE_DIR_PERM, true);
    } catch (IOException ex) {
      throw new YarnRuntimeException("Fail to create dir:"
          + nmAuxDir.toString(), ex);
    }
  }
  Path src = new Path(remoteFile);
  FileContext remoteLFS = getRemoteFileContext(src.toUri(), conf);
  FileStatus scFileStatus = remoteLFS.getFileStatus(src);
  if (!scFileStatus.getOwner().equals(
      this.userUGI.getShortUserName())) {
    throw new YarnRuntimeException("The remote jarfile owner:"
        + scFileStatus.getOwner() + " is not the same as the NM user:"
        + this.userUGI.getShortUserName() + ".");
  }
  if ((scFileStatus.getPermission().toShort() & 0022) != 0) {
    throw new YarnRuntimeException("The remote jarfile should not "
        + "be writable by group or others. "
        + "The current Permission is "
        + scFileStatus.getPermission().toShort());
  }
{code}
Based on the above, I removed the group and others *write* permission from the parent directories of *manifest.txt* and changed the *umask to 022*. The umask matters because it determines the permissions of newly created files and directories, and both *manifest.txt* and *test-runjar.jar* are newly created when the tests run.
{code:java}
chmod go-w yourpath/hadoop/ -R
umask 022
umask
{code}
After doing this and re-running the *TestAuxServices* tests, all of them pass.

Actually, I am a newcomer to Hadoop, so I am not sure whether this is a Hadoop bug or not. Could you please give some suggestions? Thanks.

was (Author: seanlau):
Hi [~snemeth] [~adam.antal], Thank you both for care about this issue. I have take some time tried to find the reason of this issue. This issue will effect the tests of
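
The `& 0022` test quoted in the comment above can be worked through with plain integer arithmetic. The sketch below is not Hadoop code: `PermissionBits` and its method names are hypothetical, and it assumes the usual POSIX rule that a newly created regular file gets mode `0666` with the umask bits cleared. It shows why a umask of 002 yields mode `0664`, which is 436 in decimal (the value that appears in the YARN-9511 issue title, since the exception message prints `toShort()` as a decimal number), and why a umask of 022 passes the check. Whether the failing environment really used umask 002 is an assumption.

```java
/** Hypothetical illustration of the permission check; not Hadoop code. */
class PermissionBits {
    // Write bit for group (0020) plus write bit for others (0002).
    static final int GROUP_OTHER_WRITE = 0022;

    // Same test as in the quoted Hadoop snippets: any of the two bits set?
    static boolean writableByGroupOrOthers(int mode) {
        return (mode & GROUP_OTHER_WRITE) != 0;
    }

    // Mode of a newly created regular file under the given umask
    // (assumes the creating program asks for 0666).
    static int newFileMode(int umask) {
        return 0666 & ~umask;
    }

    public static void main(String[] args) {
        for (int umask : new int[] {0002, 0022, 0077}) {
            int mode = newFileMode(umask);
            System.out.printf("umask %03o -> mode %03o (decimal %d), rejected=%b%n",
                    umask, mode, mode, writableByGroupOrOthers(mode));
        }
    }
}
```

Any umask that masks both write bits (022, 027, 077 and so on) would produce files that pass this particular check.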
[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949890#comment-16949890 ]

liusheng commented on YARN-9511:

Hi [~snemeth] [~adam.antal],

Thank you both for looking into this issue. I have spent some time trying to find its cause.

The problem affects the tests in *TestAuxServices* and causes *2 Errors and 9 Failures*, see:
{code:java}
Failures:
TestAuxServices.testAuxServiceRecoverySetup:717 expected:<2> but was:<0>
TestAuxServices.testAuxServicesManifestPermissions:874 expected:<2> but was:<0>
TestAuxServices.testAuxServicesMeta:638 Invalid mix of services expected:<6> but was:<1>
TestAuxServices.testAuxServices:610 Invalid mix of services expected:<6> but was:<1>
TestAuxServices.testCustomizedAuxServiceClassPath:416
TestAuxServices.testManualReload:919 expected:<2> but was:<0>
TestAuxServices.testRemoteAuxServiceClassPath:313 The permission of the jar is wrong.Should throw out exception.
TestAuxServices.testRemoveManifest:897 expected:<2> but was:<0>
TestAuxServices.testValidAuxServiceName:698 Should receive the exception.
Errors:
TestAuxServices.testAuxUnexpectedStop:664 » NoSuchElement
TestAuxServices.testRemoteAuxServiceClassPath:334 » YarnRuntime The remote jar...
{code}
After debugging, I found that all of these failures are directly or indirectly related to file permissions. There are two situations when running these tests:

# *useManifest* enabled: the tests check and use the manifest file:
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/manifest.txt
{code}
This file and all of its parent directories must not be writable by group or others, see:
{code:java}
private boolean checkManifestPermissions(FileStatus status) throws
    IOException {
  if ((status.getPermission().toShort() & 0022) != 0) {
    LOG.error("Manifest file and parents must not be writable by group or " +
        "others. The current Permission of " + status.getPath() + " is " +
        status.getPermission());
    return false;
  }
  Path parent = status.getPath().getParent();
  if (parent == null) {
    return true;
  }
  return checkManifestPermissions(manifestFS.getFileStatus(parent));
}
{code}
# *useManifest not* enabled: the tests use a *test-runjar.jar* file, whose permissions are checked by:
{code:java}
private Path maybeDownloadJars(String sName, String className, String
    remoteFile, AuxServiceFile.TypeEnum type, Configuration conf)
    throws IOException {
  // load AuxiliaryService from remote classpath
  FileContext localLFS = getLocalFileContext(conf);
  // create NM aux-service dir in NM localdir if it does not exist.
  Path nmAuxDir = dirsHandler.getLocalPathForWrite("."
      + Path.SEPARATOR + NM_AUX_SERVICE_DIR);
  if (!localLFS.util().exists(nmAuxDir)) {
    try {
      localLFS.mkdir(nmAuxDir, NM_AUX_SERVICE_DIR_PERM, true);
    } catch (IOException ex) {
      throw new YarnRuntimeException("Fail to create dir:"
          + nmAuxDir.toString(), ex);
    }
  }
  Path src = new Path(remoteFile);
  FileContext remoteLFS = getRemoteFileContext(src.toUri(), conf);
  FileStatus scFileStatus = remoteLFS.getFileStatus(src);
  if (!scFileStatus.getOwner().equals(
      this.userUGI.getShortUserName())) {
    throw new YarnRuntimeException("The remote jarfile owner:"
        + scFileStatus.getOwner() + " is not the same as the NM user:"
        + this.userUGI.getShortUserName() + ".");
  }
  if ((scFileStatus.getPermission().toShort() & 0022) != 0) {
    throw new YarnRuntimeException("The remote jarfile should not "
        + "be writable by group or others. "
        + "The current Permission is "
        + scFileStatus.getPermission().toShort());
  }
{code}
Based on the above, I removed the group and others *write* permission from the parent directories of *manifest.txt* and changed the *umask to 022*. The umask matters because it determines the permissions of newly created files and directories, and both *manifest.txt* and *test-runjar.jar* are newly created when the tests run.
{code:java}
chmod go-w yourpath/hadoop/ -R
umask 022
umask
{code}
After doing this and re-running the *TestAuxServices* tests, all of them pass.

Actually, I am a newcomer to Hadoop, so I am not sure whether this is a Hadoop bug or not. Could you please give some suggestions? Thanks.

> [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
> ---
>
> Key: YARN-9511
> URL:
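
The comment above notes that the manifest check applies to the file and to every parent directory, which is why `chmod go-w ... -R` on the checkout was needed in addition to the umask change. A minimal sketch of that walk follows. It is not the Hadoop implementation: `ManifestChainSketch`, `firstUnsafe`, and the in-memory `modes` map (standing in for real `FileStatus` lookups) are hypothetical, the example paths are made up, and a loop is used in place of the recursion in `checkManifestPermissions`. It assumes Unix-style absolute paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the parent-chain check; not Hadoop code. */
class ManifestChainSketch {
    // Walks from `start` up to the filesystem root and returns the first
    // path whose mode has a group or others write bit set, or null if the
    // whole chain is safe. `modes` stands in for real file status lookups.
    static Path firstUnsafe(Path start, Map<Path, Integer> modes) {
        for (Path p = start; p != null; p = p.getParent()) {
            Integer mode = modes.get(p);
            if (mode == null) {
                throw new IllegalArgumentException("no mode recorded for " + p);
            }
            if ((mode & 0022) != 0) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Path, Integer> modes = new HashMap<>();
        modes.put(Paths.get("/"), 0755);
        modes.put(Paths.get("/repo"), 0775);           // group-writable parent
        modes.put(Paths.get("/repo/target"), 0755);
        modes.put(Paths.get("/repo/target/manifest.txt"), 0644);
        // The file itself is fine, but one ancestor is not.
        System.out.println(firstUnsafe(Paths.get("/repo/target/manifest.txt"), modes));
    }
}
```

Returning the offending path instead of a boolean is a design choice for the sketch: it makes it easy to see which directory needs `chmod go-w`.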
[jira] [Commented] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949842#comment-16949842 ]

Hadoop QA commented on YARN-9894:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 51s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 35s | trunk passed |
| +1 | compile | 0m 48s | trunk passed |
| +1 | checkstyle | 0m 35s | trunk passed |
| +1 | mvnsite | 0m 49s | trunk passed |
| +1 | shadedclient | 13m 52s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 23s | trunk passed |
| +1 | javadoc | 0m 35s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 49s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -0 | checkstyle | 0m 30s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 12 new + 6 unchanged - 0 fixed = 18 total (was 6) |
| +1 | mvnsite | 0m 42s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 9s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 30s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 84m 41s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 144m 13s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9894 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982821/YARN-9894.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f3d44dadf48e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c561a70 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24967/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| unit |
[jira] [Commented] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949822#comment-16949822 ]

Hadoop QA commented on YARN-9884:
-

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 50s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 22s | trunk passed |
| +1 | compile | 1m 11s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 36m 31s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 1m 4s | the patch passed |
| +1 | cc | 1m 4s | the patch passed |
| +1 | javac | 1m 4s | the patch passed |
| +1 | mvnsite | 0m 38s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 22s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | unit | 21m 20s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 75m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9884 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982818/YARN-9884.003.patch |
| Optional Tests | dupname asflicense compile cc mvnsite javac unit |
| uname | Linux 9bcf0ce498ac 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c561a70 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24968/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24968/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch, YARN-9884.003.patch
>
>
> The current mount logic in the container-executor is intertwined with docker.
> To avoid duplicating code between docker and runc, the code should be
> refactored so that both runtimes can use the same common code when possible.
[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-9894: - Attachment: YARN-9894.001.patch > CapacitySchedulerPerf test for measuring hundreds of apps in a large number > of queues. > -- > > Key: YARN-9894 > URL: https://issues.apache.org/jira/browse/YARN-9894 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, test >Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: YARN-9894.001.patch > > > I have developed a unit test based on the existing TestCapacitySchedulerPerf > tests that will measure the performance of a configurable number of apps in a > configurable number of queues. It will also test the performance of a cluster > that has many queues but only a portion of them are active. > {code:title=For example:} > $ mvn test > -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \ > -DRunCapacitySchedulerPerfTests=true > -DNumberOfQueues=100 \ > -DNumberOfApplications=200 \ > -DPercentActiveQueues=100 > {code} > - Parameters: > -- RunCapacitySchedulerPerfTests=true: > Needed in order to trigger the test > -- NumberOfQueues > Configurable number of queues > -- NumberOfApplications > Total number of apps to run in the whole cluster, distributed evenly across > all queues > -- PercentActiveQueues > Percentage of the queues that contain active applications -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
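The parameters described above can be read as simple arithmetic over queues and applications. The following is a minimal sketch of that reading, not the actual TestCapacitySchedulerPerf code: the class name, method names, rounding behaviour, and default values are all assumptions made for illustration, and it assumes the `-D` flags reach the test JVM as system properties.

```java
/** Hypothetical sketch; not the actual TestCapacitySchedulerPerf implementation. */
final class PerfParamsSketch {
  /** Number of queues that receive applications, rounded down (assumed rounding). */
  static int activeQueueCount(int numberOfQueues, int percentActiveQueues) {
    return numberOfQueues * percentActiveQueues / 100;
  }

  /** Spreads apps as evenly as possible; the first (apps % queues) queues get one extra. */
  static int[] distributeEvenly(int numberOfApplications, int queues) {
    if (queues <= 0) {
      throw new IllegalArgumentException("queues must be positive");
    }
    int[] perQueue = new int[queues];
    for (int i = 0; i < queues; i++) {
      int extra = i < numberOfApplications % queues ? 1 : 0;
      perQueue[i] = numberOfApplications / queues + extra;
    }
    return perQueue;
  }

  public static void main(String[] args) {
    // Property names come from the issue text; the defaults here are assumptions.
    int queues = Integer.getInteger("NumberOfQueues", 100);
    int apps = Integer.getInteger("NumberOfApplications", 200);
    int percentActive = Integer.getInteger("PercentActiveQueues", 100);

    int active = activeQueueCount(queues, percentActive);
    int[] perQueue = distributeEvenly(apps, active);
    System.out.println(active + " active queues, first queue gets " + perQueue[0] + " apps");
  }
}
```

With the values from the example command (100 queues, 200 applications, 100 percent active), this reading gives two applications per queue.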
[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-9894: - Attachment: YARN-9894.001.patch > CapacitySchedulerPerf test for measuring hundreds of apps in a large number > of queues. > -- > > Key: YARN-9894 > URL: https://issues.apache.org/jira/browse/YARN-9894 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, test >Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > > I have developed a unit test based on the existing TestCapacitySchedulerPerf > tests that will measure the performance of a configurable number of apps in a > configurable number of queues. It will also test the performance of a cluster > that has many queues but only a portion of them are active. > {code:title=For example:} > $ mvn test > -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \ > -DRunCapacitySchedulerPerfTests=true > -DNumberOfQueues=100 \ > -DNumberOfApplications=200 \ > -DPercentActiveQueues=100 > {code} > - Parameters: > -- RunCapacitySchedulerPerfTests=true: > Needed in order to trigger the test > -- NumberOfQueues > Configurable number of queues > -- NumberOfApplications > Total number of apps to run in the whole cluster, distributed evenly across > all queues > -- PercentActiveQueues > Percentage of the queues that contain active applications -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-9894: - Attachment: (was: YARN-9894.001.patch) > CapacitySchedulerPerf test for measuring hundreds of apps in a large number > of queues. > -- > > Key: YARN-9894 > URL: https://issues.apache.org/jira/browse/YARN-9894 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, test >Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > > I have developed a unit test based on the existing TestCapacitySchedulerPerf > tests that will measure the performance of a configurable number of apps in a > configurable number of queues. It will also test the performance of a cluster > that has many queues but only a portion of them are active. > {code:title=For example:} > $ mvn test > -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \ > -DRunCapacitySchedulerPerfTests=true > -DNumberOfQueues=100 \ > -DNumberOfApplications=200 \ > -DPercentActiveQueues=100 > {code} > - Parameters: > -- RunCapacitySchedulerPerfTests=true: > Needed in order to trigger the test > -- NumberOfQueues > Configurable number of queues > -- NumberOfApplications > Total number of apps to run in the whole cluster, distributed evenly across > all queues > -- PercentActiveQueues > Percentage of the queues that contain active applications -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949790#comment-16949790 ] Eric Badger commented on YARN-9884: --- Thanks for the prompt reviews, everyone! Patch 003 combines the util and docker enum error code lists together into a single one in util.h. I left the old error codes that aren't used anymore to keep the diff smaller, but I could remove those if you're in favor. We still have about 50 error codes that we can add until we go over the 128 boundary. > Make container-executor mount logic modular > --- > > Key: YARN-9884 > URL: https://issues.apache.org/jira/browse/YARN-9884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9884.001.patch, YARN-9884.002.patch, > YARN-9884.003.patch > > > The current mount logic in the container-executor is intertwined with docker. > To avoid duplicating code between docker and runc, the code should be > refactored so that both runtimes can use the same common code when possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9884: -- Attachment: YARN-9884.003.patch > Make container-executor mount logic modular > --- > > Key: YARN-9884 > URL: https://issues.apache.org/jira/browse/YARN-9884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9884.001.patch, YARN-9884.002.patch, > YARN-9884.003.patch > > > The current mount logic in the container-executor is intertwined with docker. > To avoid duplicating code between docker and runc, the code should be > refactored so that both runtimes can use the same common code when possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949769#comment-16949769 ] David Mollitor commented on YARN-9863: -- [~szegedim] Thank you for your feedback. The background here is that I am working with a large cluster that has one job in particular that is crushing it. This one job is required to localize many resources, of varying file sizes, for the job to complete. As I understand YARN, when a job is submitted to the cluster, a list of files to localize is sent to each NodeManager involved in the job. In this case, all nodes are involved. All NodeManagers receive a carbon copy of the list of files from the ResourceManager (or maybe it's the 'yarn' client?). That is, they all have the same list, with the same ordering. The NodeManagers then iterate through the list and request that each file be localized. So, it would seem to me that all of the NodeManagers would request from HDFS file1, file2, file3, ... This would have a stampeding effect on the HDFS DataNodes. I am familiar with {{mapreduce.client.submit.file.replication}}. I understand that this is used to pump up the replication of the submitted files so that they are available on more DataNodes. However, the way that it works, as I understand it, is that the file is first written to the HDFS cluster with the default replication (usually 3), and then the client requests that the file be replicated up to the final replication factor in a separate request (setrep). This replication process happens asynchronously. If the {{mapreduce.client.submit.file.replication}} is set to 10, for example, the job may be submitted and finished before the file actually achieves a final replication of 10. This becomes exacerbated on larger clusters. If a cluster has 1,000 nodes, the recommended value of {{mapreduce.client.submit.file.replication}} is sqrt(1000) or ~32. The default number of connections each DataNode can support is 10 ({{dfs.datanode.handler.count}}). 
So, even if the desired replication is achieved, that is 32 x 10 connections = 320 connections supported at once. In a cluster with 1,000 nodes, that is going to stall. By simply randomizing the list, the load can be spread across many sets of 32 nodes and better support this scenario. For your questions: # I'm not sure how HDFS would manage this. The requests are generated by the NodeManagers and the HDFS cluster is simply serving. They have no way to randomize the requests. # SecureRandom. This is not a secure operation. It only requires a fast and pretty-good randomization of the list to spread the load # I believe that the parallel nature of the localization is configurable with {{yarn.nodemanager.localizer.fetch.thread-count}} (default 4), but I believe that the requests are submitted to a work-queue in order, so there will still be some level of trampling, especially if there are more than 4 files to localize (as is this case with the scenario I am reviewing) > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
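The shuffling idea and the arithmetic in the comment above can be sketched as follows. This is an illustration only, not the contents of the attached patches: the class and method names are hypothetical, the resource names are placeholders, and a plain `java.util.Random` is used on the assumption (stated in the comment) that the shuffle does not need to be cryptographically secure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; not the actual LocalResourceBuilder change. */
final class LocalizationShuffleSketch {
  /** Returns a shuffled copy so that each node could request files in a different order. */
  static List<String> shuffledCopy(List<String> resources, Random random) {
    List<String> copy = new ArrayList<>(resources);
    Collections.shuffle(copy, random);
    return copy;
  }

  /** Replication suggested in the comment: roughly the square root of the node count. */
  static int suggestedReplication(int nodes) {
    return (int) Math.round(Math.sqrt(nodes));
  }

  /** Simultaneous connections available if every replica holder serves at full capacity. */
  static int concurrentConnections(int replication, int handlersPerDataNode) {
    return replication * handlersPerDataNode;
  }

  public static void main(String[] args) {
    List<String> resources = Arrays.asList("file1", "file2", "file3", "file4");
    System.out.println("One possible order: " + shuffledCopy(resources, new Random()));

    int replication = suggestedReplication(1000);
    int connections = concurrentConnections(replication, 10);
    System.out.println(replication + " replicas x 10 handlers = " + connections + " connections");
  }
}
```

For 1,000 nodes this reproduces the figures quoted in the comment: a replication of about 32 and 320 simultaneous connections, well short of 1,000 nodes all asking for the same file at once.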
[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949707#comment-16949707 ] Miklos Szegedi commented on YARN-9863: -- [~belugabehr], could you explain the motivation for this change a bit more? AFAIK it is better for the order to be decided in HDFS. Also, once you use a random number, why not use SecureRandom? My third question is whether localization is running in parallel, in which case the order does not matter so much. All in all, my experience with YARN and localization suggests that if you have a bottleneck on HDFS, you would rather just do a suitable replica increase in HDFS even if it is temporary. HDFS is much better at creating replicas for localization, since it can do streaming, avoiding any bottlenecks. Then the localization goes to the local instance, making it practically painless. > Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9863) Randomize List of Resources to Localize
[ https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949707#comment-16949707 ] Miklos Szegedi edited comment on YARN-9863 at 10/11/19 6:39 PM: [~belugabehr], thank you for the patch. Could you explain the motivation for this change a bit more? AFAIK the order is better to be decided in HDFS. Also, once you use a random number, why do not you use SecureRandom? My third question is whether localization is running in parallel in which case the order does not matter so much. All in all my experience with YARN and localization suggests that if you have a bottleneck on HDFS, you would rather just do a suitable replica increase in HDFS even if it is temporary. HDFS is much better in doing replicas for localization, since it can do streaming avoiding any bottlenecks. Then the localization goes to the local instance, making it practically painless. was (Author: szegedim): [~belugabehr], could you explain the motivation for this change a bit more? AFAIK the order is better to be decided in HDFS. Also, once you use a random number, why do not you use SecureRandom? My third question is whether localization is running in parallel in which case the order does not matter so much. All in all my experience with YARN and localization suggests that if you have a bottleneck on HDFS, you would rather just do a suitable replica increase in HDFS even if it is temporary. HDFS is much better in doing replicas for localization, since it can do streaming avoiding any bottlenecks. Then the localization goes to the local instance, making it practically painless. 
> Randomize List of Resources to Localize > --- > > Key: YARN-9863 > URL: https://issues.apache.org/jira/browse/YARN-9863 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: YARN-9863.1.patch, YARN-9863.2.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java > Add a new parameter to {{LocalResourceBuilder}} that allows the list of > resources to be shuffled randomly. This will allow the Localizer to spread > the load of requests so that not all of the NodeManagers are requesting to > localize the same files, in the same order, from the same DataNodes, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949703#comment-16949703 ] Craig Condit commented on YARN-9884: [~ebadger], this looks pretty good for the most part. However, there's one potential big problem... exit codes outside of the range 0-127 tend to be misinterpreted by shells and other tooling. Have we verified that the upper codes are being interpreted properly? Most kernel wait() function variations truncate exit codes to 8 bits, and shells treat them as signed, where negative values indicate death by signal -- the infamous 143 exit code in Hadoop is really SIGTERM (15) + 128. > Make container-executor mount logic modular > --- > > Key: YARN-9884 > URL: https://issues.apache.org/jira/browse/YARN-9884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9884.001.patch, YARN-9884.002.patch > > > The current mount logic in the container-executor is intertwined with docker. > To avoid duplicating code between docker and runc, the code should be > refactored so that both runtimes can use the same common code when possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
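The exit-code concern raised above can be illustrated with a small sketch. It assumes the common shell convention that a process killed by signal N is reported with status 128 + N, and that only the low 8 bits of an exit status reach the parent. The class and method names are hypothetical and are not part of container-executor.

```java
/** Hypothetical helper showing how exit statuses are commonly interpreted. */
final class ExitStatusSketch {
  /** Only the low 8 bits of an exit status are visible to the waiting parent. */
  static int truncate(int rawExitCode) {
    return rawExitCode & 0xFF;
  }

  /** By shell convention, a status above 128 usually means "killed by a signal". */
  static boolean looksLikeSignal(int status) {
    return status > 128 && status <= 255;
  }

  /** Recovers the signal number from a 128 + N style status. */
  static int signalNumber(int status) {
    if (!looksLikeSignal(status)) {
      throw new IllegalArgumentException("status " + status + " does not encode a signal");
    }
    return status - 128;
  }

  public static void main(String[] args) {
    int status = 143; // the exit code mentioned in the comment
    System.out.println("status " + status + " reads as signal " + signalNumber(status));
    System.out.println("an exit code of 256 is seen by the parent as " + truncate(256));
  }
}
```

Under this convention, an application-defined error code above 128 would be indistinguishable from death by signal, which is why staying below the 128 boundary matters.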
[jira] [Commented] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949691#comment-16949691 ] Jim Brennan commented on YARN-9884: --- [~ebadger] good job on the re-factoring. This looks pretty good to me. I was going to comment that there are a few of the DOCKER related enum values that are no longer used, like INVALID_DOCKER_RO_MOUNT, and those should be removed. Also, I think all DOCKER-specific codes should have DOCKER in the name. I agree with [~eyang] that a single list would be even better. > Make container-executor mount logic modular > --- > > Key: YARN-9884 > URL: https://issues.apache.org/jira/browse/YARN-9884 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9884.001.patch, YARN-9884.002.patch > > > The current mount logic in the container-executor is intertwined with docker. > To avoid duplicating code between docker and runc, the code should be > refactored so that both runtimes can use the same common code when possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user
[ https://issues.apache.org/jira/browse/YARN-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949690#comment-16949690 ] Eric Payne commented on YARN-9887: -- The Capacity Scheduler does have a concept of Max Applications Per User (per queue). While it is not directly configurable, it is based on the following: (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) Each of the above is configurable per queue. > Capacity scheduler: add support for limiting maxRunningApps per user > > > Key: YARN-9887 > URL: https://issues.apache.org/jira/browse/YARN-9887 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Priority: Major > > Fair Scheduler supports limiting the number of applications that a particular > user can submit: > {noformat} > > 10 > > {noformat} > Capacity Scheduler does not have an exact equivalent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
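The formula quoted in the comment above can be worked through with concrete numbers. The sketch below wraps the expression in a hypothetical method; the class name, parameter types, and sample values are assumptions for illustration and are not taken from the Capacity Scheduler source.

```java
/** Hypothetical wrapper around the formula quoted in the comment. */
final class MaxAppsPerUserSketch {
  /**
   * @param maxApplications maximum applications allowed in the queue
   * @param userLimit minimum user limit as a percentage (for example 25 for 25%)
   * @param userLimitFactor multiplier applied on top of the user limit
   */
  static int maxApplicationsPerUser(int maxApplications, float userLimit, float userLimitFactor) {
    return (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);
  }

  public static void main(String[] args) {
    // Sample values chosen for illustration only.
    System.out.println(maxApplicationsPerUser(10000, 25f, 1f));
    System.out.println(maxApplicationsPerUser(100, 50f, 2f));
  }
}
```

With a queue limit of 10,000 applications, a user limit of 25 percent, and a factor of 1, a single user could have at most 2,500 applications; raising the factor to 2 with a 50 percent limit lets one user consume the whole queue limit.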
[jira] [Assigned] (YARN-9885) Container allocation when queue usage is below MIN guarantees
[ https://issues.apache.org/jira/browse/YARN-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Golash reassigned YARN-9885: - Assignee: Prashant Golash > Container allocation when queue usage is below MIN guarantees > - > > Key: YARN-9885 > URL: https://issues.apache.org/jira/browse/YARN-9885 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Prashant Golash >Assignee: Prashant Golash >Priority: Minor > > Filing this JIRA to calculate the time spent in container allocation when > queue usage is below min (during the whole time for the container). > Customers generally ask for a YARN SLA for container allocation when their queue > usage is below min. I have an implementation in mind, but I want to > confirm with the community whether this would be a helpful feature or whether it > is already implemented. > > cc [~leftnoteasy] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9895) Feature flag to Disable delay scheduling
[ https://issues.apache.org/jira/browse/YARN-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Golash reassigned YARN-9895: - Assignee: Prashant Golash > Feature flag to Disable delay scheduling > > > Key: YARN-9895 > URL: https://issues.apache.org/jira/browse/YARN-9895 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Prashant Golash >Assignee: Prashant Golash >Priority: Major > > In many YARN clusters, there is no colocation of storage and compute. In such > cases, we may not need delay scheduling. > > I think it would be good to provide an option to disable delay scheduling and > accordingly change the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9895) Feature flag to Disable delay scheduling
Prashant Golash created YARN-9895: - Summary: Feature flag to Disable delay scheduling Key: YARN-9895 URL: https://issues.apache.org/jira/browse/YARN-9895 Project: Hadoop YARN Issue Type: Improvement Reporter: Prashant Golash In many YARN clusters, there is no colocation of storage and compute. In such cases, we may not need delay scheduling. I think it would be good to provide an option to disable delay scheduling and accordingly change the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-9894: Assignee: Eric Payne (was: Eric Payne) > CapacitySchedulerPerf test for measuring hundreds of apps in a large number > of queues. > -- > > Key: YARN-9894 > URL: https://issues.apache.org/jira/browse/YARN-9894 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, test >Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > > I have developed a unit test based on the existing TestCapacitySchedulerPerf > tests that will measure the performance of a configurable number of apps in a > configurable number of queues. It will also test the performance of a cluster > that has many queues but only a portion of them are active. > {code:title=For example:} > $ mvn test > -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \ > -DRunCapacitySchedulerPerfTests=true > -DNumberOfQueues=100 \ > -DNumberOfApplications=200 \ > -DPercentActiveQueues=100 > {code} > - Parameters: > -- RunCapacitySchedulerPerfTests=true: > Needed in order to trigger the test > -- NumberOfQueues > Configurable number of queues > -- NumberOfApplications > Total number of apps to run in the whole cluster, distributed evenly across > all queues > -- PercentActiveQueues > Percentage of the queues that contain active applications -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9014) runC container runtime
[ https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9014: -- Summary: runC container runtime (was: OCI/squashfs container runtime) > runC container runtime > -- > > Key: YARN-9014 > URL: https://issues.apache.org/jira/browse/YARN-9014 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jason Darrell Lowe >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: OciSquashfsRuntime.v001.pdf, > RuncContainerRuntime.v002.pdf > > > This JIRA tracks a YARN container runtime that supports running containers in > images built by Docker but the runtime does not use Docker directly, and > Docker does not have to be installed on the nodes. The runtime leverages the > [OCI runtime standard|https://github.com/opencontainers/runtime-spec] to > launch containers, so an OCI-compliant runtime like {{runc}} is required. > {{runc}} has the benefit of not requiring a daemon like {{dockerd}} to be > running in order to launch/control containers. > The layers comprising the Docker image are uploaded to HDFS as > [squashfs|http://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html] images, enabling > the runtime to efficiently download and execute directly on the compressed > layers. This saves image unpack time and space on the local disk. The image > layers, like other entries in the YARN distributed cache, can be spread > across the YARN local disks, increasing the available space for storing > container images on each node. > A design document will be posted shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
[ https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned YARN-9894: Assignee: Eric Payne > CapacitySchedulerPerf test for measuring hundreds of apps in a large number > of queues. > -- > > Key: YARN-9894 > URL: https://issues.apache.org/jira/browse/YARN-9894 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, test >Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > > I have developed a unit test based on the existing TestCapacitySchedulerPerf > tests that will measure the performance of a configurable number of apps in a > configurable number of queues. It will also test the performance of a cluster > that has many queues but only a portion of them are active. > {code:title=For example:} > $ mvn test > -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \ > -DRunCapacitySchedulerPerfTests=true > -DNumberOfQueues=100 \ > -DNumberOfApplications=200 \ > -DPercentActiveQueues=100 > {code} > - Parameters: > -- RunCapacitySchedulerPerfTests=true: > Needed in order to trigger the test > -- NumberOfQueues > Configurable number of queues > -- NumberOfApplications > Total number of apps to run in the whole cluster, distributed evenly across > all queues > -- PercentActiveQueues > Percentage of the queues that contain active applications -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
Eric Payne created YARN-9894:

    Summary: CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.
    Key: YARN-9894
    URL: https://issues.apache.org/jira/browse/YARN-9894
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: capacity scheduler, test
    Affects Versions: 3.1.3, 3.2.1, 2.8.5, 2.9.2
    Reporter: Eric Payne

I have developed a unit test based on the existing TestCapacitySchedulerPerf tests that will measure the performance of a configurable number of apps in a configurable number of queues. It will also test the performance of a cluster that has many queues but only a portion of them are active.
{code:title=For example:}
$ mvn test -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
  -DRunCapacitySchedulerPerfTests=true -DNumberOfQueues=100 \
  -DNumberOfApplications=200 \
  -DPercentActiveQueues=100
{code}
- Parameters:
-- RunCapacitySchedulerPerfTests=true: Needed in order to trigger the test
-- NumberOfQueues: Configurable number of queues
-- NumberOfApplications: Total number of apps to run in the whole cluster, distributed evenly across all queues
-- PercentActiveQueues: Percentage of the queues that contain active applications
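To make the three sizing parameters above concrete, here is a minimal Java sketch of how they might combine. It is not taken from TestCapacitySchedulerPerf: the class {{PerfTestPlan}}, its fields, and the default values are hypothetical, and it assumes that applications are spread evenly over the active queues only and that at least one queue is always active.

```java
// Hypothetical helper for illustration only; not part of TestCapacitySchedulerPerf.
final class PerfTestPlan {
    final int activeQueues;        // queues that receive applications
    final int appsPerActiveQueue;  // even share per active queue
    final int leftoverApps;        // remainder that does not divide evenly

    PerfTestPlan(int numberOfQueues, int numberOfApplications, int percentActiveQueues) {
        if (numberOfQueues < 1 || numberOfApplications < 0
                || percentActiveQueues < 1 || percentActiveQueues > 100) {
            throw new IllegalArgumentException("invalid perf test parameters");
        }
        // Assumption: round down, but always keep at least one active queue.
        this.activeQueues = Math.max(1, numberOfQueues * percentActiveQueues / 100);
        this.appsPerActiveQueue = numberOfApplications / activeQueues;
        this.leftoverApps = numberOfApplications % activeQueues;
    }

    // Reads the same -D names used on the mvn command line; the defaults are assumptions.
    static PerfTestPlan fromSystemProperties() {
        return new PerfTestPlan(
                Integer.getInteger("NumberOfQueues", 100),
                Integer.getInteger("NumberOfApplications", 200),
                Integer.getInteger("PercentActiveQueues", 100));
    }
}
```

With the values from the example command (100 queues, 200 applications, 100 percent active), this sketch yields 100 active queues with 2 applications each.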
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949576#comment-16949576 ] Hadoop QA commented on YARN-8453: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 56s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 22s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 54s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 | | JIRA Issue | YARN-8453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 587486dec1d1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 626a48d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24966/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24966/testReport/ | | Max. process+thread count | 761 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (YARN-9882) QueueMetrics not coming in Capacity Scheduler with Node Label Configuration
[ https://issues.apache.org/jira/browse/YARN-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949573#comment-16949573 ]

Eric Payne commented on YARN-9882:

[~gaurav.suman], for legacy reasons, the metrics outside of the sections labelled "...ByPartition" only reflect the resource usage of the default partition. For each partition, these metrics are included in the "...ByPartition" sections. If one wants the sum of all resources in all partitions, it is necessary to sum the metrics for each partition. The history of this is in YARN-6467 and others referenced there.

There are currently problems with the accuracy of all of these metrics. They are being worked on by [~rmanikandan] in the following JIRAs: YARN-6492, YARN-9767, YARN-9773.

> QueueMetrics not coming in Capacity Scheduler with Node Label Configuration
>
> Key: YARN-9882
> URL: https://issues.apache.org/jira/browse/YARN-9882
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, metrics, scheduler
> Reporter: Gaurav Suman
> Priority: Major
>
> I have a capacity scheduler setup with two queues - "low-priority", "regular-priority". There are two node-labels "low" and "regular". low-priority queue has 100% access to "low" node-label and regular-priority queue has 100% access to "regular" node label.
> The yarn ui capacity scheduler configuration - [https://i.stack.imgur.com/gOARn.png]
> When I look at the QueueMetrics emitted by the "low-priority" and "regular-priority" queues at http://rm-ip:port/jmx, they show correct values of availableMB and availableVCores, pendingMB=0, etc. But when I submit a job to any queue, JMX metrics such as pendingMB, pendingVcores, availableMB and availableVCores are not updated; only AppsRunning, ActiveApplications, etc. are updated. pendingMB and pendingVcores always remain 0 and availableMB and availableVcores do not change, while appsRunning and activeApplications show the correct value of 1.
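> I am not able to find why the metrics are not getting updated after job submission.
> The issue comes only when node-label is enabled. When node-label is disabled and only queue is used everything works fine.
> The capacity scheduler configuration (capacity-scheduler.xml):
> {code:java}
> <property>
>   <name>yarn.scheduler.capacity.maximum-applications</name>
>   <value>5000</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>   <value>0.2</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.resource-calculator</name>
>   <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
>   <description>
>     The ResourceCalculator implementation to be used to compare
>     Resources in the scheduler.
>     The default i.e. DefaultResourceCalculator only uses Memory while
>     DominantResourceCalculator uses dominant-resource to compare
>     multi-dimensional resources such as Memory, CPU etc.
>   </description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>low-priority,regular-priority</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
>   <value>*</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels.regular.capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels.regular.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels.low.capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels.low.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.state</name>
>   <value>RUNNING</value>
>   <description>
>     The state of the default queue. State can be one of RUNNING or
>     STOPPED.
>   </description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>   <value>*</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>   <value>*</value>
>   <description>
>     The ACL of who can administer jobs on the default queue.
>   </description>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.node-locality-delay</name>
>   <value>40</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.capacity</name>
>   <value>50</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.ordering-policy</name>
>   <value>fair</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.accessible-node-labels</name>
>   <value>low</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.default-node-label-expression</name>
>   <value>low</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.accessible-node-labels.low.capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.low-priority.accessible-node-labels.low.maximum-capacity</name>
>   <value>100</value>
> </property>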
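The comment on YARN-9882 above says that a total across all partitions has to be computed by summing the per-partition values. A minimal Java sketch of that summation follows. The class {{PartitionMetricsSum}} and the map of partition name to metric value are hypothetical; how the per-partition numbers are actually obtained from the "...ByPartition" sections is not shown and is outside this sketch.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not a YARN API.
final class PartitionMetricsSum {
    // Sums one metric (for example availableMB) over every partition,
    // including the default partition, keyed here by the empty string.
    static long sumAcrossPartitions(Map<String, Long> valueByPartition) {
        long total = 0L;
        for (long value : valueByPartition.values()) {
            total += value;
        }
        return total;
    }
}
```

For the setup described in the issue, the map would hold one entry each for the default partition, "low", and "regular".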
[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949570#comment-16949570 ] Hadoop QA commented on YARN-5106: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 4s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 27 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 57s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 11 new + 278 unchanged - 52 fixed = 289 total (was 330) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 31s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 21s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}211m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-5106 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982771/YARN-5106.012.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d45762b6b02e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ec86f42 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949546#comment-16949546 ] Hadoop QA commented on YARN-8453: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 10s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 37s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 39s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}147m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 | | JIRA Issue | YARN-8453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6baa1b98f753 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 626a48d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24965/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24965/testReport/ | | Max. process+thread count | 781 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949524#comment-16949524 ] Hadoop QA commented on YARN-8453: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 50s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 13s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 56s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}139m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 | | JIRA Issue | YARN-8453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6779bbc1f636 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / 626a48d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24964/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24964/testReport/ | | Max. process+thread count | 789 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (YARN-9884) Make container-executor mount logic modular
[ https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949497#comment-16949497 ]

Eric Badger commented on YARN-9884:

bq. I am wondering if we need to combine the list to prevent future conflicts of the same name?

I am fine with doing that. I refactored the error codes in a way that I thought would be the least invasive to how docker currently handles error codes. But having multiple error code lists is a little bit confusing because there could be collisions with error values and you won't always know whether your error came from a generic error code enum or a docker-specific one.

Something I ran into while testing was that the docker daemon will return codes in the 100 range. I had initially placed all of the docker error codes in the 100 range, but then got an error 127 from the docker daemon and realized that it would be incorrectly parsed as a docker error code. So even though the docker daemon was passing back 127, the error message you would be given would be something completely unrelated to the actual error.

So all in all, I am in favor of combining the lists. I'd be happy to include that in this patch or put up a separate patch to do it.

> Make container-executor mount logic modular
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch
>
> The current mount logic in the container-executor is intertwined with docker. To avoid duplicating code between docker and runc, the code should be refactored so that both runtimes can use the same common code when possible.
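The collision problem described in the comment above can be sketched in Java as a single combined error list. The container-executor itself is written in C, so this is only an illustration of the idea: the enum {{ExecutorError}}, its constants, and the numeric values are all hypothetical. The sketch rejects duplicate values at class-load time and reports an exit code it does not own (such as the 127 seen from the docker daemon) as unrecognized instead of mapping it to an unrelated message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical combined error list; names and values are illustrative only.
enum ExecutorError {
    INVALID_MOUNT(201, "Invalid mount"),
    MOUNT_ACCESS_DENIED(202, "Mount access denied"),
    RUNTIME_LAUNCH_FAILED(203, "Container runtime launch failed");

    private static final Map<Integer, ExecutorError> BY_CODE = new HashMap<>();

    static {
        // One list means one place to detect value collisions.
        for (ExecutorError e : values()) {
            if (BY_CODE.put(e.code, e) != null) {
                throw new IllegalStateException("duplicate error code " + e.code);
            }
        }
    }

    final int code;
    final String message;

    ExecutorError(int code, String message) {
        this.code = code;
        this.message = message;
    }

    // Codes that are not in the list are passed through rather than misreported.
    static String describe(int exitCode) {
        ExecutorError e = BY_CODE.get(exitCode);
        return e != null ? e.message : "Unrecognized exit code " + exitCode;
    }
}
```

The design choice here is that a lookup miss is an explicit outcome, so a code produced by the runtime daemon is never given the text of one of the executor's own errors.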
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949452#comment-16949452 ]

Szilard Nemeth commented on YARN-9840:

Hi [~pbacsko]! That's fine, thanks for the answer. Do you agree that we need an addition to the CS documentation as well? Thanks!

> Capacity scheduler: add support for Secondary Group rule mapping
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Peter Bacsko
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, YARN-9840.003.patch
>
> Currently, Capacity Scheduler only supports primary group rule mapping like this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add this to CS to reduce the feature gap.
> Class of interest:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949450#comment-16949450 ]

Peter Bacsko commented on YARN-9840:

[~snemeth] although you didn't ask me, I examined the group mapping code in more detail, and the answer is yes: index 0 is the primary group. Basically it's coming from the output of {{id -Gn}} if {{ShellBasedUnixGroupsMapping}} is used, so the first group is always the primary.

> Capacity scheduler: add support for Secondary Group rule mapping
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Peter Bacsko
> Assignee: Manikandan R
> Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, YARN-9840.003.patch
>
> Currently, Capacity Scheduler only supports primary group rule mapping like this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add this to CS to reduce the feature gap.
> Class of interest:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
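The convention described in the comment above (index 0 of the group list is the primary group, everything after it is secondary) can be sketched in Java as follows. The class {{GroupSplit}} and its methods are hypothetical and are not the code in {{UserGroupMappingPlacementRule}}; the sketch only assumes that the group list arrives in the order the group mapping returned it.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not the UserGroupMappingPlacementRule code.
final class GroupSplit {
    // The first entry is treated as the primary group; null if the user has no groups.
    static String primaryGroup(List<String> groups) {
        return groups.isEmpty() ? null : groups.get(0);
    }

    // Everything after the first entry is treated as a secondary group.
    static List<String> secondaryGroups(List<String> groups) {
        if (groups.size() <= 1) {
            return Collections.<String>emptyList();
        }
        return groups.subList(1, groups.size());
    }
}
```

A secondary group rule would then look for a matching queue among the entries returned by {{secondaryGroups}}, while the existing {{%primary_group}} mapping corresponds to {{primaryGroup}}.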
[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949447#comment-16949447 ] Hadoop QA commented on YARN-9699:
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 28m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 16 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 49s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 15s | trunk passed |
| +1 | compile | 7m 24s | trunk passed |
| +1 | checkstyle | 1m 26s | trunk passed |
| +1 | mvnsite | 1m 33s | trunk passed |
| +1 | shadedclient | 14m 58s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 19s | trunk passed |
| +1 | javadoc | 1m 14s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 56s | the patch passed |
| +1 | compile | 6m 39s | the patch passed |
| +1 | javac | 6m 39s | the patch passed |
| -0 | checkstyle | 1m 19s | hadoop-yarn-project/hadoop-yarn: The patch generated 4 new + 137 unchanged - 7 fixed = 141 total (was 144) |
| -1 | mvnsite | 0m 31s | hadoop-yarn-site in the patch failed. |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | xml | 0m 10s | The patch has 2 ill-formed XML file(s). |
| +1 | shadedclient | 12m 9s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 26s | the patch passed |
| +1 | javadoc | 1m 12s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 81m 13s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | unit | 0m 26s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 45s | The patch does not generate ASF License warnings. |
| | | 182m 7s | |
|| Reason || Tests ||
| XML | Parsing Error(s): |
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949444#comment-16949444 ] Szilard Nemeth commented on YARN-9840: -- Hi [~maniraj...@gmail.com]! Just a quick question about the added code in UserGroupMappingPlacementRule#getPlacementForUser: you are starting the loop from index 1. I guess this is because the primary group is at index 0 and all the secondary groups are at higher indices. Is this true? Could you make this explicit with a code comment? As this is kind of a new feature in CS, can you update the CS documentation as well? Thanks! > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9840.001.patch, YARN-9840.002.patch, > YARN-9840.003.patch > > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair scheduler already supports secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
[jira] [Created] (YARN-9893) Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity setting
Peter Bacsko created YARN-9893: -- Summary: Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity setting Key: YARN-9893 URL: https://issues.apache.org/jira/browse/YARN-9893 Project: Hadoop YARN Issue Type: Sub-task Components: capacity scheduler Reporter: Peter Bacsko Capacity Scheduler does not support two percentage values for leaf queue capacity and maximum-capacity settings. So, you can't do something like this: {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=50.0%, vcores=50.0%}} On top of that, it's not even possible to define absolute resources: {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=16384, vcores=8}} Only a single percentage value is accepted. This makes it nearly impossible to properly convert a similar setting from Fair Scheduler, where such a configuration is valid and accepted ({{}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
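As an illustration of the value format this issue asks for, here is a minimal sketch that parses a per-resource percentage string such as the one quoted above. It is not Capacity Scheduler code; the class and method names are hypothetical, and only the `name=NN%` shape is taken from the issue text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, for illustration only.
final class CapacityVectorSketch {

  // Parses "memory-mb=50.0%, vcores=50.0%" into {memory-mb=50.0, vcores=50.0}.
  static Map<String, Double> parsePercentages(String value) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (String part : value.split(",")) {
      String[] kv = part.trim().split("=", 2);
      if (kv.length != 2 || !kv[1].trim().endsWith("%")) {
        throw new IllegalArgumentException("Expected name=NN%, got: " + part);
      }
      String pct = kv[1].trim();
      // Strip the trailing '%' before parsing the number.
      result.put(kv[0].trim(), Double.parseDouble(pct.substring(0, pct.length() - 1)));
    }
    return result;
  }
}
```

An absolute setting such as `memory-mb=16384, vcores=8` is rejected by this sketch on purpose, since it only covers the percentage form.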
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949412#comment-16949412 ] Hudson commented on YARN-8453: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17525 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17525/]) YARN-8453. Additional Unit tests to verify queue limit and max-limit (snemeth: rev ec86f42e40ec57ea5d515c1207161fcaf2c770e1) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerWithMultiResourceTypes.java > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8453.001.patch, YARN-8453.002.patch, > YARN-8453.branch-3.1.001.patch, YARN-8453.branch-3.2.001.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue, while other resources such as Memory / CPU are still available beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests.
[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8453: - Attachment: YARN-8453.branch-3.1.001.patch > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8453.001.patch, YARN-8453.002.patch, > YARN-8453.branch-3.1.001.patch, YARN-8453.branch-3.2.001.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue, while other resources such as Memory / CPU are still available beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests.
[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
[ https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jenny updated YARN-9881: Attachment: 1.png 2.png 3.png Description: The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, change Item memory‘s datatype from int to long. 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. 2. change Fair Scheduler API's item [memory]'s dataType from int to long. Summary: Change Cluster_Scheduler_API's Item memory‘s datatype from int to long. (was: In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.) > Change Cluster_Scheduler_API's Item memory‘s datatype from int to long. > > > Key: YARN-9881 > URL: https://issues.apache.org/jira/browse/YARN-9881 > Project: Hadoop YARN > Issue Type: Bug > Components: docs, documentation, yarn >Affects Versions: 3.1.1, 3.2.1 >Reporter: jenny >Priority: Major > Attachments: 1.png, 2.png, 3.png > > > The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, > In > hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, > change Item memory‘s datatype from int to long. > 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. > 2. change Fair Scheduler API's item [memory]'s dataType from int to long. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
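A short illustration of why the documented width of a memory field matters. This is a generic Java sketch, not code from the ResourceManager; the class name and the sample value are hypothetical, and the issue itself does not state the unit of the field.

```java
// Hypothetical helper, for illustration only.
final class MemoryWidthSketch {

  // True if the value can be represented by a 32-bit signed int.
  static boolean fitsInInt(long memory) {
    return memory >= Integer.MIN_VALUE && memory <= Integer.MAX_VALUE;
  }

  // Narrowing cast: values above Integer.MAX_VALUE silently wrap around.
  static int narrow(long memory) {
    return (int) memory;
  }
}
```

For example, 3,000,000,000 does not fit in an int, and narrowing it yields -1,294,967,296, so a client that trusts a documented int type could misread a large value.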
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949399#comment-16949399 ] Szilard Nemeth commented on YARN-8453: -- Hi [~adam.antal]! Thanks for the patch, just committed to trunk! I added the branch-3.2 patch as it applied cleanly and am waiting for Jenkins to pick it up. After that, I will add the branch-3.1 patch as well! > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8453.001.patch, YARN-8453.002.patch, > YARN-8453.branch-3.2.001.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue, while other resources such as Memory / CPU are still available beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests.
[jira] [Created] (YARN-9892) Capacity scheduler: support DRF ordering policy on queue level
Peter Bacsko created YARN-9892: -- Summary: Capacity scheduler: support DRF ordering policy on queue level Key: YARN-9892 URL: https://issues.apache.org/jira/browse/YARN-9892 Project: Hadoop YARN Issue Type: Sub-task Components: capacity scheduler Reporter: Peter Bacsko Capacity scheduler does not support DRF (Dominant Resource Fairness) ordering policy on queue level. Only "fifo" and "fair" are accepted for {{yarn.scheduler.capacity..ordering-policy}}. DRF can only be used globally if {{yarn.scheduler.capacity.resource-calculator}} is set to DominantResourceCalculator. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
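For readers unfamiliar with DRF, the quantity such an ordering policy compares is the dominant share: the largest fraction of any single cluster resource that an application or queue is using. The sketch below is a simplification under that definition, not Hadoop's DominantResourceCalculator; the class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only.
final class DominantShareSketch {

  // used[i] and clusterTotal[i] refer to the same resource type (e.g. memory, vcores).
  static double dominantShare(long[] used, long[] clusterTotal) {
    double max = 0.0;
    for (int i = 0; i < used.length; i++) {
      if (clusterTotal[i] > 0) {
        max = Math.max(max, (double) used[i] / clusterTotal[i]);
      }
    }
    return max;
  }

  // Negative if 'a' has the smaller dominant share, i.e. would be served first
  // under a DRF-style ordering.
  static int compare(long[] a, long[] b, long[] clusterTotal) {
    return Double.compare(dominantShare(a, clusterTotal), dominantShare(b, clusterTotal));
  }
}
```

With a cluster of 16384 MB and 4 vcores, a consumer using 4096 MB and 2 vcores has shares of 0.25 and 0.5, so its dominant share is 0.5.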
[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8453: - Attachment: YARN-8453.branch-3.2.001.patch > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8453.001.patch, YARN-8453.002.patch, > YARN-8453.branch-3.2.001.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue, while other resources such as Memory / CPU are still available beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests.
[jira] [Updated] (YARN-9888) Capacity scheduler: add support for default maxRunningApps limit per user
[ https://issues.apache.org/jira/browse/YARN-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9888: --- Component/s: capacity scheduler > Capacity scheduler: add support for default maxRunningApps limit per user > - > > Key: YARN-9888 > URL: https://issues.apache.org/jira/browse/YARN-9888 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Priority: Major > > Fair scheduler has the setting {{}} which limits how many > running applications each user can have. > Capacity scheduler lacks this feature. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user
[ https://issues.apache.org/jira/browse/YARN-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9887: --- Component/s: capacity scheduler > Capacity scheduler: add support for limiting maxRunningApps per user > > > Key: YARN-9887 > URL: https://issues.apache.org/jira/browse/YARN-9887 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Priority: Major > > Fair Scheduler supports limiting the number of applications that a particular > user can submit: > {noformat} > > 10 > > {noformat} > Capacity Scheduler does not have an exact equivalent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-8453: - Fix Version/s: 3.3.0 > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8453.001.patch, YARN-8453.002.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue, while other resources such as Memory / CPU are still available beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests.
[jira] [Updated] (YARN-9891) Capacity scheduler: enhance capacity / maximum-capacity setting
[ https://issues.apache.org/jira/browse/YARN-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9891: --- Component/s: capacity scheduler > Capacity scheduler: enhance capacity / maximum-capacity setting > --- > > Key: YARN-9891 > URL: https://issues.apache.org/jira/browse/YARN-9891 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Priority: Major > > Capacity Scheduler does not support two percentage values for capacity and > maximum-capacity settings. So, you can't do something like this: > {{yarn.scheduler.capacity.root.users.john.maximum-capacity=memory-mb=50.0%, > vcores=50.0%}} > It's possible to use absolute resources, but not two separate percentages > (which expresses capacity as a percentage of the overall cluster resource). > Such a configuration is accepted in Fair Scheduler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9891) Capacity scheduler: enhance capacity / maximum-capacity setting
Peter Bacsko created YARN-9891: -- Summary: Capacity scheduler: enhance capacity / maximum-capacity setting Key: YARN-9891 URL: https://issues.apache.org/jira/browse/YARN-9891 Project: Hadoop YARN Issue Type: Sub-task Reporter: Peter Bacsko Capacity Scheduler does not support two percentage values for capacity and maximum-capacity settings. So, you can't do something like this: {{yarn.scheduler.capacity.root.users.john.maximum-capacity=memory-mb=50.0%, vcores=50.0%}} It's possible to use absolute resources, but not two separate percentages (which expresses capacity as a percentage of the overall cluster resource). Such a configuration is accepted in Fair Scheduler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9881) In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.
[ https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jenny updated YARN-9881: Description: (was: The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, change Item memory‘s datatype from int to long. 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. 2. change Fair Scheduler API's item [memory]'s dataType from int to long.) Summary: In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect. (was: Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.) > In YARN ui2 attempts tab, The running Application Attempt's Container's > ElapsedTime is incorrect. > - > > Key: YARN-9881 > URL: https://issues.apache.org/jira/browse/YARN-9881 > Project: Hadoop YARN > Issue Type: Bug > Components: docs, documentation, yarn >Affects Versions: 3.1.1, 3.2.1 >Reporter: jenny >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html
[ https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949398#comment-16949398 ] Hudson commented on YARN-9836: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17524 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17524/]) YARN-9836. General usability improvements in showSimulationTrace.html. (snemeth: rev 62b5cefaeaa9cccd8d2de8eaff75d0e32e87f54d) * (edit) hadoop-tools/hadoop-sls/src/main/html/showSimulationTrace.html > General usability improvements in showSimulationTrace.html > -- > > Key: YARN-9836 > URL: https://issues.apache.org/jira/browse/YARN-9836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9836.001.patch, YARN-9836.002.patch, > YARN-9836.003.patch > > > There are some small usability improvements that can be made for the offline > analysis page (showSimulationTrace.html): > - empty divs can be hidden until no data is displayed > - the site can be refactored to be responsive given that bootstrap is already > available as third party library > - there's no proper error handling in the site (e.g. a JSON is malformed and > similar cases) which is really a big problem > - there's no indentation in the raw html file which makes supportability even > worse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
[ https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jenny updated YARN-9881: Description: The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, change Item memory‘s datatype from int to long. 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. 2. change Fair Scheduler API's item [memory]'s dataType from int to long. was: The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, change Item memory‘s datatype from int to long. 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. 2. change Fair Scheduler API's item [memory]'s dataType from int to long. !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png! !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/a1e835acff194182a5cefaa31d2b3973/clipboard.png! !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/dc19ae58affd4cecb79dba3386ddc1a9/clipboard.png! > Change Cluster_Scheduler_API's Item memory‘s datatype from int to long. > --- > > Key: YARN-9881 > URL: https://issues.apache.org/jira/browse/YARN-9881 > Project: Hadoop YARN > Issue Type: Bug > Components: docs, documentation, yarn >Affects Versions: 3.1.1, 3.2.1 >Reporter: jenny >Priority: Major > > The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, > In > hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, > change Item memory‘s datatype from int to long. > 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. > 2. change Fair Scheduler API's item [memory]'s dataType from int to long. 
[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
[ https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jenny updated YARN-9881: Component/s: (was: yarn-ui-v2) yarn documentation docs Description: The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, change Item memory‘s datatype from int to long. 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. 2. change Fair Scheduler API's item [memory]'s dataType from int to long. !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png! !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/a1e835acff194182a5cefaa31d2b3973/clipboard.png! !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/dc19ae58affd4cecb79dba3386ddc1a9/clipboard.png! Summary: Change Cluster_Scheduler_API's Item memory‘s datatype from int to long. (was: In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.) > Change Cluster_Scheduler_API's Item memory‘s datatype from int to long. > --- > > Key: YARN-9881 > URL: https://issues.apache.org/jira/browse/YARN-9881 > Project: Hadoop YARN > Issue Type: Bug > Components: docs, documentation, yarn >Affects Versions: 3.1.1, 3.2.1 >Reporter: jenny >Priority: Major > > The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, > In > hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, > change Item memory‘s datatype from int to long. > 1.change Capacity Scheduler API's item [memory]'s dataType from int to long. > 2. change Fair Scheduler API's item [memory]'s dataType from int to long. > !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png! 
[jira] [Updated] (YARN-9836) General usability improvements in showSimulationTrace.html
[ https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9836: - Fix Version/s: 3.3.0 > General usability improvements in showSimulationTrace.html > -- > > Key: YARN-9836 > URL: https://issues.apache.org/jira/browse/YARN-9836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9836.001.patch, YARN-9836.002.patch, > YARN-9836.003.patch > > > There are some small usability improvements that can be made for the offline > analysis page (showSimulationTrace.html): > - empty divs can be hidden until no data is displayed > - the site can be refactored to be responsive given that bootstrap is already > available as third party library > - there's no proper error handling in the site (e.g. a JSON is malformed and > similar cases) which is really a big problem > - there's no indentation in the raw html file which makes supportability even > worse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9890) [UI2] Add Application tag to the app table and app detail page.
Kinga Marton created YARN-9890: -- Summary: [UI2] Add Application tag to the app table and app detail page. Key: YARN-9890 URL: https://issues.apache.org/jira/browse/YARN-9890 Project: Hadoop YARN Issue Type: Sub-task Reporter: Kinga Marton Assignee: Kinga Marton Right now AFAIK there is no possibility to filter the applications based on the application tag in the UI. Adding this new column to the app table will make this filtering possible as well. In UI2 this information is missing from the application detail page as well.
[jira] [Updated] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Siegl updated YARN-5106: --- Attachment: YARN-5106.012.patch > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Zoltan Siegl >Priority: Major > Labels: newbie++ > Attachments: YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, > YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, > YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, > YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, > YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, > YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, > YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, > YARN-5106.011.patch, YARN-5106.012.patch > > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
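A rough sketch of the kind of helper the description has in mind. This is not the API from the attached patches: the class and method names are invented for illustration, and only the `<allocations>`, `<queue name="...">` and `<maxRunningApps>` elements are taken from the Fair Scheduler allocation file format.

```java
// Hypothetical builder, for illustration only.
final class AllocationFileSketch {
  private final StringBuilder queues = new StringBuilder();

  static AllocationFileSketch create() {
    return new AllocationFileSketch();
  }

  // Adds one queue element. The name is written as-is, without XML escaping,
  // which is acceptable for the simple names tests typically use.
  AllocationFileSketch queue(String name, int maxRunningApps) {
    queues.append("  <queue name=\"").append(name).append("\">\n")
        .append("    <maxRunningApps>").append(maxRunningApps).append("</maxRunningApps>\n")
        .append("  </queue>\n");
    return this;
  }

  // Returns the complete allocation file content as a string.
  String build() {
    return "<?xml version=\"1.0\"?>\n<allocations>\n" + queues + "</allocations>\n";
  }
}
```

A test could then call something like `AllocationFileSketch.create().queue("queueA", 10).build()` and write the result to a temporary file instead of printing XML line by line.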
[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html
[ https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949389#comment-16949389 ] Szilard Nemeth commented on YARN-9836: -- Thanks [~adam.antal] for the patch, committed to trunk! Thanks [~shuzirra] for the review! [~adam.antal]: Would you please check if we need these changes in branch-3.2 / branch-3.1 as well? Please especially check if we have the same set of JS dependencies. Thanks! > General usability improvements in showSimulationTrace.html > -- > > Key: YARN-9836 > URL: https://issues.apache.org/jira/browse/YARN-9836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-9836.001.patch, YARN-9836.002.patch, > YARN-9836.003.patch > > > There are some small usability improvements that can be made for the offline > analysis page (showSimulationTrace.html): > - empty divs can be hidden until no data is displayed > - the site can be refactored to be responsive given that bootstrap is already > available as third party library > - there's no proper error handling in the site (e.g. a JSON is malformed and > similar cases) which is really a big problem > - there's no indentation in the raw html file which makes supportability even > worse
[jira] [Created] (YARN-9889) [UI] Add Application Tag column to RM All Applications table
Kinga Marton created YARN-9889: -- Summary: [UI] Add Application Tag column to RM All Applications table Key: YARN-9889 URL: https://issues.apache.org/jira/browse/YARN-9889 Project: Hadoop YARN Issue Type: Sub-task Reporter: Kinga Marton Assignee: Kinga Marton Right now AFAIK there is no possibility to filter the applications based on the application tag in the UI. Adding this new column to the app table will make this filtering possible as well.
[jira] [Created] (YARN-9888) Capacity scheduler: add support for default maxRunningApps limit per user
Peter Bacsko created YARN-9888: -- Summary: Capacity scheduler: add support for default maxRunningApps limit per user Key: YARN-9888 URL: https://issues.apache.org/jira/browse/YARN-9888 Project: Hadoop YARN Issue Type: Sub-task Reporter: Peter Bacsko Fair scheduler has the setting {{}} which limits how many running applications each user can have. Capacity scheduler lacks this feature.
[jira] [Created] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user
Peter Bacsko created YARN-9887: -- Summary: Capacity scheduler: add support for limiting maxRunningApps per user Key: YARN-9887 URL: https://issues.apache.org/jira/browse/YARN-9887 Project: Hadoop YARN Issue Type: Sub-task Reporter: Peter Bacsko Fair Scheduler supports limiting the number of applications that a particular user can submit: {noformat} 10 {noformat} Capacity Scheduler does not have an exact equivalent.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949358#comment-16949358 ] Peter Bacsko commented on YARN-9841: +1 (non-binding) from me. Let's wait for YARN-9840 and then try to apply this one again. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, > YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
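As a rough illustration of the combined mapping requested in YARN-9841, the sketch below expands both placeholders in a rule of the form `u:<user>:<queue>`. This is not the logic of `UserGroupMappingPlacementRule`; the class `QueueMappingSketch` and its methods are hypothetical, and the primary group is passed in directly rather than looked up.

```java
import java.util.Optional;

/** Hypothetical sketch; NOT the UserGroupMappingPlacementRule implementation. */
final class QueueMappingSketch {

  /** Substitutes %primary_group and %user in a queue path template. */
  static String expand(String template, String user, String primaryGroup) {
    return template.replace("%primary_group", primaryGroup).replace("%user", user);
  }

  /**
   * Resolves a rule such as "u:%user:%primary_group.%user" for one user.
   * Returns an empty Optional when the rule names a different user.
   */
  static Optional<String> queueFor(String rule, String user, String primaryGroup) {
    String[] parts = rule.split(":", 3);
    if (parts.length != 3 || !"u".equals(parts[0])) {
      throw new IllegalArgumentException("Expected u:<user>:<queue>, got " + rule);
    }
    // "%user" matches any submitting user; otherwise the name must match exactly.
    boolean matches = "%user".equals(parts[1]) || parts[1].equals(user);
    return matches
        ? Optional.of(expand(parts[2], user, primaryGroup))
        : Optional.empty();
  }
}
```

With this sketch, the existing form `u:%user:parentqueue.%primary_group` and the requested form `u:%user:%primary_group.%user` go through the same substitution step; a real implementation would additionally have to validate that the resulting parent and leaf queues exist.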
[jira] [Created] (YARN-9886) Queue mapping based on userid passed through application tag
Kinga Marton created YARN-9886: -- Summary: Queue mapping based on userid passed through application tag Key: YARN-9886 URL: https://issues.apache.org/jira/browse/YARN-9886 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Kinga Marton Assignee: Kinga Marton There are situations where the real submitting user differs from the user that YARN sees. For example, when Hive impersonation is turned off, Hive queries run as the Hive user and the queue mapping is done based on that username. In this case YARN has no information about the real user, yet the customer may want to map these applications to the real submitting user's queue instead of the Hive one. In such cases, if the username is passed in an application tag, we could read it and use it during queue mapping, provided that user has rights to run on the real user's queue. [~sunilg] please correct me if I missed something.
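A minimal sketch of the idea, under stated assumptions: the tag format `userid=<name>` and the notion of a set of trusted submitters are both hypothetical, since the issue does not specify either. The sketch only shows how the user for queue mapping could be chosen; it does not touch any real YARN API.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of choosing the mapping user from application tags. */
final class TagUserSketch {
  // Assumed tag format; the issue does not define one.
  static final String PREFIX = "userid=";

  /** Returns the user named in a "userid=" tag, if any (arbitrary one if several). */
  static Optional<String> userFromTags(Set<String> tags) {
    return tags.stream()
        .filter(tag -> tag.startsWith(PREFIX))
        .map(tag -> tag.substring(PREFIX.length()))
        .filter(user -> !user.isEmpty())
        .findFirst();
  }

  /**
   * Uses the tagged user only when the submitting user (e.g. hive) is trusted
   * to act on behalf of others; otherwise falls back to the submitting user.
   */
  static String effectiveUser(String submittingUser, Set<String> tags,
      Set<String> trustedSubmitters) {
    if (!trustedSubmitters.contains(submittingUser)) {
      return submittingUser;
    }
    return userFromTags(tags).orElse(submittingUser);
  }
}
```

The fallback matters: an untrusted submitter that sets the tag is still mapped by its own name, which is one way to read the "has rights to run on the real user's queue" condition in the description.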
[jira] [Comment Edited] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage
[ https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949330#comment-16949330 ] Tao Yang edited comment on YARN-9838 at 10/11/19 10:02 AM: --- Thanks [~jiulongZhu] for fixing this issue. The patch LGTM in general, some minor suggestions for the patch: * check-style warnings need to be fixed, after that, you can run "dev-support/bin/test-patch /path/to/my.patch" to confirm. * The indentation of updated log need to be adjusted and useless deletion of a blank line should be reverted in LeafQueue. * The annotation "sync ResourceUsageByLabel ResourceUsageByUser and numContainer" can be removed since it seems unnecessary to add details here. * As for UT, you can remove before-fixed block and just keep the correct verification. Moreover, I think it's better to remove the method annotation("//YARN-9838") since we can find the source easily by git, and the annotation style "/\*\* \*/" often used for class or method, it's better to use "//" or "/\* \*/" in the method. was (Author: tao yang): Thanks [~jiulongZhu] for fixing this issue. The patch is LGTM in general, some minor suggestions for the patch: * check-style warnings need to be fixed, after that, you can run "dev-support/bin/test-patch /path/to/my.patch" to confirm. * The indentation of updated log need to be adjusted and useless deletion of a blank line should be reverted in LeafQueue. * The annotation "sync ResourceUsageByLabel ResourceUsageByUser and numContainer" can be removed since it seems unnecessary to add details here. * As for UT, you can remove before-fixed block and just keep the correct verification. Moreover, I think it's better to remove "//YARN-9838" since we can find the source easily by git, and the annotation style "/** */" often used for class or method, it's better to use "//" or "/* */" in the method. 
> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS > reserved containers for,will cause "Num Container" and "Used Resource" in > ResourceUsage metrics error > -- > > Key: YARN-9838 > URL: https://issues.apache.org/jira/browse/YARN-9838 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.7.3 >Reporter: jiulongzhu >Priority: Critical > Labels: patch > Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, > YARN-9838.0001.patch > > > In some clusters of ours, we are seeing "Used Resource","Used > Capacity","Absolute Used Capacity" and "Num Container" is positive or > negative when the queue is absolutely idle(no RUNNING, no NEW apps...).In > extreme cases, apps couldn't be submitted to the queue that is actually idle > but the "Used Resource" is far more than zero, just like "Container Leak". > Firstly,I found that "Used Resource","Used Capacity" and "Absolute Used > Capacity" use the "Used" value of ResourceUsage kept by AbstractCSQueue, and > "Num Container" use the "numContainer" value kept by LeafQueue.And > AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource will > change the state value of "numContainer" and "Used". Secondly, by comparing > the values numContainer and ResourceUsageByLabel and QueueMetrics > changed(#allocateContainer and #releaseContainer) logic of applications with > and without "movetoqueue",i found that moving the reservedContainers didn't > modify the "numContainer" value in AbstractCSQueue and "used" value in > ResourceUsage when the application was moved from a queue to another queue. > The metric values changed logic of reservedContainers are allocated, > and moved from $FROM queue to $TO queue, and released.The degree of increase > and decrease is not conservative, the Resource allocated from $FROM queue and > release to $TO queue. 
> ||move reversedContainer||allocate||movetoqueue||release|| > |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stay the > same,$TO queue stay the same{color}|decrease in $TO queue| > |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM > queue stay the same,$TO queue stay the same{color}|decrease in $TO queue | > |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in > $TO queue|decrease in $TO queue| > The metric values changed logic of allocatedContainer(allocated, > acquired, running) are allocated, and movetoqueue, and released are > absolutely conservative. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
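The accounting problem in the table above can be reproduced with a toy counter. The sketch below is a simplified model, not CapacityScheduler code: `QueueUsageSketch` and its methods are hypothetical, and a single integer per queue stands in for both "numContainer" and the "Used" resource value.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-queue container counts; NOT CapacityScheduler code. */
final class QueueUsageSketch {
  private final Map<String, Integer> numContainers = new HashMap<>();

  /** A reserved container is counted against the queue it was reserved in. */
  void allocate(String queue) {
    numContainers.merge(queue, 1, Integer::sum);
  }

  /** Releasing decrements the queue the application currently lives in. */
  void release(String queue) {
    numContainers.merge(queue, -1, Integer::sum);
  }

  /** Models the reported behaviour: the move leaves both counters untouched. */
  void moveWithoutSync(String from, String to) {
    // intentionally empty
  }

  /** Conservative move: what $FROM loses, $TO gains. */
  void moveWithSync(String from, String to) {
    release(from);
    allocate(to);
  }

  int get(String queue) {
    return numContainers.getOrDefault(queue, 0);
  }
}
```

With `moveWithoutSync`, the sequence allocate in $FROM, move, release in $TO leaves $FROM at +1 and $TO at -1 even though both queues are idle, which matches the positive and negative values described in the report. With `moveWithSync` both counters return to zero.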
[jira] [Updated] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metrics
[ https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9838: --- Issue Type: Bug (was: Improvement)
[jira] [Updated] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metrics
[ https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9838: --- Fix Version/s: (was: 2.7.3)
[jira] [Commented] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metri
[ https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949330#comment-16949330 ] Tao Yang commented on YARN-9838: Thanks [~jiulongZhu] for fixing this issue. The patch is LGTM in general, some minor suggestions for the patch: * check-style warnings need to be fixed, after that, you can run "dev-support/bin/test-patch /path/to/my.patch" to confirm. * The indentation of updated log need to be adjusted and useless deletion of a blank line should be reverted in LeafQueue. * The annotation "sync ResourceUsageByLabel ResourceUsageByUser and numContainer" can be removed since it seems unnecessary to add details here. * As for UT, you can remove before-fixed block and just keep the correct verification. Moreover, I think it's better to remove "//YARN-9838" since we can find the source easily by git, and the annotation style "/** */" often used for class or method, it's better to use "//" or "/* */" in the method. > Using the CapacityScheduler,Apply "movetoqueue" on the application which CS > reserved containers for,will cause "Num Container" and "Used Resource" in > ResourceUsage metrics error > -- > > Key: YARN-9838 > URL: https://issues.apache.org/jira/browse/YARN-9838 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.7.3 >Reporter: jiulongzhu >Priority: Critical > Labels: patch > Fix For: 2.7.3 > > Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, > YARN-9838.0001.patch > > > In some clusters of ours, we are seeing "Used Resource","Used > Capacity","Absolute Used Capacity" and "Num Container" is positive or > negative when the queue is absolutely idle(no RUNNING, no NEW apps...).In > extreme cases, apps couldn't be submitted to the queue that is actually idle > but the "Used Resource" is far more than zero, just like "Container Leak". 
> Firstly,I found that "Used Resource","Used Capacity" and "Absolute Used > Capacity" use the "Used" value of ResourceUsage kept by AbstractCSQueue, and > "Num Container" use the "numContainer" value kept by LeafQueue.And > AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource will > change the state value of "numContainer" and "Used". Secondly, by comparing > the values numContainer and ResourceUsageByLabel and QueueMetrics > changed(#allocateContainer and #releaseContainer) logic of applications with > and without "movetoqueue",i found that moving the reservedContainers didn't > modify the "numContainer" value in AbstractCSQueue and "used" value in > ResourceUsage when the application was moved from a queue to another queue. > The metric values changed logic of reservedContainers are allocated, > and moved from $FROM queue to $TO queue, and released.The degree of increase > and decrease is not conservative, the Resource allocated from $FROM queue and > release to $TO queue. > ||move reversedContainer||allocate||movetoqueue||release|| > |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stay the > same,$TO queue stay the same{color}|decrease in $TO queue| > |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM > queue stay the same,$TO queue stay the same{color}|decrease in $TO queue | > |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in > $TO queue|decrease in $TO queue| > The metric values changed logic of allocatedContainer(allocated, > acquired, running) are allocated, and movetoqueue, and released are > absolutely conservative. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949311#comment-16949311 ] Szilard Nemeth edited comment on YARN-9699 at 10/11/19 9:28 AM: Added a final patch (017) that fixes checkstyle/whitespace issues and adds a warning text, discussed above. +1 for the latest patch. [~sunilg]: Patch is ready for review as it has 1 non-binding and 1 binding +1s was (Author: snemeth): Added a final patch (017) that fixes checkstyle/whitespace issues and adds a warning text, discussed above. > [Phase 1] Migration tool that help to generate CS config based on FS config > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Peter Bacsko >Priority: Major > Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, > YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, > YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, > YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, > YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, > YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, > YARN-9699.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9699: - Attachment: YARN-9699-017.patch > [Phase 1] Migration tool that help to generate CS config based on FS config > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Peter Bacsko >Priority: Major > Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, > YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, > YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, > YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, > YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, > YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, > YARN-9699.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949311#comment-16949311 ] Szilard Nemeth commented on YARN-9699: -- Added a final patch (017) that fixes checkstyle/whitespace issues and adds a warning text, discussed above. > [Phase 1] Migration tool that help to generate CS config based on FS config > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Peter Bacsko >Priority: Major > Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, > YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, > YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, > YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, > YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, > YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, > YARN-9699.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9881) In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.
[ https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949305#comment-16949305 ] jenny commented on YARN-9881: - Hi [~Prabhu Joseph] , I will attach details, please don't close, thank you > In YARN ui2 attempts tab, The running Application Attempt's Container's > ElapsedTime is incorrect. > - > > Key: YARN-9881 > URL: https://issues.apache.org/jira/browse/YARN-9881 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.1, 3.2.1 >Reporter: jenny >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html
[ https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949285#comment-16949285 ] Gergely Pollak commented on YARN-9836: -- Thank you [~adam.antal]! LGTM + 1 (non-binding) > General usability improvements in showSimulationTrace.html > -- > > Key: YARN-9836 > URL: https://issues.apache.org/jira/browse/YARN-9836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler-load-simulator >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-9836.001.patch, YARN-9836.002.patch, > YARN-9836.003.patch > > > There are some small usability improvements that can be made for the offline > analysis page (showSimulationTrace.html): > - empty divs can be hidden until no data is displayed > - the site can be refactored to be responsive given that bootstrap is already > available as third party library > - there's no proper error handling in the site (e.g. a JSON is malformed and > similar cases) which is really a big problem > - there's no indentation in the raw html file which makes supportability even > worse -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949277#comment-16949277 ] Gergely Pollak commented on YARN-9699: -- Hi [~snemeth] and [~pbacsko], thank you for the patch, LGTM+1 (non-binding) > [Phase 1] Migration tool that help to generate CS config based on FS config > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Peter Bacsko >Priority: Major > Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, > YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, > YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, > YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, > YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, > YARN-9699-016.patch, YARN-9699.001.patch, YARN-9699.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types
[ https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949261#comment-16949261 ] Szilard Nemeth commented on YARN-8453: -- Hi [~adam.antal]! Thanks for uploading a new patch, looking at it soon. In the meantime, could you please upload branch-3.2 / branch-3.1 patches? Thanks! > Additional Unit tests to verify queue limit and max-limit with multiple > resource types > --- > > Key: YARN-8453 > URL: https://issues.apache.org/jira/browse/YARN-8453 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.2 >Reporter: Sunil G >Assignee: Adam Antal >Priority: Major > Attachments: YARN-8453.001.patch, YARN-8453.002.patch > > > Post support of additional resource types other than CPU and Memory, it could > be possible that one such new resource has exhausted its quota on a given > queue. But other resources such as Memory / CPU are still there beyond their > guaranteed limit (under max-limit). Adding more unit tests to ensure we are > not starving such allocation requests
[jira] [Commented] (YARN-9860) Enable service mode for Docker containers on YARN
[ https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949191#comment-16949191 ] Prabhu Joseph commented on YARN-9860: - Thanks [~eyang] and [~shaneku...@gmail.com]. > Enable service mode for Docker containers on YARN > - > > Key: YARN-9860 > URL: https://issues.apache.org/jira/browse/YARN-9860 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: Screen Shot 2019-10-09 at 11.27.19 AM.png, > YARN-9860-001.patch, YARN-9860-002.patch, YARN-9860-003.patch, > YARN-9860-004.patch, YARN-9860-005.patch, YARN-9860-006.patch, > YARN-9860-007.patch, YARN-9860-008.patch, YARN-9860-009.patch > > > This task is to add support to YARN for running Docker containers in "Service > Mode". > Service Mode - Run the container as defined by the image, but still allow for > injecting configuration. > Background: > Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as > defined in the image. However, still requires modification to official images > due to user propagation > User propagation is problematic for running a secure cluster with sssd > > Implementation: > Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true) > Must be requested at runtime - (example: > YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true) > Entrypoint mode is default enabled for this mode (If Service Mode is > requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set > to true) > Writable log mount will not be added - stdout logging may still work > with entrypoint mode - remove the writable bind mounts > User and groups will not be propagated (now: docker run --user nobody > --group-add=nobody , after: docker run ) > Read-only resources mounted at the file level, files get chmod 777, > parent directory only accessible by the run as user. 
> cc [~shaneku...@gmail.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
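As a sketch of the runtime request described under "Implementation", the snippet below builds the container environment a client might pass when asking for Service Mode. The class `ServiceModeEnvSketch` is hypothetical. The two service-mode variables are the ones named in the description; `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` are the usual Docker runtime variables. Whether a given cluster honors the request is an assumption here, since it also depends on `docker.service-mode.allowed=true` being set in c-e.cfg.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that assembles the environment for a Service Mode request. */
final class ServiceModeEnvSketch {

  static Map<String, String> serviceModeEnv(String image) {
    Map<String, String> env = new LinkedHashMap<>();
    env.put("YARN_CONTAINER_RUNTIME_TYPE", "docker");
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_IMAGE", image);
    // Request Service Mode at runtime, as described in the issue.
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE", "true");
    // The description says entrypoint mode should be enabled together with it.
    env.put("YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE", "true");
    return env;
  }
}
```

The resulting map would be handed to whatever launch context or service spec the client already uses; the image name passed in is up to the caller.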
[jira] [Created] (YARN-9885) Container allocation when queue usage is below MIN guarantees
Prashant Golash created YARN-9885: - Summary: Container allocation when queue usage is below MIN guarantees Key: YARN-9885 URL: https://issues.apache.org/jira/browse/YARN-9885 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.2.1 Reporter: Prashant Golash Filing this JIRA to calculate the time spent in container allocation when queue usage is below min (during the whole time for the container). Customers generally ask for a YARN SLA on container allocation when their queue usage is below min. I have an implementation in mind, but I want to confirm with the community whether this would be a helpful feature or whether it is already implemented. cc [~leftnoteasy]
[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.
[ https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949183#comment-16949183 ]

Prashant Golash commented on YARN-9656:
---------------------------------------

Thanks, [~wangda], for taking a look.

Initially we thought of just marking the node "unhealthy" and extended our NMs to include these checks in the NM health check scripts, but we realized that this could result in a lot of unhealthy nodes (e.g. in our cluster), so we thought of adding an intermediate "stressed" state, controlled by a threshold at the RM layer as well.

I guess this may be specific to our environment, and for upstream just configuring the NM health check scripts should be enough.

> Plugin to avoid scheduling jobs on node which are not in "schedulable" state,
> but are healthy otherwise.
> ------------------------------------------------------------------------------
>
>                 Key: YARN-9656
>                 URL: https://issues.apache.org/jira/browse/YARN-9656
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, resourcemanager
>    Affects Versions: 2.9.1, 3.1.2
>            Reporter: Prashant Golash
>            Assignee: Prashant Golash
>            Priority: Major
>         Attachments: 2.patch
>
>
> Creating this Jira to get ideas from the community on whether this is something
> helpful which can be done in YARN. Sometimes nodes go into a bad state, e.g.
> an H/W problem (I/O is bad; fan problem). In some other scenarios, if
> CGroup is not enabled, nodes may be running very high on CPU and the jobs
> scheduled on them will suffer.
>
> The idea is three-fold:
> # Gather relevant metrics from node-managers and put them in some form (e.g.
> an exclude file).
> # RM loads the files and puts the nodes on the blacklist.
> # Once a node becomes good, it can again be put on the whitelist.
> Various optimizations can be done here, but I would like to understand if
> this is something which could be helpful as an upstream feature in YARN.
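The three steps in the quoted description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in 2.patch: the class `StressedNodeFilter`, the choice of CPU utilization as the only metric, and the single threshold are all hypothetical. Hosts whose latest reading exceeds the threshold are excluded, hosts that drop back under it are re-admitted, and hosts with no new reading keep their previous state.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch (not a YARN class): maintains an exclude list from node metrics. */
final class StressedNodeFilter {
    private final double cpuThreshold;
    private final Set<String> excluded = new TreeSet<>();

    StressedNodeFilter(double cpuThreshold) {
        this.cpuThreshold = cpuThreshold;
    }

    /**
     * Applies the latest metrics: hosts above the threshold are excluded (steps 1 and 2),
     * hosts at or below it are re-admitted (step 3). Returns a copy of the exclude set.
     */
    Set<String> refresh(Map<String, Double> cpuUtilByHost) {
        for (Map.Entry<String, Double> e : cpuUtilByHost.entrySet()) {
            if (e.getValue() > cpuThreshold) {
                excluded.add(e.getKey());
            } else {
                excluded.remove(e.getKey());
            }
        }
        return new TreeSet<>(excluded);
    }

    /** Renders the current exclude set one host per line, in the style of an exclude file. */
    String toExcludeFile() {
        return String.join("\n", excluded);
    }
}
```

In a real deployment the threshold would presumably be configurable at the RM, in line with the "stressed" state mentioned in the comment above.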