[jira] [Commented] (YARN-9830) Improve ContainerAllocationExpirer it blocks scheduling

2019-10-11 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949904#comment-16949904
 ] 

Bibin Chundatt commented on YARN-9830:
--

[~sunil.gov...@gmail.com] Could you take a look?



> Improve ContainerAllocationExpirer it blocks scheduling
> ---
>
> Key: YARN-9830
> URL: https://issues.apache.org/jira/browse/YARN-9830
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bibin Chundatt
>Priority: Critical
>  Labels: perfomance
> Attachments: YARN-9830.001.patch
>
>
> {quote}
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor.register(AbstractLivelinessMonitor.java:106)
> - waiting to lock <0x7fa348749550> (a 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:601)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$AcquiredTransition.transition(RMContainerImpl.java:592)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> - locked <0x7fc8852f8200> (a 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:474)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> {quote}
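
For context, here is a minimal sketch (illustrative only, not the actual Hadoop source) of the locking pattern the dump above points at: register() synchronizes on the monitor instance, so every container transition that registers with the ContainerAllocationExpirer queues behind a single lock.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class LivelinessMonitorSketch<O> {
  private final Map<O, Long> running = new HashMap<>();

  // The whole-instance lock taken here is what the dumped threads are waiting for.
  public synchronized void register(O ob) {
    running.put(ob, System.currentTimeMillis());
  }

  public synchronized void unregister(O ob) {
    running.remove(ob);
  }

  public static void main(String[] args) {
    LivelinessMonitorSketch<String> expirer = new LivelinessMonitorSketch<>();
    // Every acquired container funnels through the same monitor:
    expirer.register("container_1");
    expirer.register("container_2");
  }
}
{code}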



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is

2019-10-11 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949890#comment-16949890
 ] 

liusheng edited comment on YARN-9511 at 10/12/19 3:51 AM:
--

Hi [~snemeth] [~adam.antal], 

Thank you both for caring about this issue. I have taken some time to track down 
its cause. It affects the *TestAuxServices* tests and causes *2 Errors, 9 
Failures*; see:
{code:java}
Failures:
  TestAuxServices.testAuxServiceRecoverySetup:717 expected:<2> but was:<0>
  TestAuxServices.testAuxServicesManifestPermissions:874 expected:<2> but 
was:<0>
  TestAuxServices.testAuxServicesMeta:638 Invalid mix of services expected:<6> 
but was:<1>
  TestAuxServices.testAuxServices:610 Invalid mix of services expected:<6> but 
was:<1>
  TestAuxServices.testCustomizedAuxServiceClassPath:416
  TestAuxServices.testManualReload:919 expected:<2> but was:<0>
  TestAuxServices.testRemoteAuxServiceClassPath:313 The permission of the jar 
is wrong.Should throw out exception.
  TestAuxServices.testRemoveManifest:897 expected:<2> but was:<0>
  TestAuxServices.testValidAuxServiceName:698 Should receive the exception.
Errors:
  TestAuxServices.testAuxUnexpectedStop:664 » NoSuchElement
  TestAuxServices.testRemoteAuxServiceClassPath:334 » YarnRuntime The remote 
jar...
{code}
After debugging, I found that all of these failures are directly or indirectly 
related to file permissions.

There are two situations when running these tests:
 # *useManifest* enabled: the tests check and use the manifest file:
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/manifest.txt{code}
This file and all of its parent directories must not be writable by group or 
others (see the code below, and the standalone sketch after this list):
{code:java}
private boolean checkManifestPermissions(FileStatus status) throws
    IOException {
  if ((status.getPermission().toShort() & 0022) != 0) {
    LOG.error("Manifest file and parents must not be writable by group or " +
        "others. The current Permission of " + status.getPath() + " is " +
        status.getPermission());
    return false;
  }
  Path parent = status.getPath().getParent();
  if (parent == null) {
    return true;
  }
  return checkManifestPermissions(manifestFS.getFileStatus(parent));
}{code}
 

 # *useManifest* not enabled: the tests use a *test-runjar.jar* file
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/test-runjar.jar
{code}
The related code that checks its permissions:
{code:java}
private Path maybeDownloadJars(String sName, String className, String
    remoteFile, AuxServiceFile.TypeEnum type, Configuration conf)
    throws IOException {
  // load AuxiliaryService from remote classpath
  FileContext localLFS = getLocalFileContext(conf);
  // create NM aux-service dir in NM localdir if it does not exist.
  Path nmAuxDir = dirsHandler.getLocalPathForWrite("."
      + Path.SEPARATOR + NM_AUX_SERVICE_DIR);
  if (!localLFS.util().exists(nmAuxDir)) {
    try {
      localLFS.mkdir(nmAuxDir, NM_AUX_SERVICE_DIR_PERM, true);
    } catch (IOException ex) {
      throw new YarnRuntimeException("Fail to create dir:"
          + nmAuxDir.toString(), ex);
    }
  }
  Path src = new Path(remoteFile);
  FileContext remoteLFS = getRemoteFileContext(src.toUri(), conf);
  FileStatus scFileStatus = remoteLFS.getFileStatus(src);
  if (!scFileStatus.getOwner().equals(
      this.userUGI.getShortUserName())) {
    throw new YarnRuntimeException("The remote jarfile owner:"
        + scFileStatus.getOwner() + " is not the same as the NM user:"
        + this.userUGI.getShortUserName() + ".");
  }
  if ((scFileStatus.getPermission().toShort() & 0022) != 0) {
    throw new YarnRuntimeException("The remote jarfile should not "
        + "be writable by group or others. "
        + "The current Permission is "
        + scFileStatus.getPermission().toShort());
  }
{code}
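
For illustration, here is a small standalone sketch of the same rule the manifest check above enforces, written against plain {{java.nio}} rather than the Hadoop FileSystem API (the class name and default path are made up for the example): walk from the file up through every parent directory and flag any path that is group- or other-writable.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class ManifestPermissionWalker {
  public static void main(String[] args) throws IOException {
    // Start at the manifest (path is only an example) and walk up to the filesystem root.
    Path p = Paths.get(args.length > 0 ? args[0] : "manifest.txt").toAbsolutePath();
    for (Path cur = p; cur != null; cur = cur.getParent()) {
      Set<PosixFilePermission> perms = Files.getPosixFilePermissions(cur);
      boolean groupOrOtherWritable = perms.contains(PosixFilePermission.GROUP_WRITE)
          || perms.contains(PosixFilePermission.OTHERS_WRITE);
      // A single writable ancestor is enough for the NodeManager check to fail.
      System.out.println((groupOrOtherWritable ? "REJECT " : "ok     ") + cur);
    }
  }
}
{code}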

Based on the above, I tried removing the *write* permission for group and others 
from the parent directories of *manifest.txt*, and changing the *umask to 022*, 
which affects the permissions of newly created files and directories, because 
*manifest.txt* and *test-runjar.jar* are created anew when the tests run.
{code:java}
chmod go-w yourpath/hadoop/ -R
umask 022
umask
{code}
After doing the above and re-running *TestAuxServices*, all the tests pass.
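
As a side note on the permission value printed by the exception (the 436 in the issue title): {{toShort()}} reports the mode in decimal, and 436 decimal is 0664 octal, i.e. group-writable. A tiny self-contained sketch (not Hadoop code) of the arithmetic:
{code:java}
public class PermissionCheckDemo {
  public static void main(String[] args) {
    // 436 decimal == 0664 octal (rw-rw-r--): the group-write bit is set.
    short reported = 436;
    System.out.println(Integer.toOctalString(reported));  // 664
    System.out.println((reported & 0022) != 0);           // true  -> rejected
    // After "chmod go-w" (or creating the file under umask 022) it is 0644.
    short fixed = 0644;
    System.out.println((fixed & 0022) != 0);               // false -> accepted
  }
}
{code}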

Actually, I am a newcomer to Hadoop, so I am not sure whether this is a bug in 
Hadoop or not. Could you please give some suggestions?

Thanks.

 

 


[jira] [Commented] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436

2019-10-11 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949890#comment-16949890
 ] 

liusheng commented on YARN-9511:


Hi [~snemeth] [~adam.antal], 

Thank you both for caring about this issue. I have taken some time to track down 
its cause. It affects the *TestAuxServices* tests and causes *2 Errors, 9 
Failures*; see:
{code:java}
Failures:
  TestAuxServices.testAuxServiceRecoverySetup:717 expected:<2> but was:<0>
  TestAuxServices.testAuxServicesManifestPermissions:874 expected:<2> but 
was:<0>
  TestAuxServices.testAuxServicesMeta:638 Invalid mix of services expected:<6> 
but was:<1>
  TestAuxServices.testAuxServices:610 Invalid mix of services expected:<6> but 
was:<1>
  TestAuxServices.testCustomizedAuxServiceClassPath:416
  TestAuxServices.testManualReload:919 expected:<2> but was:<0>
  TestAuxServices.testRemoteAuxServiceClassPath:313 The permission of the jar 
is wrong.Should throw out exception.
  TestAuxServices.testRemoveManifest:897 expected:<2> but was:<0>
  TestAuxServices.testValidAuxServiceName:698 Should receive the exception.
Errors:
  TestAuxServices.testAuxUnexpectedStop:664 » NoSuchElement
  TestAuxServices.testRemoteAuxServiceClassPath:334 » YarnRuntime The remote 
jar...
{code}
After debugging, I found that all of these failures are directly or indirectly 
related to file permissions.

There are two situations when running these tests:
 # *useManifest* enabled: the tests check and use the manifest file:
{code:java}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/test-dir/TestAuxServices/manifest.txt{code}
This file and all of its parent directories must not be writable by group or 
others, see:
{code:java}
private boolean checkManifestPermissions(FileStatus status) throws
    IOException {
  if ((status.getPermission().toShort() & 0022) != 0) {
    LOG.error("Manifest file and parents must not be writable by group or " +
        "others. The current Permission of " + status.getPath() + " is " +
        status.getPermission());
    return false;
  }
  Path parent = status.getPath().getParent();
  if (parent == null) {
    return true;
  }
  return checkManifestPermissions(manifestFS.getFileStatus(parent));
}{code}

 # *useManifest* not enabled: the tests use a test-jar.jar file; the code that 
checks its permissions is:
{code:java}
private Path maybeDownloadJars(String sName, String className, String
    remoteFile, AuxServiceFile.TypeEnum type, Configuration conf)
    throws IOException {
  // load AuxiliaryService from remote classpath
  FileContext localLFS = getLocalFileContext(conf);
  // create NM aux-service dir in NM localdir if it does not exist.
  Path nmAuxDir = dirsHandler.getLocalPathForWrite("."
      + Path.SEPARATOR + NM_AUX_SERVICE_DIR);
  if (!localLFS.util().exists(nmAuxDir)) {
    try {
      localLFS.mkdir(nmAuxDir, NM_AUX_SERVICE_DIR_PERM, true);
    } catch (IOException ex) {
      throw new YarnRuntimeException("Fail to create dir:"
          + nmAuxDir.toString(), ex);
    }
  }
  Path src = new Path(remoteFile);
  FileContext remoteLFS = getRemoteFileContext(src.toUri(), conf);
  FileStatus scFileStatus = remoteLFS.getFileStatus(src);
  if (!scFileStatus.getOwner().equals(
      this.userUGI.getShortUserName())) {
    throw new YarnRuntimeException("The remote jarfile owner:"
        + scFileStatus.getOwner() + " is not the same as the NM user:"
        + this.userUGI.getShortUserName() + ".");
  }
  if ((scFileStatus.getPermission().toShort() & 0022) != 0) {
    throw new YarnRuntimeException("The remote jarfile should not "
        + "be writable by group or others. "
        + "The current Permission is "
        + scFileStatus.getPermission().toShort());
  }
{code}

 

 

Based on the above, I tried removing the *write* permission for group and others 
from the parent directories of *manifest.txt*, and changing the *umask to 022*, 
which affects the permissions of newly created files and directories, because 
*manifest.txt* and the test jar are created anew when the tests run.

 
{code:java}
chmod go-w yourpath/hadoop/ -R
umask 022
umask
{code}
After doing the above and re-running *TestAuxServices*, all the tests pass.

Actually, I am a newcomer to Hadoop, so I am not sure whether this is a bug in 
Hadoop or not. Could you please give some suggestions?

Thanks.

 

 

> [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: 
> The remote jarfile should not be writable by group or others. The current 
> Permission is 436
> ---
>
> Key: YARN-9511
> URL: 

[jira] [Commented] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949842#comment-16949842
 ] 

Hadoop QA commented on YARN-9894:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 12 new + 6 unchanged - 0 fixed = 18 total (was 6) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9894 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982821/YARN-9894.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f3d44dadf48e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c561a70 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24967/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 

[jira] [Commented] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949822#comment-16949822
 ] 

Hadoop QA commented on YARN-9884:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9884 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982818/YARN-9884.003.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 9bcf0ce498ac 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c561a70 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24968/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24968/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch, 
> YARN-9884.003.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-9894:
-
Attachment: YARN-9894.001.patch

> CapacitySchedulerPerf test for measuring hundreds of apps in a large number 
> of queues.
> --
>
> Key: YARN-9894
> URL: https://issues.apache.org/jira/browse/YARN-9894
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, test
>Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-9894.001.patch
>
>
> I have developed a unit test based on the existing TestCapacitySchedulerPerf 
> tests that will measure the performance of a configurable number of apps in a 
> configurable number of queues. It will also test the performance of a cluster 
> that has many queues but only a portion of them are active.
> {code:title=For example:}
> $ mvn test \
> -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
>   -DRunCapacitySchedulerPerfTests=true \
>   -DNumberOfQueues=100 \
>   -DNumberOfApplications=200 \
>   -DPercentActiveQueues=100
> {code}
> - Parameters:
> -- RunCapacitySchedulerPerfTests=true:
> Needed in order to trigger the test
> -- NumberOfQueues
> Configurable number of queues
> -- NumberOfApplications
> Total number of apps to run in the whole cluster, distributed evenly across 
> all queues
> -- PercentActiveQueues
> Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-9894:
-
Attachment: YARN-9894.001.patch

> CapacitySchedulerPerf test for measuring hundreds of apps in a large number 
> of queues.
> --
>
> Key: YARN-9894
> URL: https://issues.apache.org/jira/browse/YARN-9894
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, test
>Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> I have developed a unit test based on the existing TestCapacitySchedulerPerf 
> tests that will measure the performance of a configurable number of apps in a 
> configurable number of queues. It will also test the performance of a cluster 
> that has many queues but only a portion of them are active.
> {code:title=For example:}
> $ mvn test \
> -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
>   -DRunCapacitySchedulerPerfTests=true \
>   -DNumberOfQueues=100 \
>   -DNumberOfApplications=200 \
>   -DPercentActiveQueues=100
> {code}
> - Parameters:
> -- RunCapacitySchedulerPerfTests=true:
> Needed in order to trigger the test
> -- NumberOfQueues
> Configurable number of queues
> -- NumberOfApplications
> Total number of apps to run in the whole cluster, distributed evenly across 
> all queues
> -- PercentActiveQueues
> Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-9894:
-
Attachment: (was: YARN-9894.001.patch)

> CapacitySchedulerPerf test for measuring hundreds of apps in a large number 
> of queues.
> --
>
> Key: YARN-9894
> URL: https://issues.apache.org/jira/browse/YARN-9894
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, test
>Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> I have developed a unit test based on the existing TestCapacitySchedulerPerf 
> tests that will measure the performance of a configurable number of apps in a 
> configurable number of queues. It will also test the performance of a cluster 
> that has many queues but only a portion of them are active.
> {code:title=For example:}
> $ mvn test \
> -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
>   -DRunCapacitySchedulerPerfTests=true \
>   -DNumberOfQueues=100 \
>   -DNumberOfApplications=200 \
>   -DPercentActiveQueues=100
> {code}
> - Parameters:
> -- RunCapacitySchedulerPerfTests=true:
> Needed in order to trigger the test
> -- NumberOfQueues
> Configurable number of queues
> -- NumberOfApplications
> Total number of apps to run in the whole cluster, distributed evenly across 
> all queues
> -- PercentActiveQueues
> Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949790#comment-16949790
 ] 

Eric Badger commented on YARN-9884:
---

Thanks for the prompt reviews, everyone! Patch 003 combines the util and docker 
error-code enum lists into a single list in util.h. I left the old error codes 
that are no longer used to keep the diff smaller, but I can remove them if you 
prefer. We still have room for about 50 more error codes before we go over the 
128 boundary.

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch, 
> YARN-9884.003.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9884:
--
Attachment: YARN-9884.003.patch

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch, 
> YARN-9884.003.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize

2019-10-11 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949769#comment-16949769
 ] 

David Mollitor commented on YARN-9863:
--

[~szegedim] Thank you for your feedback.

The background here is that I am working with a large cluster that has one job 
in particular that is crushing it. This one job is required to localize many 
resources, of varying file sizes, for the job to complete. As I understand 
YARN, when a job is submitted to the cluster, a list of files to localize is 
sent to each NodeManager involved in the job. In this case, all nodes are 
involved. All NodeManagers receive a carbon copy of the list of files from the 
ResourceManager (or maybe it's the 'yarn' client?). That is, they all have the 
same list, with the same ordering. Each NodeManager then iterates through the 
list and requests that each file be localized.

So, it would seem to me that all of the NodeManagers would request from HDFS 
file1, file2, file3, ...

This has a stampeding effect on the HDFS DataNodes.

I am familiar with {{mapreduce.client.submit.file.replication}}. I understand 
that this is used to pump-up the replication of the submitted files so that 
they are available on more DataNodes. However, the way that it works, as I 
understand it, is that the file is first written to the HDFS cluster with the 
default replication (usually 3), and then the client requests that the file be 
replicated up to the final size in a separate request (setrep). This 
replication process happens asynchronously. If the 
{{mapreduce.client.submit.file.replication}} is set to 10, for example, the job 
may be submitted and finished before the file actually achieves a final 
replication of 10. This becomes exacerbated on larger clusters. If a cluster 
has 1,000 nodes, the recommended value of 
{{mapreduce.client.submit.file.replication}} is sqrt(1000) or ~32. The default 
number of connections each DataNode can support is 10 
({{dfs.datanode.handler.count}}). So, even once the desired replication is 
achieved, that is only 32 nodes x 10 handlers = 320 requests served at once. In a 
cluster with 1,000 nodes, that is going to stall.

By simply randomizing the list on each NodeManager, the load can be spread across 
many different sets of 32 nodes, which better supports this scenario.
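
A minimal sketch of the idea (the class and the list here are illustrative; this is not the actual {{LocalResourceBuilder}} API): each NodeManager shuffles its own copy of the resource list before localizing, so requests arrive at the DataNodes in different orders.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleLocalizationOrder {
  public static void main(String[] args) {
    // Stand-in for the list every NodeManager currently receives in the same order.
    List<String> resources = new ArrayList<>(
        Arrays.asList("job.jar", "job.xml", "dict.bin", "model.bin"));
    // A plain Random is enough: this is load spreading, not a security decision.
    Collections.shuffle(resources, new Random());
    resources.forEach(r -> System.out.println("localize " + r));
  }
}
{code}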

For your questions:
 # I'm not sure how HDFS could manage this. The requests are generated by the 
NodeManagers and the HDFS cluster simply serves them; it has no way to randomize 
the request order.
 # SecureRandom: this is not a security-sensitive operation; it only requires a 
fast, reasonably good randomization of the list to spread the load.
 # I believe the degree of parallelism in localization is configurable with 
{{yarn.nodemanager.localizer.fetch.thread-count}} (default 4), but I believe the 
requests are submitted to the work queue in order, so there will still be some 
level of trampling, especially when there are more than 4 files to localize (as 
is the case in the scenario I am reviewing)

> Randomize List of Resources to Localize
> ---
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize

2019-10-11 Thread Miklos Szegedi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949707#comment-16949707
 ] 

Miklos Szegedi commented on YARN-9863:
--

[~belugabehr], could you explain the motivation for this change a bit more?

AFAIK the ordering decision is better left to HDFS. Also, once you use a random 
number, why not use SecureRandom? My third question is whether localization runs 
in parallel, in which case the order does not matter so much.

All in all, my experience with YARN and localization suggests that if you have a 
bottleneck on HDFS, you are better off doing a suitable replica increase in HDFS, 
even if it is temporary. HDFS is much better at creating replicas for 
localization, since it can stream them and avoid bottlenecks. Localization then 
reads from a nearby replica, making it practically painless.

> Randomize List of Resources to Localize
> ---
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9863) Randomize List of Resources to Localize

2019-10-11 Thread Miklos Szegedi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949707#comment-16949707
 ] 

Miklos Szegedi edited comment on YARN-9863 at 10/11/19 6:39 PM:


[~belugabehr], thank you for the patch. Could you explain the motivation for 
this change a bit more?

AFAIK the ordering decision is better left to HDFS. Also, once you use a random 
number, why not use SecureRandom? My third question is whether localization runs 
in parallel, in which case the order does not matter so much.

All in all, my experience with YARN and localization suggests that if you have a 
bottleneck on HDFS, you are better off doing a suitable replica increase in HDFS, 
even if it is temporary. HDFS is much better at creating replicas for 
localization, since it can stream them and avoid bottlenecks. Localization then 
reads from a nearby replica, making it practically painless.



> Randomize List of Resources to Localize
> ---
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Craig Condit (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949703#comment-16949703
 ] 

Craig Condit commented on YARN-9884:


[~ebadger], this looks pretty good for the most part. However, there's one 
potential big problem... exit codes outside of the range 0-127 tend to be 
misinterpreted by shells and other tooling. Have we verified that the upper 
codes are being interpreted properly? Most kernel wait() function variations 
truncate exit codes to 8 bits, and shells treat them as signed, where negative 
values indicate death by signal -- the infamous 143 exit code in Hadoop is 
really SIGTERM (15) + 128.
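
To make the arithmetic concrete, a tiny standalone sketch (not Hadoop code) of the two effects described above: the shell reports death-by-signal as 128 plus the signal number, and exit statuses wrap at 8 bits.
{code:java}
public class ExitCodeDemo {
  public static void main(String[] args) {
    int sigterm = 15;
    // A process killed by SIGTERM shows up in the shell as 128 + 15 = 143.
    System.out.println("SIGTERM reported as: " + (128 + sigterm));
    // Exit statuses are truncated to 8 bits, so exit(300) is observed as 300 & 0xFF = 44.
    System.out.println("exit(300) observed as: " + (300 & 0xFF));
  }
}
{code}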

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949691#comment-16949691
 ] 

Jim Brennan commented on YARN-9884:
---

[~ebadger] good job on the refactoring. This looks pretty good to me. I was going 
to comment that a few of the DOCKER-related enum values are no longer used, like 
INVALID_DOCKER_RO_MOUNT, and those should be removed. Also, I think all 
DOCKER-specific codes should have DOCKER in the name.

I agree with [~eyang] that a single list would be even better.

 

 

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user

2019-10-11 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949690#comment-16949690
 ] 

Eric Payne commented on YARN-9887:
--

The Capacity Scheduler does have a concept of Max Applications Per User (per 
queue). While it is not directly configurable, it is derived from the following:

(int)(maxApplications * (userLimit / 100.0f) * userLimitFactor)

Each of the above is configurable per queue.
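
A small worked example of that calculation, using made-up values (queue maximum-applications 10000, minimum-user-limit-percent 25, user-limit-factor 2):

{code:java}
public class MaxAppsPerUserExample {
  public static void main(String[] args) {
    // Illustrative values; in a real cluster these come from the queue's
    // maximum-applications, minimum-user-limit-percent and user-limit-factor.
    int maxApplications = 10000;
    int userLimit = 25;
    float userLimitFactor = 2.0f;

    int maxAppsPerUser =
        (int) (maxApplications * (userLimit / 100.0f) * userLimitFactor);

    // 10000 * 0.25 * 2.0 = 5000
    System.out.println("max applications per user = " + maxAppsPerUser);
  }
}
{code}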

> Capacity scheduler: add support for limiting maxRunningApps per user
> 
>
> Key: YARN-9887
> URL: https://issues.apache.org/jira/browse/YARN-9887
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Priority: Major
>
> Fair Scheduler supports limiting the number of applications that a particular 
> user can submit:
> {noformat}
> 
>   10
> 
> {noformat}
> Capacity Scheduler does not have an exact equivalent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9885) Container allocation when queue usage is below MIN guarantees

2019-10-11 Thread Prashant Golash (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Golash reassigned YARN-9885:
-

Assignee: Prashant Golash

> Container allocation when queue usage is below MIN guarantees
> -
>
> Key: YARN-9885
> URL: https://issues.apache.org/jira/browse/YARN-9885
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Minor
>
> Filing this JIRA to calculate the time spent in container allocation when 
> queue usage is below min (over the whole lifetime of the container).
> Customers generally ask for a YARN SLA on container allocation when their 
> queue usage is below min. I have an implementation in mind, but I want to 
> confirm with the community whether this would be a helpful feature or whether 
> it is already implemented.
>  
> cc [~leftnoteasy]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9895) Feature flag to Disable delay scheduling

2019-10-11 Thread Prashant Golash (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Golash reassigned YARN-9895:
-

Assignee: Prashant Golash

> Feature flag to Disable delay scheduling
> 
>
> Key: YARN-9895
> URL: https://issues.apache.org/jira/browse/YARN-9895
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Major
>
> In many YARN clusters, there is no colocation of storage and compute. In such 
> cases, we may not need delay scheduling.
>  
> I think it would be good to provide an option to disable delay scheduling and 
> accordingly change the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9895) Feature flag to Disable delay scheduling

2019-10-11 Thread Prashant Golash (Jira)
Prashant Golash created YARN-9895:
-

 Summary: Feature flag to Disable delay scheduling
 Key: YARN-9895
 URL: https://issues.apache.org/jira/browse/YARN-9895
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Prashant Golash


In many YARN clusters, there is no colocation of storage and compute. In such 
cases, we may not need delay scheduling.

 

I think it would be good to provide an option to disable delay scheduling and 
accordingly change the code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-9894:


Assignee: Eric Payne  (was: Eric Payne)

> CapacitySchedulerPerf test for measuring hundreds of apps in a large number 
> of queues.
> --
>
> Key: YARN-9894
> URL: https://issues.apache.org/jira/browse/YARN-9894
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, test
>Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> I have developed a unit test based on the existing TestCapacitySchedulerPerf 
> tests that will measure the performance of a configurable number of apps in a 
> configurable number of queues. It will also test the performance of a cluster 
> that has many queues but only a portion of them are active.
> {code:title=For example:}
> $ mvn test 
> -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
>   -DRunCapacitySchedulerPerfTests=true
>   -DNumberOfQueues=100 \
>   -DNumberOfApplications=200 \
>   -DPercentActiveQueues=100
> {code}
> - Parameters:
> -- RunCapacitySchedulerPerfTests=true:
> Needed in order to trigger the test
> -- NumberOfQueues
> Configurable number of queues
> -- NumberOfApplications
> Total number of apps to run in the whole cluster, distributed evenly across 
> all queues
> -- PercentActiveQueues
> Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9014) runC container runtime

2019-10-11 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9014:
--
Summary: runC container runtime  (was: OCI/squashfs container runtime)

> runC container runtime
> --
>
> Key: YARN-9014
> URL: https://issues.apache.org/jira/browse/YARN-9014
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Jason Darrell Lowe
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: OciSquashfsRuntime.v001.pdf, 
> RuncContainerRuntime.v002.pdf
>
>
> This JIRA tracks a YARN container runtime that supports running containers in 
> images built by Docker but the runtime does not use Docker directly, and 
> Docker does not have to be installed on the nodes.  The runtime leverages the 
> [OCI runtime standard|https://github.com/opencontainers/runtime-spec] to 
> launch containers, so an OCI-compliant runtime like {{runc}} is required.  
> {{runc}} has the benefit of not requiring a daemon like {{dockerd}} to be 
> running in order to launch/control containers.
> The layers comprising the Docker image are uploaded to HDFS as 
> [squashfs|http://tldp.org/HOWTO/SquashFS-HOWTO/whatis.html] images, enabling 
> the runtime to efficiently download and execute directly on the compressed 
> layers.  This saves image unpack time and space on the local disk.  The image 
> layers, like other entries in the YARN distributed cache, can be spread 
> across the YARN local disks, increasing the available space for storing 
> container images on each node.
> A design document will be posted shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-9894:


Assignee: Eric Payne

> CapacitySchedulerPerf test for measuring hundreds of apps in a large number 
> of queues.
> --
>
> Key: YARN-9894
> URL: https://issues.apache.org/jira/browse/YARN-9894
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, test
>Affects Versions: 2.9.2, 2.8.5, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> I have developed a unit test based on the existing TestCapacitySchedulerPerf 
> tests that will measure the performance of a configurable number of apps in a 
> configurable number of queues. It will also test the performance of a cluster 
> that has many queues but only a portion of them are active.
> {code:title=For example:}
> $ mvn test 
> -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
>   -DRunCapacitySchedulerPerfTests=true
>   -DNumberOfQueues=100 \
>   -DNumberOfApplications=200 \
>   -DPercentActiveQueues=100
> {code}
> - Parameters:
> -- RunCapacitySchedulerPerfTests=true:
> Needed in order to trigger the test
> -- NumberOfQueues
> Configurable number of queues
> -- NumberOfApplications
> Total number of apps to run in the whole cluster, distributed evenly across 
> all queues
> -- PercentActiveQueues
> Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)
Eric Payne created YARN-9894:


 Summary: CapacitySchedulerPerf test for measuring hundreds of apps 
in a large number of queues.
 Key: YARN-9894
 URL: https://issues.apache.org/jira/browse/YARN-9894
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, test
Affects Versions: 3.1.3, 3.2.1, 2.8.5, 2.9.2
Reporter: Eric Payne


I have developed a unit test based on the existing TestCapacitySchedulerPerf 
tests that will measure the performance of a configurable number of apps in a 
configurable number of queues. It will also test the performance of a cluster 
that has many queues but only a portion of them are active.

{code:title=For example:}
$ mvn test 
-Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
  -DRunCapacitySchedulerPerfTests=true
  -DNumberOfQueues=100 \
  -DNumberOfApplications=200 \
  -DPercentActiveQueues=100
{code}

- Parameters:
-- RunCapacitySchedulerPerfTests=true:
Needed in order to trigger the test
-- NumberOfQueues
Configurable number of queues
-- NumberOfApplications
Total number of apps to run in the whole cluster, distributed evenly across all 
queues
-- PercentActiveQueues
Percentage of the queues that contain active applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949576#comment-16949576
 ] 

Hadoop QA commented on YARN-8453:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
22s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 27s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
54s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-8453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 587486dec1d1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 626a48d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24966/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24966/testReport/ |
| Max. process+thread count | 761 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9882) QueueMetrics not coming in Capacity Scheduler with Node Label Configuration

2019-10-11 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949573#comment-16949573
 ] 

Eric Payne commented on YARN-9882:
--

[~gaurav.suman], for the sake of legacy, the metrics outside of the sections 
labelled "...ByPartition" only reflect the resource usage of the default 
partition. For each partition, these metrics are included in the 
"...ByPartition" sections. If one wants the sum of all resources in all 
partitions, it is necessary to sum the metrics for each partition.
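
For example, once the per-partition values of a metric have been collected (the map below simply stands in for whatever JMX client is used to read the "...ByPartition" beans), the cluster-wide total is the sum over all partitions, including the default (empty-name) partition:

{code:java}
import java.util.Map;

public class PartitionMetricsSum {
  /** Sums one metric (e.g. pendingMB) across all partitions. */
  static long totalAcrossPartitions(Map<String, Long> valueByPartition) {
    return valueByPartition.values().stream()
        .mapToLong(Long::longValue)
        .sum();
  }

  public static void main(String[] args) {
    // Illustrative values only: default partition plus "low" and "regular".
    Map<String, Long> pendingMB =
        Map.of("", 2048L, "low", 4096L, "regular", 1024L);
    System.out.println("total pendingMB = " + totalAcrossPartitions(pendingMB));
  }
}
{code}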

The history of this is in YARN-6467 and others referenced there.

There are currently problems with the accuracy of all of these metrics. They 
are being worked on by [~rmanikandan] in the following JIRAs:
 YARN-6492
 YARN-9767
 YARN-9773

> QueueMetrics not coming in Capacity Scheduler with Node Label Configuration
> ---
>
> Key: YARN-9882
> URL: https://issues.apache.org/jira/browse/YARN-9882
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, metrics, scheduler
>Reporter: Gaurav Suman
>Priority: Major
>
> I am having a capacity scheduler setup with two queues - "low-priority", 
> "regular-priority". There are two node-labels "low" and "regular". 
> low-priority queue has 100% access to "low" node-label and regular-priority 
> queue has 100% access to "regular" node label.
> The yarn ui capacity scheduler configuration - 
> [https://i.stack.imgur.com/gOARn.png]
> When I look at the QueueMetrics emitted by the "low-priority" and 
> "regular-priority" queues (http://rm-ip:port/jmx), they show correct values 
> for availableMB, availableVCores, pendingMB=0, etc. But when I submit a job 
> to either queue, JMX metrics such as pendingMB, pendingVCores, availableMB 
> and availableVCores are not updated; only AppsRunning, ActiveApplications, 
> etc. change. pendingMB and pendingVCores always remain 0, availableMB and 
> availableVCores do not change, while appsRunning and activeApplications 
> correctly show 1. I am not able to find out why these metrics are not updated 
> after job submission.
> The issue occurs only when node labels are enabled. When node labels are 
> disabled and only queues are used, everything works fine.
> The capacity scheduler configuration(capacity-scheduler.xml):
> {code:java}
> 
> 
> yarn.scheduler.capacity.maximum-applications
> 5000
> 
> 
> yarn.scheduler.capacity.maximum-am-resource-percent
> 0.2
> 
> 
> yarn.scheduler.capacity.resource-calculator
> 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
> 
> The ResourceCalculator implementation to be used to compare
> Resources in the scheduler.
> The default i.e. DefaultResourceCalculator only uses Memory while
> DominantResourceCalculator uses dominant-resource to compare
> multi-dimensional resources such as Memory, CPU etc.
> 
> 
> 
> yarn.scheduler.capacity.root.queues
> low-priority,regular-priority
> 
> 
> yarn.scheduler.capacity.root.capacity
> 100
> 
> 
> yarn.scheduler.capacity.root.maximum-capacity
> 100
> 
> 
> yarn.scheduler.capacity.root.accessible-node-labels
> *
> 
> 
> 
> yarn.scheduler.capacity.root.accessible-node-labels.regular.capacity
> 100
> 
> 
> 
> yarn.scheduler.capacity.root.accessible-node-labels.regular.maximum-capacity
> 100
> 
> 
> 
> yarn.scheduler.capacity.root.accessible-node-labels.low.capacity
> 100
> 
> 
> 
> yarn.scheduler.capacity.root.accessible-node-labels.low.maximum-capacity
> 100
> 
> 
> yarn.scheduler.capacity.root.default.state
> RUNNING
> 
> The state of the default queue. State can be one of RUNNING or 
> STOPPED.
> 
> 
> 
> yarn.scheduler.capacity.root.default.acl_submit_applications
> *
> 
> 
> yarn.scheduler.capacity.root.default.acl_administer_queue
> *
> 
> The ACL of who can administer jobs on the default queue.
> 
> 
> 
> yarn.scheduler.capacity.node-locality-delay
> 40
> 
> 
> yarn.scheduler.capacity.queue-mappings-override.enable
> false
> 
> 
> yarn.scheduler.capacity.root.low-priority.capacity
> 50
> 
> 
> yarn.scheduler.capacity.root.low-priority.maximum-capacity
> 100
> 
> 
> yarn.scheduler.capacity.root.low-priority.ordering-policy
> fair
> 
> 
> 
> yarn.scheduler.capacity.root.low-priority.accessible-node-labels
> low
> 
> 
> 
> yarn.scheduler.capacity.root.low-priority.default-node-label-expression
> low
> 
> 
> 
> yarn.scheduler.capacity.root.low-priority.accessible-node-labels.low.capacity
> 100
> 
> 
> 
> yarn.scheduler.capacity.root.low-priority.accessible-node-labels.low.maximum-capacity
> 100
> 
> 
> 

[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949570#comment-16949570
 ] 

Hadoop QA commented on YARN-5106:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 27 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 11 new + 278 unchanged - 52 fixed = 289 total (was 330) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 
31s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
21s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}211m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-5106 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982771/YARN-5106.012.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d45762b6b02e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ec86f42 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949546#comment-16949546
 ] 

Hadoop QA commented on YARN-8453:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
10s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
39s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-8453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6baa1b98f753 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 626a48d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24965/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24965/testReport/ |
| Max. process+thread count | 781 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949524#comment-16949524
 ] 

Hadoop QA commented on YARN-8453:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
13s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 3 unchanged - 0 fixed = 9 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 
21s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.3 Server=19.03.3 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-8453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12982777/YARN-8453.branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6779bbc1f636 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / 626a48d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24964/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24964/testReport/ |
| Max. process+thread count | 789 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9884) Make container-executor mount logic modular

2019-10-11 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949497#comment-16949497
 ] 

Eric Badger commented on YARN-9884:
---

bq. I am wondering if we need to combine the list to prevent future conflicts 
of the same name?

I am fine with doing that. I refactored the error codes in a way that I thought 
would be the least invasive to how docker currently handles error codes. But 
having multiple error code lists is a little bit confusing because there could 
be collisions with error values and you won't always know whether your error 
came from a generic error code enum or a docker-specific one. Something I ran 
into while testing was that the docker daemon will return codes in the 100 
range. I had initially placed all of the docker error codes in the 100 range, 
but then got an error 127 from the docker daemon and realized that that would 
be incorrectly parsed as a docker error code. So even though the docker daemon 
was passing back 127, the error message you would be given would be something 
completely unrelated to the actual error.

So all in all, I am in favor of combining the lists. I'd be happy to include 
that in this patch or put up a separate patch to do it.
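
For what it's worth, a minimal sketch of the single-list idea (the enum name and values below are invented for the example, not the existing container-executor codes): keeping every code in one list, and keeping the YARN-assigned values out of the range the runtime itself may return, avoids reporting a raw daemon exit code as an unrelated YARN error.

{code:java}
/**
 * Hypothetical combined exit-code list (names and values made up for the
 * sketch). A single enum makes collisions between separate generic and
 * docker-specific lists impossible, and unknown codes are passed through
 * instead of being mistranslated.
 */
public enum ContainerExecutorError {
  INVALID_COMMAND_FILE(2),
  INVALID_MOUNT(3),
  IMAGE_NOT_TRUSTED(4);

  private final int code;

  ContainerExecutorError(int code) {
    this.code = code;
  }

  public int getCode() {
    return code;
  }

  /** Returns the matching entry, or null for codes we did not assign. */
  public static ContainerExecutorError fromCode(int code) {
    for (ContainerExecutorError e : values()) {
      if (e.code == code) {
        return e;
      }
    }
    return null; // e.g. an exit code coming straight from the docker daemon
  }
}
{code}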

> Make container-executor mount logic modular
> ---
>
> Key: YARN-9884
> URL: https://issues.apache.org/jira/browse/YARN-9884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9884.001.patch, YARN-9884.002.patch
>
>
> The current mount logic in the container-executor is interwined with docker. 
> To avoid duplicating code between docker and runc, the code should be 
> refactored so that both runtimes can use the same common code when possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949452#comment-16949452
 ] 

Szilard Nemeth commented on YARN-9840:
--

Hi [~pbacsko]!

That's fine, thanks for the answer.

Do you agree that we need an addition to the CS documentation as well? 

Thanks!

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, 
> YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-11 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949450#comment-16949450
 ] 

Peter Bacsko commented on YARN-9840:


[~snemeth] although you didn't ask me, I examined the group mapping code in 
more detail, and the answer is yes: index 0 is the primary group.

Basically it comes from the output of {{id -GN}} when 
{{ShellBasedUnixGroupsMapping}} is used, so the first group is always the 
primary.
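
A tiny sketch of that ordering assumption (the parsing below is illustrative, not the actual {{ShellBasedUnixGroupsMapping}} code): in the space-separated output of {{id -GN}}, index 0 is the primary group and everything after it is a secondary group, which is why the mapping loop can safely start at index 1.

{code:java}
import java.util.Arrays;
import java.util.List;

public class GroupOrderingExample {
  public static void main(String[] args) {
    // Example output of `id -GN` for some user (group names made up).
    String idOutput = "hadoop wheel docker admins";
    List<String> groups = Arrays.asList(idOutput.trim().split("\\s+"));

    String primaryGroup = groups.get(0);                  // index 0: primary
    List<String> secondaryGroups = groups.subList(1, groups.size());

    System.out.println("primary   = " + primaryGroup);
    System.out.println("secondary = " + secondaryGroups);
  }
}
{code}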

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, 
> YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config

2019-10-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949447#comment-16949447
 ] 

Hadoop QA commented on YARN-9699:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 28m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 137 unchanged - 7 fixed = 141 total (was 144) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
31s{color} | {color:red} hadoop-yarn-site in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} xml {color} | {color:red}  0m 10s{color} | 
{color:red} The patch has 2 ill-formed XML file(s). {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 
13s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| XML | Parsing Error(s): |
|   | 

[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949444#comment-16949444
 ] 

Szilard Nemeth commented on YARN-9840:
--

Hi [~maniraj...@gmail.com]!

Just a quick question: 

In the added code in UserGroupMappingPlacementRule#getPlacementForUser: you are 
starting the loop from index 1. I guess this is because the primary group is 
at index 0 and all the secondary groups are at higher indices. Is this 
correct? Can you make it explicit with a code comment?

As this is essentially a new feature in CS, can you update the CS documentation 
as well?

 

Thanks!

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, 
> YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9893) Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity setting

2019-10-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9893:
--

 Summary: Capacity scheduler: enhance leaf-queue-template capacity 
/ maximum-capacity setting
 Key: YARN-9893
 URL: https://issues.apache.org/jira/browse/YARN-9893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Peter Bacsko


Capacity Scheduler does not support two percentage values for leaf queue 
capacity and maximum-capacity settings. So, you can't do something like this:

{{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=50.0%,
 vcores=50.0%}}

On top of that, it's not even possible to define absolute resources:

{{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=16384,
 vcores=8}}

Only a single percentage value is accepted.

This makes it nearly impossible to properly convert a similar setting from Fair 
Scheduler, where such a configuration is valid and accepted 
({{}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949412#comment-16949412
 ] 

Hudson commented on YARN-8453:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17525 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17525/])
YARN-8453. Additional Unit tests to verify queue limit and max-limit (snemeth: 
rev ec86f42e40ec57ea5d515c1207161fcaf2c770e1)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerWithMultiResourceTypes.java


> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch, 
> YARN-8453.branch-3.1.001.patch, YARN-8453.branch-3.2.001.patch
>
>
> With support for additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as Memory / CPU are still available beyond 
> their guaranteed limits (under max-limit). Adding more unit tests to ensure 
> we are not starving such allocation requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8453:
-
Attachment: YARN-8453.branch-3.1.001.patch

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch, 
> YARN-8453.branch-3.1.001.patch, YARN-8453.branch-3.2.001.patch
>
>
> With support for additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as Memory / CPU are still available beyond 
> their guaranteed limits (under max-limit). Adding more unit tests to ensure 
> we are not starving such allocation requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.

2019-10-11 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9881:

 Attachment: 1.png
 2.png
 3.png
Description: 
In the YARN REST [http://rm-http-address:port/ws/v1/cluster/scheduler] document 
(hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API), 
change the memory item's datatype from int to long:
1. Change the Capacity Scheduler API's [memory] item's dataType from int to long.
2. Change the Fair Scheduler API's [memory] item's dataType from int to long.
Summary:  Change Cluster_Scheduler_API's Item memory‘s datatype from 
int to long.  (was: In YARN ui2 attempts tab, The running Application Attempt's 
Container's ElapsedTime is incorrect.)

>  Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
> 
>
> Key: YARN-9881
> URL: https://issues.apache.org/jira/browse/YARN-9881
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs, documentation, yarn
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
> Attachments: 1.png, 2.png, 3.png
>
>
> In the YARN REST [http://rm-http-address:port/ws/v1/cluster/scheduler] document 
> (hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API), 
> change the memory item's datatype from int to long:
> 1. Change the Capacity Scheduler API's [memory] item's dataType from int to long.
> 2. Change the Fair Scheduler API's [memory] item's dataType from int to long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949399#comment-16949399
 ] 

Szilard Nemeth commented on YARN-8453:
--

Hi [~adam.antal]!

Thanks for the patch, just committed to trunk!

Added branch-3.2 patch as it applied cleanly and waiting for jenkins to pick it 
up.

After that, I will add branch-3.1 patch as well!

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch, 
> YARN-8453.branch-3.2.001.patch
>
>
> With support for additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as Memory / CPU are still available beyond 
> their guaranteed limits (under max-limit). Adding more unit tests to ensure 
> we are not starving such allocation requests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9892) Capacity scheduler: support DRF ordering policy on queue level

2019-10-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9892:
--

 Summary: Capacity scheduler: support DRF ordering policy on queue 
level
 Key: YARN-9892
 URL: https://issues.apache.org/jira/browse/YARN-9892
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Peter Bacsko


Capacity Scheduler does not support the DRF (Dominant Resource Fairness) ordering 
policy at the queue level. Only "fifo" and "fair" are accepted for 
{{yarn.scheduler.capacity..ordering-policy}}.

DRF can only be used globally if 
{{yarn.scheduler.capacity.resource-calculator}} is set to 
DominantResourceCalculator.
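
For reference, a minimal capacity-scheduler.xml sketch of the current situation (the queue name root.myqueue is hypothetical):

{noformat}
<!-- DRF today can only be enabled globally, via the resource calculator -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- the per-queue ordering policy currently only accepts "fifo" or "fair" -->
<property>
  <name>yarn.scheduler.capacity.root.myqueue.ordering-policy</name>
  <value>fair</value>
</property>
{noformat}

The proposal is to also accept a DRF-style value for the per-queue ordering-policy key.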



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8453:
-
Attachment: YARN-8453.branch-3.2.001.patch

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch, 
> YARN-8453.branch-3.2.001.patch
>
>
> Post support of additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue, while other resources such as Memory / CPU are still available beyond the 
> guaranteed limit (under the max-limit). Adding more unit tests ensures we are 
> not starving such allocation requests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9888) Capacity scheduler: add support for default maxRunningApps limit per user

2019-10-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9888:
---
Component/s: capacity scheduler

> Capacity scheduler: add support for default maxRunningApps limit per user
> -
>
> Key: YARN-9888
> URL: https://issues.apache.org/jira/browse/YARN-9888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Priority: Major
>
> Fair scheduler has the setting {{}} which limits how many 
> running applications each user can have. 
> Capacity scheduler lacks this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user

2019-10-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9887:
---
Component/s: capacity scheduler

> Capacity scheduler: add support for limiting maxRunningApps per user
> 
>
> Key: YARN-9887
> URL: https://issues.apache.org/jira/browse/YARN-9887
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Priority: Major
>
> Fair Scheduler supports limiting the number of applications that a particular 
> user can submit:
> {noformat}
> 
>   10
> 
> {noformat}
> Capacity Scheduler does not have an exact equivalent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8453:
-
Fix Version/s: 3.3.0

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch
>
>
> Post support of additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue, while other resources such as Memory / CPU are still available beyond the 
> guaranteed limit (under the max-limit). Adding more unit tests ensures we are 
> not starving such allocation requests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9891) Capacity scheduler: enhance capacity / maximum-capacity setting

2019-10-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9891:
---
Component/s: capacity scheduler

> Capacity scheduler: enhance capacity / maximum-capacity setting
> ---
>
> Key: YARN-9891
> URL: https://issues.apache.org/jira/browse/YARN-9891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Priority: Major
>
> Capacity Scheduler does not support two percentage values for capacity and 
> maximum-capacity settings. So, you can't do something like this:
> {{yarn.scheduler.capacity.root.users.john.maximum-capacity=memory-mb=50.0%, 
> vcores=50.0%}}
> It's possible to use absolute resources, but not two separate percentages 
> (which express capacity as a percentage of the overall cluster resource). 
> Such a configuration is accepted in Fair Scheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9891) Capacity scheduler: enhance capacity / maximum-capacity setting

2019-10-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9891:
--

 Summary: Capacity scheduler: enhance capacity / maximum-capacity 
setting
 Key: YARN-9891
 URL: https://issues.apache.org/jira/browse/YARN-9891
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko


Capacity Scheduler does not support two percentage values for capacity and 
maximum-capacity settings. So, you can't do something like this:

{{yarn.scheduler.capacity.root.users.john.maximum-capacity=memory-mb=50.0%, 
vcores=50.0%}}

It's possible to use absolute resources, but not two separate percentages 
(which express capacity as a percentage of the overall cluster resource). 
Such a configuration is accepted in Fair Scheduler.
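
For comparison, a sketch of what is accepted today versus what is being asked for (the queue path is taken from the example above; the numbers are made up):

{noformat}
# accepted today: absolute resources
yarn.scheduler.capacity.root.users.john.maximum-capacity=[memory=10240,vcores=12]

# not accepted today: two separate percentages (the subject of this JIRA)
yarn.scheduler.capacity.root.users.john.maximum-capacity=memory-mb=50.0%,vcores=50.0%
{noformat}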



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9881) In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.

2019-10-11 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9881:

Description: (was: The Yarn Rest 
[http://rm-http-address:port/ws/v1/cluster/scheduler] document, In 
hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
change Item memory‘s datatype from int to long.
 1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
 2. change Fair Scheduler API's item [memory]'s dataType from int to long.)
Summary: In YARN ui2 attempts tab, The running Application Attempt's 
Container's ElapsedTime is incorrect.  (was: Change Cluster_Scheduler_API's 
Item memory‘s datatype from int to long.)

> In YARN ui2 attempts tab, The running Application Attempt's Container's 
> ElapsedTime is incorrect.
> -
>
> Key: YARN-9881
> URL: https://issues.apache.org/jira/browse/YARN-9881
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs, documentation, yarn
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html

2019-10-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949398#comment-16949398
 ] 

Hudson commented on YARN-9836:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17524 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17524/])
YARN-9836. General usability improvements in showSimulationTrace.html. 
(snemeth: rev 62b5cefaeaa9cccd8d2de8eaff75d0e32e87f54d)
* (edit) hadoop-tools/hadoop-sls/src/main/html/showSimulationTrace.html


> General usability improvements in showSimulationTrace.html
> --
>
> Key: YARN-9836
> URL: https://issues.apache.org/jira/browse/YARN-9836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9836.001.patch, YARN-9836.002.patch, 
> YARN-9836.003.patch
>
>
> There are some small usability improvements that can be made for the offline 
> analysis page (showSimulationTrace.html):
> - empty divs can be hidden while no data is displayed
> - the site can be refactored to be responsive, given that bootstrap is already 
> available as a third-party library
> - there is no proper error handling in the site (e.g. a malformed JSON and 
> similar cases), which is a real problem
> - there is no indentation in the raw html file, which makes supportability even 
> worse



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.

2019-10-11 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9881:

Description: 
The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, 
In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
change Item memory‘s datatype from int to long.
 1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
 2. change Fair Scheduler API's item [memory]'s dataType from int to long.

  was:
The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, 
In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
change Item memory‘s datatype from int to long.
1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
2. change Fair Scheduler API's item [memory]'s dataType from int to long.
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png!
 
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/a1e835acff194182a5cefaa31d2b3973/clipboard.png!
 
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/dc19ae58affd4cecb79dba3386ddc1a9/clipboard.png!


> Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
> ---
>
> Key: YARN-9881
> URL: https://issues.apache.org/jira/browse/YARN-9881
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs, documentation, yarn
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>
> The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, 
> In 
> hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
> change Item memory‘s datatype from int to long.
>  1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
>  2. change Fair Scheduler API's item [memory]'s dataType from int to long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9881) Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.

2019-10-11 Thread jenny (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jenny updated YARN-9881:

Component/s: (was: yarn-ui-v2)
 yarn
 documentation
 docs
Description: 
The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, 
In hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
change Item memory‘s datatype from int to long.
1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
2. change Fair Scheduler API's item [memory]'s dataType from int to long.
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png!
 
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/a1e835acff194182a5cefaa31d2b3973/clipboard.png!
 
!file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/dc19ae58affd4cecb79dba3386ddc1a9/clipboard.png!
Summary: Change Cluster_Scheduler_API's Item memory‘s datatype from int 
to long.  (was: In YARN ui2 attempts tab, The running Application Attempt's 
Container's ElapsedTime is incorrect.)

> Change Cluster_Scheduler_API's Item memory‘s datatype from int to long.
> ---
>
> Key: YARN-9881
> URL: https://issues.apache.org/jira/browse/YARN-9881
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs, documentation, yarn
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>
> The Yarn Rest [http://rm-http-address:port/ws/v1/cluster/scheduler] document, 
> In 
> hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API, 
> change Item memory‘s datatype from int to long.
> 1.change Capacity Scheduler API's item [memory]'s dataType from int to long.
> 2. change Fair Scheduler API's item [memory]'s dataType from int to long.
> !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/c0b216d2328a491fa7a8cd592d0171cd/clipboard.png!
>  
> !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/a1e835acff194182a5cefaa31d2b3973/clipboard.png!
>  
> !file:///C:/Users/chenjuanni/AppData/Local/YNote/data/qq00E6BD103510CE411B97708027B61D3C/dc19ae58affd4cecb79dba3386ddc1a9/clipboard.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9836) General usability improvements in showSimulationTrace.html

2019-10-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9836:
-
Fix Version/s: 3.3.0

> General usability improvements in showSimulationTrace.html
> --
>
> Key: YARN-9836
> URL: https://issues.apache.org/jira/browse/YARN-9836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: YARN-9836.001.patch, YARN-9836.002.patch, 
> YARN-9836.003.patch
>
>
> There are some small usability improvements that can be made for the offline 
> analysis page (showSimulationTrace.html):
> - empty divs can be hidden while no data is displayed
> - the site can be refactored to be responsive, given that bootstrap is already 
> available as a third-party library
> - there is no proper error handling in the site (e.g. a malformed JSON and 
> similar cases), which is a real problem
> - there is no indentation in the raw html file, which makes supportability even 
> worse



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9890) [UI2] Add Application tag to the app table and app detail page.

2019-10-11 Thread Kinga Marton (Jira)
Kinga Marton created YARN-9890:
--

 Summary: [UI2] Add Application tag to the app table and app detail 
page.
 Key: YARN-9890
 URL: https://issues.apache.org/jira/browse/YARN-9890
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Kinga Marton
Assignee: Kinga Marton


Right now AFAIK there is no possibility to filter the applications based on the 
application tag in the UI. Adding this new column to the app table will make 
this filtering possible as well.

From the UI2 this information is missing from the application detail page as 
well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2019-10-11 Thread Zoltan Siegl (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Siegl updated YARN-5106:
---
Attachment: YARN-5106.012.patch

> Provide a builder interface for FairScheduler allocations for use in tests
> --
>
> Key: YARN-5106
> URL: https://issues.apache.org/jira/browse/YARN-5106
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
> Attachments: YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.001.patch, YARN-5106-branch-3.1.001.patch, 
> YARN-5106-branch-3.1.002.patch, YARN-5106-branch-3.2.001.patch, 
> YARN-5106-branch-3.2.001.patch, YARN-5106-branch-3.2.002.patch, 
> YARN-5106.001.patch, YARN-5106.002.patch, YARN-5106.003.patch, 
> YARN-5106.004.patch, YARN-5106.005.patch, YARN-5106.006.patch, 
> YARN-5106.007.patch, YARN-5106.008.patch, YARN-5106.008.patch, 
> YARN-5106.008.patch, YARN-5106.009.patch, YARN-5106.010.patch, 
> YARN-5106.011.patch, YARN-5106.012.patch
>
>
> Most, if not all, fair scheduler tests create an allocations XML file. Having 
> a helper class that potentially uses a builder would make the tests cleaner. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949389#comment-16949389
 ] 

Szilard Nemeth commented on YARN-9836:
--

Thanks [~adam.antal] for the patch, committed to trunk!

Thanks [~shuzirra] for the review!

[~adam.antal]: Would you please check if we need these changes in branch-3.2 / 
branch-3.1 as well? 

Please especially check if we have the same set of JS dependencies. 

 

Thanks!

> General usability improvements in showSimulationTrace.html
> --
>
> Key: YARN-9836
> URL: https://issues.apache.org/jira/browse/YARN-9836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9836.001.patch, YARN-9836.002.patch, 
> YARN-9836.003.patch
>
>
> There are some small usability improvements that can be made for the offline 
> analysis page (showSimulationTrace.html):
> - empty divs can be hidden while no data is displayed
> - the site can be refactored to be responsive, given that bootstrap is already 
> available as a third-party library
> - there is no proper error handling in the site (e.g. a malformed JSON and 
> similar cases), which is a real problem
> - there is no indentation in the raw html file, which makes supportability even 
> worse



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9889) [UI] Add Application Tag column to RM All Applications table

2019-10-11 Thread Kinga Marton (Jira)
Kinga Marton created YARN-9889:
--

 Summary: [UI] Add Application Tag column to RM All Applications 
table
 Key: YARN-9889
 URL: https://issues.apache.org/jira/browse/YARN-9889
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Kinga Marton
Assignee: Kinga Marton


Right now AFAIK there is no possibility to filter the applications based on the 
application tag in the UI. Adding this new column to the app table will make 
this filtering possible as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9888) Capacity scheduler: add support for default maxRunningApps limit per user

2019-10-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9888:
--

 Summary: Capacity scheduler: add support for default 
maxRunningApps limit per user
 Key: YARN-9888
 URL: https://issues.apache.org/jira/browse/YARN-9888
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko


Fair scheduler has the setting {{}} which limits how many 
running applications each user can have. 

Capacity scheduler lacks this feature.
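
The stripped-out setting above is presumably {{userMaxAppsDefault}}; a minimal fair-scheduler.xml sketch of it (the value is arbitrary):

{noformat}
<allocations>
  <!-- default cap on concurrently running apps for every user -->
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>
{noformat}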



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9887) Capacity scheduler: add support for limiting maxRunningApps per user

2019-10-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9887:
--

 Summary: Capacity scheduler: add support for limiting 
maxRunningApps per user
 Key: YARN-9887
 URL: https://issues.apache.org/jira/browse/YARN-9887
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko


Fair Scheduler supports limiting the number of applications that a particular 
user can submit:


{noformat}

  10

{noformat}

Capacity Scheduler does not have an exact equivalent.
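
The {noformat} block above appears to have lost its XML tags; a minimal fair-scheduler.xml sketch of the Fair Scheduler setting being described (the user name is hypothetical, the value matches the stray "10" above):

{noformat}
<allocations>
  <user name="alice">
    <maxRunningApps>10</maxRunningApps>
  </user>
</allocations>
{noformat}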



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-11 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949358#comment-16949358
 ] 

Peter Bacsko commented on YARN-9841:


+1 (non-binding) from me. 

Let's wait for YARN-9840 and then try to apply this one again.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.
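
For illustration, a sketch of how such rules are wired into capacity-scheduler.xml today (the parent queue name is hypothetical):

{noformat}
<!-- works today: %primary_group under a fixed parent queue -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:parentqueue.%primary_group</value>
</property>

<!-- goal of this JIRA: a nested mapping such as -->
<!-- u:%user:%primary_group.%user -->
{noformat}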



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9886) Queue mapping based on userid passed through application tag

2019-10-11 Thread Kinga Marton (Jira)
Kinga Marton created YARN-9886:
--

 Summary: Queue mapping based on userid passed through application 
tag
 Key: YARN-9886
 URL: https://issues.apache.org/jira/browse/YARN-9886
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Kinga Marton
Assignee: Kinga Marton


There are situations when the real submitting user differs from the user that 
arrives at YARN. For example, in the case of a Hive application with Hive 
impersonation turned off, the Hive queries run as the Hive user and the queue 
mapping is done based on that username. Unfortunately, in this case YARN doesn't 
have any information about the real user, and there are cases when the customer 
may want to map these applications to the real submitting user's queue instead 
of the Hive one.

For these cases, if the username is passed in the application tag, we may 
read it and use it during queue mapping, provided that user has the rights to 
run in the real user's queue.

[~sunilg] please correct me if I missed something.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage

2019-10-11 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949330#comment-16949330
 ] 

Tao Yang edited comment on YARN-9838 at 10/11/19 10:02 AM:
---

Thanks [~jiulongZhu] for fixing this issue. 
The patch LGTM in general; some minor suggestions for the patch:
* Check-style warnings need to be fixed; after that, you can run 
"dev-support/bin/test-patch /path/to/my.patch" to confirm.
* The indentation of the updated log needs to be adjusted, and the unnecessary 
deletion of a blank line in LeafQueue should be reverted.
* The annotation "sync ResourceUsageByLabel ResourceUsageByUser and 
numContainer" can be removed since it seems unnecessary to add details here.
* As for the UT, you can remove the before-fix block and just keep the correct 
verification.  Moreover, I think it's better to remove the method 
annotation ("//YARN-9838") since we can find the source easily through git, and the 
annotation style "/\*\* \*/" is usually used for classes or methods; it's better to use 
"//" or "/\* \*/" inside a method.


was (Author: tao yang):
Thanks [~jiulongZhu] for fixing this issue. 
The patch is LGTM in general,  some minor suggestions for the patch:
* check-style warnings need to be fixed, after that, you can run 
"dev-support/bin/test-patch /path/to/my.patch" to confirm.
* The indentation of updated log need to be adjusted and useless deletion of a 
blank line should be reverted in LeafQueue.
* The annotation "sync ResourceUsageByLabel ResourceUsageByUser and 
numContainer" can be removed since it seems unnecessary to add details here.
* As for UT, you can remove before-fixed block and just keep the correct 
verification.  Moreover, I think it's better to remove "//YARN-9838" since we 
can find the source easily by git, and the annotation style "/** */" often used 
for class or method, it's better to use "//" or "/* */" in the method.

> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS 
> reserved containers for,will cause "Num Container" and "Used Resource" in 
> ResourceUsage metrics error 
> --
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.3
>Reporter: jiulongzhu
>Priority: Critical
>  Labels: patch
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, 
> YARN-9838.0001.patch
>
>
>       In some of our clusters, we are seeing that "Used Resource", "Used 
> Capacity", "Absolute Used Capacity" and "Num Container" are positive or 
> negative even when the queue is completely idle (no RUNNING, no NEW apps...). In 
> extreme cases, apps couldn't be submitted to a queue that is actually idle 
> because its "Used Resource" is far more than zero, just like a "Container Leak".
>       Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used 
> Capacity" use the "Used" value of the ResourceUsage kept by AbstractCSQueue, and 
> "Num Container" uses the "numContainer" value kept by LeafQueue. 
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource 
> change the state values of "numContainer" and "Used". Secondly, by comparing how 
> numContainer, ResourceUsageByLabel and QueueMetrics are 
> changed (#allocateContainer and #releaseContainer) for applications with 
> and without "movetoqueue", I found that moving the reservedContainers didn't 
> modify the "numContainer" value in AbstractCSQueue or the "used" value in 
> ResourceUsage when the application was moved from one queue to another.
>         For reservedContainers, the metric values change when the container is 
> allocated, moved from the $FROM queue to the $TO queue, and released. The 
> increases and decreases are not conservative: the Resource is allocated in the 
> $FROM queue but released in the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stays the 
> same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM 
> queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in 
> $TO queue|decrease in $TO queue|
>       For allocatedContainers (allocated, acquired, running), the metric value 
> changes on allocate, movetoqueue and release are absolutely conservative.
>    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metrics

2019-10-11 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9838:
---
Issue Type: Bug  (was: Improvement)

> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS 
> reserved containers for,will cause "Num Container" and "Used Resource" in 
> ResourceUsage metrics error 
> --
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.3
>Reporter: jiulongzhu
>Priority: Critical
>  Labels: patch
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, 
> YARN-9838.0001.patch
>
>
>       In some of our clusters, we are seeing that "Used Resource", "Used 
> Capacity", "Absolute Used Capacity" and "Num Container" are positive or 
> negative even when the queue is completely idle (no RUNNING, no NEW apps...). In 
> extreme cases, apps couldn't be submitted to a queue that is actually idle 
> because its "Used Resource" is far more than zero, just like a "Container Leak".
>       Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used 
> Capacity" use the "Used" value of the ResourceUsage kept by AbstractCSQueue, and 
> "Num Container" uses the "numContainer" value kept by LeafQueue. 
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource 
> change the state values of "numContainer" and "Used". Secondly, by comparing how 
> numContainer, ResourceUsageByLabel and QueueMetrics are 
> changed (#allocateContainer and #releaseContainer) for applications with 
> and without "movetoqueue", I found that moving the reservedContainers didn't 
> modify the "numContainer" value in AbstractCSQueue or the "used" value in 
> ResourceUsage when the application was moved from one queue to another.
>         For reservedContainers, the metric values change when the container is 
> allocated, moved from the $FROM queue to the $TO queue, and released. The 
> increases and decreases are not conservative: the Resource is allocated in the 
> $FROM queue but released in the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stays the 
> same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM 
> queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in 
> $TO queue|decrease in $TO queue|
>       For allocatedContainers (allocated, acquired, running), the metric value 
> changes on allocate, movetoqueue and release are absolutely conservative.
>    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metrics

2019-10-11 Thread Tao Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9838:
---
Fix Version/s: (was: 2.7.3)

> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS 
> reserved containers for,will cause "Num Container" and "Used Resource" in 
> ResourceUsage metrics error 
> --
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.7.3
>Reporter: jiulongzhu
>Priority: Critical
>  Labels: patch
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, 
> YARN-9838.0001.patch
>
>
>       In some of our clusters, we are seeing that "Used Resource", "Used 
> Capacity", "Absolute Used Capacity" and "Num Container" are positive or 
> negative even when the queue is completely idle (no RUNNING, no NEW apps...). In 
> extreme cases, apps couldn't be submitted to a queue that is actually idle 
> because its "Used Resource" is far more than zero, just like a "Container Leak".
>       Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used 
> Capacity" use the "Used" value of the ResourceUsage kept by AbstractCSQueue, and 
> "Num Container" uses the "numContainer" value kept by LeafQueue. 
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource 
> change the state values of "numContainer" and "Used". Secondly, by comparing how 
> numContainer, ResourceUsageByLabel and QueueMetrics are 
> changed (#allocateContainer and #releaseContainer) for applications with 
> and without "movetoqueue", I found that moving the reservedContainers didn't 
> modify the "numContainer" value in AbstractCSQueue or the "used" value in 
> ResourceUsage when the application was moved from one queue to another.
>         For reservedContainers, the metric values change when the container is 
> allocated, moved from the $FROM queue to the $TO queue, and released. The 
> increases and decreases are not conservative: the Resource is allocated in the 
> $FROM queue but released in the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stays the 
> same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM 
> queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in 
> $TO queue|decrease in $TO queue|
>       For allocatedContainers (allocated, acquired, running), the metric value 
> changes on allocate, movetoqueue and release are absolutely conservative.
>    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9838) Using the CapacityScheduler,Apply "movetoqueue" on the application which CS reserved containers for,will cause "Num Container" and "Used Resource" in ResourceUsage metri

2019-10-11 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949330#comment-16949330
 ] 

Tao Yang commented on YARN-9838:


Thanks [~jiulongZhu] for fixing this issue. 
The patch is LGTM in general,  some minor suggestions for the patch:
* check-style warnings need to be fixed, after that, you can run 
"dev-support/bin/test-patch /path/to/my.patch" to confirm.
* The indentation of updated log need to be adjusted and useless deletion of a 
blank line should be reverted in LeafQueue.
* The annotation "sync ResourceUsageByLabel ResourceUsageByUser and 
numContainer" can be removed since it seems unnecessary to add details here.
* As for UT, you can remove before-fixed block and just keep the correct 
verification.  Moreover, I think it's better to remove "//YARN-9838" since we 
can find the source easily by git, and the annotation style "/** */" often used 
for class or method, it's better to use "//" or "/* */" in the method.

> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS 
> reserved containers for,will cause "Num Container" and "Used Resource" in 
> ResourceUsage metrics error 
> --
>
> Key: YARN-9838
> URL: https://issues.apache.org/jira/browse/YARN-9838
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 2.7.3
>Reporter: jiulongzhu
>Priority: Critical
>  Labels: patch
> Fix For: 2.7.3
>
> Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png, 
> YARN-9838.0001.patch
>
>
>       In some of our clusters, we are seeing that "Used Resource", "Used 
> Capacity", "Absolute Used Capacity" and "Num Container" are positive or 
> negative even when the queue is completely idle (no RUNNING, no NEW apps...). In 
> extreme cases, apps couldn't be submitted to a queue that is actually idle 
> because its "Used Resource" is far more than zero, just like a "Container Leak".
>       Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used 
> Capacity" use the "Used" value of the ResourceUsage kept by AbstractCSQueue, and 
> "Num Container" uses the "numContainer" value kept by LeafQueue. 
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource 
> change the state values of "numContainer" and "Used". Secondly, by comparing how 
> numContainer, ResourceUsageByLabel and QueueMetrics are 
> changed (#allocateContainer and #releaseContainer) for applications with 
> and without "movetoqueue", I found that moving the reservedContainers didn't 
> modify the "numContainer" value in AbstractCSQueue or the "used" value in 
> ResourceUsage when the application was moved from one queue to another.
>         For reservedContainers, the metric values change when the container is 
> allocated, moved from the $FROM queue to the $TO queue, and released. The 
> increases and decreases are not conservative: the Resource is allocated in the 
> $FROM queue but released in the $TO queue.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF}$FROM queue stays the 
> same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF}$FROM 
> queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in 
> $TO queue|decrease in $TO queue|
>       For allocatedContainers (allocated, acquired, running), the metric value 
> changes on allocate, movetoqueue and release are absolutely conservative.
>    



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949311#comment-16949311
 ] 

Szilard Nemeth edited comment on YARN-9699 at 10/11/19 9:28 AM:


Added a final patch (017) that fixes checkstyle/whitespace issues and adds a 
warning text, discussed above.

+1 for the latest patch.

[~sunilg]: Patch is ready for review as it has one non-binding and one binding +1.


was (Author: snemeth):
Added a final patch (017) that fixes checkstyle/whitespace issues and adds a 
warning text, discussed above.

> [Phase 1] Migration tool that help to generate CS config based on FS config
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, 
> YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, 
> YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, 
> YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, 
> YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, 
> YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, 
> YARN-9699.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config

2019-10-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9699:
-
Attachment: YARN-9699-017.patch

> [Phase 1] Migration tool that help to generate CS config based on FS config
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, 
> YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, 
> YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, 
> YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, 
> YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, 
> YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, 
> YARN-9699.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949311#comment-16949311
 ] 

Szilard Nemeth commented on YARN-9699:
--

Added a final patch (017) that fixes checkstyle/whitespace issues and adds a 
warning text, discussed above.

> [Phase 1] Migration tool that help to generate CS config based on FS config
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, 
> YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, 
> YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, 
> YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, 
> YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, 
> YARN-9699-016.patch, YARN-9699-017.patch, YARN-9699.001.patch, 
> YARN-9699.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9881) In YARN ui2 attempts tab, The running Application Attempt's Container's ElapsedTime is incorrect.

2019-10-11 Thread jenny (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949305#comment-16949305
 ] 

jenny commented on YARN-9881:
-

Hi [~Prabhu Joseph], I will attach the details, please don't close it. Thank you.

> In YARN ui2 attempts tab, The running Application Attempt's Container's 
> ElapsedTime is incorrect.
> -
>
> Key: YARN-9881
> URL: https://issues.apache.org/jira/browse/YARN-9881
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1, 3.2.1
>Reporter: jenny
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9836) General usability improvements in showSimulationTrace.html

2019-10-11 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949285#comment-16949285
 ] 

Gergely Pollak commented on YARN-9836:
--

Thank you [~adam.antal]! LGTM + 1 (non-binding)

> General usability improvements in showSimulationTrace.html
> --
>
> Key: YARN-9836
> URL: https://issues.apache.org/jira/browse/YARN-9836
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9836.001.patch, YARN-9836.002.patch, 
> YARN-9836.003.patch
>
>
> There are some small usability improvements that can be made for the offline 
> analysis page (showSimulationTrace.html):
> - empty divs can be hidden while no data is displayed
> - the site can be refactored to be responsive, given that bootstrap is already 
> available as a third-party library
> - there is no proper error handling in the site (e.g. a malformed JSON and 
> similar cases), which is a real problem
> - there is no indentation in the raw html file, which makes supportability even 
> worse



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9699) [Phase 1] Migration tool that help to generate CS config based on FS config

2019-10-11 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949277#comment-16949277
 ] 

Gergely Pollak commented on YARN-9699:
--

Hi [~snemeth] and [~pbacsko], thank you for the patch, LGTM+1 (non-binding)

> [Phase 1] Migration tool that help to generate CS config based on FS config
> ---
>
> Key: YARN-9699
> URL: https://issues.apache.org/jira/browse/YARN-9699
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wanqiang Ji
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: FS_to_CS_migration_POC.patch, YARN-9699-003.patch, 
> YARN-9699-004.patch, YARN-9699-005.patch, YARN-9699-006.patch, 
> YARN-9699-007.patch, YARN-9699-008.patch, YARN-9699-009.patch, 
> YARN-9699-010.patch, YARN-9699-011.patch, YARN-9699-012.patch, 
> YARN-9699-013.patch, YARN-9699-014.patch, YARN-9699-015.patch, 
> YARN-9699-016.patch, YARN-9699.001.patch, YARN-9699.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949261#comment-16949261
 ] 

Szilard Nemeth commented on YARN-8453:
--

Hi [~adam.antal]!

Thanks for uploading a new patch, looking at it soon.

In the meantime, could you please upload branch-3.2 / branch-3.1 patches?

 

Thanks!

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil G
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-8453.001.patch, YARN-8453.002.patch
>
>
> Post support of additional resource types other than CPU and Memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue, while other resources such as Memory / CPU are still available beyond the 
> guaranteed limit (under the max-limit). Adding more unit tests ensures we are 
> not starving such allocation requests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9860) Enable service mode for Docker containers on YARN

2019-10-11 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949191#comment-16949191
 ] 

Prabhu Joseph commented on YARN-9860:
-

Thanks [~eyang] and [~shaneku...@gmail.com].

> Enable service mode for Docker containers on YARN
> -
>
> Key: YARN-9860
> URL: https://issues.apache.org/jira/browse/YARN-9860
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: Screen Shot 2019-10-09 at 11.27.19 AM.png, 
> YARN-9860-001.patch, YARN-9860-002.patch, YARN-9860-003.patch, 
> YARN-9860-004.patch, YARN-9860-005.patch, YARN-9860-006.patch, 
> YARN-9860-007.patch, YARN-9860-008.patch, YARN-9860-009.patch
>
>
> This task is to add support to YARN for running Docker containers in "Service 
> Mode". 
> Service Mode - Run the container as defined by the image, but still allow for 
> injecting configuration. 
> Background:
>   Entrypoint mode helped - now able to use the ENV and ENTRYPOINT/CMD as 
> defined in the image. However, still requires modification to official images 
> due to user propagation
> User propagation is problematic for running a secure cluster with sssd
>   
> Implementation:
>   Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
>   Must be requested at runtime - (example: 
> YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
>   Entrypoint mode is default enabled for this mode (If Service Mode is 
> requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set 
> to true)
>   Writable log mount will not be added - stdout logging may still work 
> with entrypoint mode - remove the writable bind mounts
>   User and groups will not be propagated (now: docker run --user nobody 
> --group-add=nobody  , after: docker run  )
>   Read-only resources mounted at the file level, files get chmod 777, 
> parent directory only accessible by the run as user.
> cc [~shaneku...@gmail.com]
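
Pulling the knobs mentioned above together, a minimal sketch of how service mode would be switched on (the settings are the ones listed in the description; their exact placement in container-executor.cfg is assumed):

{noformat}
# container-executor.cfg (docker section)
docker.service-mode.allowed=true

# requested per container at submit time, e.g. via the launch environment
YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
{noformat}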



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9885) Container allocation when queue usage is below MIN guarantees

2019-10-11 Thread Prashant Golash (Jira)
Prashant Golash created YARN-9885:
-

 Summary: Container allocation when queue usage is below MIN 
guarantees
 Key: YARN-9885
 URL: https://issues.apache.org/jira/browse/YARN-9885
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 3.2.1
Reporter: Prashant Golash


Filing this JIRA to calculate the time spent in container allocation when queue 
usage is below min (during the whole time for the container).

Customers generally ask for a YARN SLA on container allocation when their queue 
usage is below min. I have an implementation in mind, but I want to confirm 
with the community whether this would be a helpful feature or whether it is 
already implemented.

 

cc [~leftnoteasy]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.

2019-10-11 Thread Prashant Golash (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949183#comment-16949183
 ] 

Prashant Golash commented on YARN-9656:
---

Thanks [~wangda] for taking a look. Initially we thought of just marking nodes 
"unhealthy" and extended our NMs to include these checks in the NM health check 
scripts, but we realized that this could result in a lot of unhealthy nodes (e.g. 
in our cluster), so we thought of adding an intermediate "stressed" stage and 
controlling it with a threshold at the RM layer as well.

I guess this may be specific to our environment, and for upstream just 
configuring the NM scripts should be enough.

> Plugin to avoid scheduling jobs on node which are not in "schedulable" state, 
> but are healthy otherwise.
> 
>
> Key: YARN-9656
> URL: https://issues.apache.org/jira/browse/YARN-9656
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.9.1, 3.1.2
>Reporter: Prashant Golash
>Assignee: Prashant Golash
>Priority: Major
> Attachments: 2.patch
>
>
> Creating this Jira to get ideas from the community on whether this is something 
> helpful which can be done in YARN. Sometimes nodes go into a bad state, e.g. 
> due to H/W problems (bad I/O, fan problems). In other scenarios, if 
> CGroups are not enabled, nodes may be running very high on CPU and the jobs 
> scheduled on them will suffer.
>  
> The idea is three-fold:
>  # Gather relevant metrics from node-managers and put them in some form (e.g. 
> an exclude file).
>  # RM loads the files and puts the nodes in a blacklist.
>  # Once a node becomes good again, it can be put back in the whitelist.
> Various optimizations can be done here, but I would like to understand if 
> this is something which could be helpful as an upstream feature in YARN.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org