[jira] [Created] (YARN-10282) CLONE - hadoop-yarn-server-nodemanager build failed: make failed with error code 2

2020-05-20 Thread lynsey (Jira)
lynsey created YARN-10282:
-

 Summary: CLONE - hadoop-yarn-server-nodemanager build failed: make 
failed with error code 2
 Key: YARN-10282
 URL: https://issues.apache.org/jira/browse/YARN-10282
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: lynsey


When I compiled the hadoop-3.2.0 release, I encountered the following errors:

[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) on 
project hadoop-yarn-server-nodemanager: make failed with error code 2 -> [Help 
1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) on 
project hadoop-yarn-server-nodemanager: make failed with error code 2
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
 at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
 at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
 at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
 at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
 at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
 at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: make failed with 
error code 2
 at 
org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.runMake(CompileMojo.java:231)
 at 
org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.execute(CompileMojo.java:98)
 at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
 ... 20 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn  -rf :hadoop-yarn-server-nodemanager

 

My build environment:

JDK 1.8.0_181

Maven 3.3.9 / 3.6.0

CMake 3.12.0








[jira] [Created] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10283:
---

 Summary: Capacity Scheduler: starvation occurs if a higher 
priority queue is full a and node labels are used
 Key: YARN-10283
 URL: https://issues.apache.org/jira/browse/YARN-10283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.

The problem is that we always try to assign a container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we succeed with the allocation if there's room for 
a container.








[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112051#comment-17112051
 ] 

Peter Bacsko commented on YARN-10283:
-

Quick workaround:

{noformat}
      if (null == unreservedContainer) {
        // Skip the locality request
        ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
            activitiesManager, node, application, schedulerKey,
            ActivityDiagnosticConstant.
                NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
            ActivityLevel.NODE);
        return ContainerAllocation.LOCALITY_SKIPPED;
      }
    }
  }

  // defends against container allocation
  if (!node.getLabels().isEmpty() && needToUnreserve) {
    LOG.debug("Using label: {} - needed to unreserve container",
        node.getPartition());
    return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
      pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.
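
To make the idea concrete, here is a minimal, standalone sketch of what a partition-aware lookup could look like. The class and method names below are invented for the example; this is not the actual Hadoop code nor the eventual patch, only an illustration of passing the partition into the unreserve decision:

{code:java}
import java.util.List;
import java.util.Optional;

/**
 * Standalone sketch (not Hadoop code) of the idea above: make the
 * "find a node to unreserve" step partition-aware by passing the
 * partition in explicitly. All names here are illustrative only.
 */
public class PartitionAwareUnreserveSketch {

  /** Minimal stand-in for a reserved container on some node/partition. */
  static class Reservation {
    final String nodeId;
    final String partition;
    Reservation(String nodeId, String partition) {
      this.nodeId = nodeId;
      this.partition = partition;
    }
  }

  /**
   * Pick a reservation to release, but only among reservations that live
   * on the requested partition - the extra "partition" argument is the
   * proposed extension.
   */
  static Optional<Reservation> findNodeToUnreserve(
      List<Reservation> reservations, String partition) {
    return reservations.stream()
        .filter(r -> r.partition.equals(partition)) // partition-aware filter
        .findFirst();
  }

  public static void main(String[] args) {
    List<Reservation> reserved = List.of(
        new Reservation("node1", ""),         // default partition
        new Reservation("node2", "shared"));  // labelled partition

    // Only the reservation on the "shared" partition is a candidate.
    findNodeToUnreserve(reserved, "shared")
        .ifPresent(r -> System.out.println("unreserve on " + r.nodeId));
  }
}
{code}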

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.
> The problem is that we always try to assign a container for the same application in each heartbeat from "highprio". Applications in "lowprio" cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we succeed with the allocation if there's room 
> for a container.






[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112051#comment-17112051
 ] 

Peter Bacsko edited comment on YARN-10283 at 5/20/20, 10:59 AM:


Quick workaround:
{noformat}
      if (null == unreservedContainer) {
        // Skip the locality request
        ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
            activitiesManager, node, application, schedulerKey,
            ActivityDiagnosticConstant.
                NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
            ActivityLevel.NODE);
        return ContainerAllocation.LOCALITY_SKIPPED;
      }
    }
  }

  //
  // Defends against container allocation
  //
  if (!node.getLabels().isEmpty() && needToUnreserve) {
    LOG.debug("Using label: {} - needed to unreserve container",
        node.getPartition());
    return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
      pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}
A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.


was (Author: pbacsko):
Quick workaround:

{noformat}
      if (null == unreservedContainer) {
        // Skip the locality request
        ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
            activitiesManager, node, application, schedulerKey,
            ActivityDiagnosticConstant.
                NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
            ActivityLevel.NODE);
        return ContainerAllocation.LOCALITY_SKIPPED;
      }
    }
  }

  // defends against container allocation
  if (!node.getLabels().isEmpty() && needToUnreserve) {
    LOG.debug("Using label: {} - needed to unreserve container",
        node.getPartition());
    return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
      pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.
> The problem is that we always try to assign a container for the same application in each heartbeat from "highprio". Applications in "lowprio" cannot make progress.
> *P

[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112051#comment-17112051
 ] 

Peter Bacsko edited comment on YARN-10283 at 5/20/20, 11:00 AM:


Quick workaround:
{noformat}
  [...]
      if (null == unreservedContainer) {
        // Skip the locality request
        ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
            activitiesManager, node, application, schedulerKey,
            ActivityDiagnosticConstant.
                NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
            ActivityLevel.NODE);
        return ContainerAllocation.LOCALITY_SKIPPED;
      }
    }
  }

  //
  // Defends against container allocation
  //
  if (!node.getLabels().isEmpty() && needToUnreserve) {
    LOG.debug("Using label: {} - needed to unreserve container",
        node.getPartition());
    return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
      pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
  [...]
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.


was (Author: pbacsko):
Quick workaround:
{noformat}
      if (null == unreservedContainer) {
        // Skip the locality request
        ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
            activitiesManager, node, application, schedulerKey,
            ActivityDiagnosticConstant.
                NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
            ActivityLevel.NODE);
        return ContainerAllocation.LOCALITY_SKIPPED;
      }
    }
  }

  //
  // Defends against container allocation
  //
  if (!node.getLabels().isEmpty() && needToUnreserve) {
    LOG.debug("Using label: {} - needed to unreserve container",
        node.getPartition());
    return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
      pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}
A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.
> The problem is that we always try to assign a container

[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Summary: Capacity Scheduler: starvation occurs if a higher priority queue 
is full and node labels are used  (was: Capacity Scheduler: starvation occurs 
if a higher priority queue is full a and node labels are used)

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.
> The problem is that we always try to assign a container for the same application in each heartbeat from "highprio". Applications in "lowprio" cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we succeed with the allocation if there's room 
> for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Description: 
Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.

The problem is that we always try to assign a container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we enter a different code path and succeed with the 
allocation if there's room for a container.



  was:
Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.

The problem is that we always try to assign a container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we succeed with the allocation if there's room for 
a container.




> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not ha

[jira] [Created] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-20 Thread Adam Antal (Jira)
Adam Antal created YARN-10284:
-

 Summary: Add lazy initialization of 
LogAggregationFileControllerFactory in LogServlet
 Key: YARN-10284
 URL: https://issues.apache.org/jira/browse/YARN-10284
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, yarn
Affects Versions: 3.3.0
Reporter: Adam Antal
Assignee: Adam Antal


Suppose the {{mapred}} user has no access to the remote folder. Pinging the JHS every few seconds to check whether it is online will produce the following entry in the log:
{noformat}
2020-05-19 00:17:20,331 WARN 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
 Unable to determine if the filesystem supports append operation
java.nio.file.AccessDeniedException: test-bucket: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
for the group(s) associated with the authenticated user. (user: mapred)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
[...]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.<init>(LogAggregationFileControllerFactory.java:139)
 at 
org.apache.hadoop.yarn.server.webapp.LogServlet.<init>(LogServlet.java:66)
 at 
org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.<init>(HsWebServices.java:99)
at 
org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
at 
com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
[...]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.lang.Thread.run(Thread.java:748)
{noformat}

We should only create the {{LogAggregationFileControllerFactory}} instance when we actually 
need it, not every time the {{LogServlet}} object is instantiated (so 
definitely not in the constructor). In this way we prevent pressure on the S3A 
auth side, especially if the authentication request is a costly operation.
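
The lazy-initialization idea itself can be illustrated with a small, self-contained sketch. The types below are placeholders (this is not the actual LogServlet change); the point is only that the expensive factory is built on the first request instead of in the constructor:

{code:java}
/**
 * Minimal sketch of the lazy-initialization idea described above.
 * LogServlet / LogAggregationFileControllerFactory are stood in by
 * placeholder types; this is not the actual Hadoop patch.
 */
public class LazyFactorySketch {

  // Placeholder for the expensive-to-build factory (its real constructor
  // touches the remote log filesystem, e.g. S3A, which is what we want
  // to defer).
  static class FileControllerFactory {
    FileControllerFactory() {
      System.out.println("factory created (filesystem touched here)");
    }
  }

  // Thread-safe, create-on-first-use holder instead of building the
  // factory in the servlet constructor.
  private volatile FileControllerFactory factory;

  private FileControllerFactory getOrCreateFactory() {
    FileControllerFactory result = factory;
    if (result == null) {
      synchronized (this) {
        result = factory;
        if (result == null) {
          factory = result = new FileControllerFactory();
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    LazyFactorySketch servlet = new LazyFactorySketch();
    System.out.println("servlet constructed, factory not built yet");
    servlet.getOrCreateFactory(); // first log request builds it
    servlet.getOrCreateFactory(); // subsequent requests reuse it
  }
}
{code}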






[jira] [Updated] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-20 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10284:
--
Parent: YARN-10025
Issue Type: Sub-task  (was: Bug)

> Add lazy initialization of LogAggregationFileControllerFactory in LogServlet
> 
>
> Key: YARN-10284
> URL: https://issues.apache.org/jira/browse/YARN-10284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Suppose the {{mapred}} user has no access to the remote folder. Pinging the JHS every few seconds to check whether it is online will produce the following entry in the log:
> {noformat}
> 2020-05-19 00:17:20,331 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
>  Unable to determine if the filesystem supports append operation
> java.nio.file.AccessDeniedException: test-bucket: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
> for the group(s) associated with the authenticated user. (user: mapred)
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
> [...]
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:139)
>   at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.(LogServlet.java:66)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.(HsWebServices.java:99)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> [...]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> We should only create the {{LogAggregationFileControllerFactory}} instance when we actually 
> need it, not every time the {{LogServlet}} object is instantiated (so 
> definitely not in the constructor). In this way we prevent pressure on the 
> S3A auth side, especially if the authentication request is a costly operation.






[jira] [Updated] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-20 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10284:
--
Attachment: YARN-10284.001.patch

> Add lazy initialization of LogAggregationFileControllerFactory in LogServlet
> 
>
> Key: YARN-10284
> URL: https://issues.apache.org/jira/browse/YARN-10284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10284.001.patch
>
>
> Suppose the {{mapred}} user has no access to the remote folder. Pinging the JHS every few seconds to check whether it is online will produce the following entry in the log:
> {noformat}
> 2020-05-19 00:17:20,331 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
>  Unable to determine if the filesystem supports append operation
> java.nio.file.AccessDeniedException: test-bucket: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
> for the group(s) associated with the authenticated user. (user: mapred)
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
> [...]
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:139)
>   at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.(LogServlet.java:66)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.(HsWebServices.java:99)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> [...]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> We should only create the {{LogAggregationFileControllerFactory}} instance when we actually 
> need it, not every time the {{LogServlet}} object is instantiated (so 
> definitely not in the constructor). In this way we prevent pressure on the 
> S3A auth side, especially if the authentication request is a costly operation.






[jira] [Commented] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-20 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112166#comment-17112166
 ] 

Adam Antal commented on YARN-10284:
---

I'll test the patch on a deployed cluster, and will probably need a UT to cover 
this from the JHS's direction.

> Add lazy initialization of LogAggregationFileControllerFactory in LogServlet
> 
>
> Key: YARN-10284
> URL: https://issues.apache.org/jira/browse/YARN-10284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10284.001.patch
>
>
> Suppose the {{mapred}} user has no access to the remote folder. Pinging the JHS every few seconds to check whether it is online will produce the following entry in the log:
> {noformat}
> 2020-05-19 00:17:20,331 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
>  Unable to determine if the filesystem supports append operation
> java.nio.file.AccessDeniedException: test-bucket: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
> for the group(s) associated with the authenticated user. (user: mapred)
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
> [...]
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:139)
>   at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.(LogServlet.java:66)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.(HsWebServices.java:99)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> [...]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> We should only create the {{LogAggregationFileControllerFactory}} instance when we actually 
> need it, not every time the {{LogServlet}} object is instantiated (so 
> definitely not in the constructor). In this way we prevent pressure on the 
> S3A auth side, especially if the authentication request is a costly operation.






[jira] [Assigned] (YARN-10276) Check and improve memory footprint of CapacityScheduler CSQueueStore

2020-05-20 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak reassigned YARN-10276:
-

Assignee: Gergely Pollak

> Check and improve memory footprint of CapacityScheduler CSQueueStore
> 
>
> Key: YARN-10276
> URL: https://issues.apache.org/jira/browse/YARN-10276
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>
> The class creates a lot of Set instances, which might have a bigger memory
> overhead than necessary. This might not be a critical issue, but let's examine
> whether we can or should create a more memory-efficient solution while keeping
> the performance.
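
Purely as an illustration of the kind of trade-off that could be examined (this is not CSQueueStore code and all names below are made up): avoid allocating a full HashSet per key until a key actually has more than one value.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch only (not CSQueueStore code): one way to avoid
 * paying for a full HashSet per key when most keys map to a single value.
 */
public class CompactMultiMapSketch {

  private final Map<String, Set<String>> byShortName = new HashMap<>();

  void put(String shortName, String fullPath) {
    byShortName.merge(shortName,
        Collections.singleton(fullPath),       // 1-element set, low overhead
        (existing, added) -> {
          // Only allocate a real HashSet once a short name becomes ambiguous.
          Set<String> grown = new HashSet<>(existing);
          grown.addAll(added);
          return grown;
        });
  }

  Set<String> get(String shortName) {
    return byShortName.getOrDefault(shortName, Collections.emptySet());
  }

  public static void main(String[] args) {
    CompactMultiMapSketch store = new CompactMultiMapSketch();
    store.put("a", "root.users.a");
    store.put("a", "root.other.a");  // second mapping upgrades to a HashSet
    System.out.println(store.get("a"));
  }
}
{code}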






[jira] [Updated] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-05-20 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10108:
--
Attachment: YARN-10108.004.patch

> FS-CS converter: nestedUserQueue with default rule results in invalid queue 
> mapping
> ---
>
> Key: YARN-10108
> URL: https://issues.apache.org/jira/browse/YARN-10108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10108.001.patch, YARN-10108.002.patch, 
> YARN-10108.003.patch, YARN-10108.004.patch
>
>
> FS Queue Placement Policy
> {code:java}
> 
> 
> 
> 
> 
>  {code}
> gets mapped to an invalid CS queue mapping "u:%user:root.users.%user"
> The RM fails to start with the above queue mapping in CS:
> {code:java}
> 2020-01-28 00:19:12,889 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue [%user] and invalid parent queue 
> [root.users]
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:829)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1247)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue 
> [%user] and invalid parent queue [root.users]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:48)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:712)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:753)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:361)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:426)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 7 more
> {code}
> QueuePlacementConverter#handleNestedRule has to be fixed.
> {code:java}
> else if (pr instanceof DefaultPlacementRule) {
>   DefaultPlacementRule defaultRule = (DefaultPlacementRule) pr;
>   mapping.append("u:" + USER + ":")
> .append(defaultRule.defaultQueueName)
> .append("." + USER);
> }
> {code}
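
For reference, a tiny self-contained reproduction of the string that the snippet above builds for the nested default rule. The "%user" placeholder and the "root.users" default queue name are taken from the description; the surrounding class is just scaffolding for the example:

{code:java}
/**
 * Tiny reproduction of the string construction in the snippet above.
 * Only the mapping string itself comes from the issue description.
 */
public class NestedRuleMappingExample {
  private static final String USER = "%user";

  public static void main(String[] args) {
    String defaultQueueName = "root.users"; // defaultRule.defaultQueueName
    StringBuilder mapping = new StringBuilder();
    mapping.append("u:" + USER + ":")
        .append(defaultQueueName)
        .append("." + USER);
    // Prints u:%user:root.users.%user - the mapping CS rejects at startup.
    System.out.println(mapping);
  }
}
{code}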






[jira] [Commented] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-05-20 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112183#comment-17112183
 ] 

Gergely Pollak commented on YARN-10108:
---

After rebasing, a clean build compiled without issues; re-uploading the patch to retrigger Jenkins.

> FS-CS converter: nestedUserQueue with default rule results in invalid queue 
> mapping
> ---
>
> Key: YARN-10108
> URL: https://issues.apache.org/jira/browse/YARN-10108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10108.001.patch, YARN-10108.002.patch, 
> YARN-10108.003.patch, YARN-10108.004.patch
>
>
> FS Queue Placement Policy
> {code:java}
> 
> 
> 
> 
> 
>  {code}
> gets mapped to an invalid CS queue mapping "u:%user:root.users.%user"
> The RM fails to start with the above queue mapping in CS:
> {code:java}
> 2020-01-28 00:19:12,889 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue [%user] and invalid parent queue 
> [root.users]
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:829)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1247)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue 
> [%user] and invalid parent queue [root.users]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:48)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:712)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:753)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:361)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:426)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 7 more
> {code}
> QueuePlacementConverter#handleNestedRule has to be fixed.
> {code:java}
> else if (pr instanceof DefaultPlacementRule) {
>   DefaultPlacementRule defaultRule = (DefaultPlacementRule) pr;
>   mapping.append("u:" + USER + ":")
> .append(defaultRule.defaultQueueName)
> .append("." + USER);
> }
> {code}






[jira] [Updated] (YARN-10269) SchedConfCLI and LogWebService should reuse util class WebServiceClient

2020-05-20 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10269:
-
Attachment: YARN-10269.003.patch

> SchedConfCLI and LogWebService should reuse util class WebServiceClient
> ---
>
> Key: YARN-10269
> URL: https://issues.apache.org/jira/browse/YARN-10269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-10269.001.patch, YARN-10269.002.patch, 
> YARN-10269.003.patch
>
>
> WebServiceClient creates the client object based on the configuration (i.e.
> http or https), so
> SchedConfCLI#createWebServiceClient and LogWebService#createTimelineWebClient
> can use it to create their client objects.
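
As a standalone sketch of the reuse idea only (the utility and method names below are placeholders, not the real WebServiceClient API): both call sites obtain their HTTP client from one shared, configuration-driven factory instead of each building its own.

{code:java}
/**
 * Standalone sketch of the reuse idea (placeholder names only - this is
 * not the real WebServiceClient API): both the CLI and the web service
 * obtain their HTTP client from one shared utility instead of each
 * building its own from the configuration.
 */
public class SharedClientSketch {

  /** Placeholder for a configured client (http vs https decided once). */
  static class ClientHandle {
    final boolean https;
    ClientHandle(boolean https) {
      this.https = https;
    }
  }

  /** Stand-in for the shared utility the description wants reused. */
  static final class ClientFactory {
    private static ClientHandle instance;

    static synchronized ClientHandle get(boolean httpsEnabled) {
      if (instance == null) {
        // Created once from the configuration, then shared by all callers.
        instance = new ClientHandle(httpsEnabled);
      }
      return instance;
    }
  }

  public static void main(String[] args) {
    // Both call sites (the CLI and the log web service) would simply do:
    ClientHandle cliClient = ClientFactory.get(true);
    ClientHandle logClient = ClientFactory.get(true);
    System.out.println(cliClient == logClient); // true: one shared client
  }
}
{code}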






[jira] [Commented] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112243#comment-17112243
 ] 

Hadoop QA commented on YARN-10284:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
15s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26045/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10284 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13003521/YARN-10284.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux cb55e3e98f5d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 29b19cd5924 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/26045/testReport/ |
| Max. process+thread count | 445 (vs. ulimit of 5500)

[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-20 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112317#comment-17112317
 ] 

Bilwa S T commented on YARN-8047:
-

cc [~inigoiri] [~aajisaka]

> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch
>
>
> This JIRA should make sure we are able to plug in web services and web pages
> of the scheduler in the ResourceManager:
> * RMWebApp: allow binding external classes
> * RMController: allow plugging in scheduler classes






[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112329#comment-17112329
 ] 

Sunil G commented on YARN-8047:
---

[~BilwaST] could you please explain how to use an external class?
We can also document the same.

> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch
>
>
> This JIRA should make sure we are able to plug in web services and web pages
> of the scheduler in the ResourceManager:
> * RMWebApp: allow binding external classes
> * RMController: allow plugging in scheduler classes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-20 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10254:
--
Attachment: YARN-10254.004.patch

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch, 
> YARN-10254.003.patch, YARN-10254.004.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had a negative impact on the way mapping works.
> In some cases incorrect PlacementContexts were created, where the full queue 
> path was passed as the leaf queue name. This affects how the yarn CLI app list 
> displays the queues.
> The u:%user:%primary_group.%user mapping fails with an incorrect validation 
> error when the %primary_group parent queue is a managed parent.
> Group-based rules in certain cases are mapped to root.[primary_group] rules, 
> losing the ability to create deeper structures.
>  
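For readers following the rule syntax above, here is a minimal sketch of the nested mapping discussed in this issue, assuming the standard CapacityScheduler property yarn.scheduler.capacity.queue-mappings; the class and example values are illustrative only, not part of the patch.

{code:java}
// Illustrative only: shows the shape of the nested rule "u:%user:%primary_group.%user",
// which is expected to place user "alice" (primary group "dev") under root.dev.alice.
import org.apache.hadoop.conf.Configuration;

public class NestedUserQueueMappingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Map every user to a leaf queue named after them, nested under a parent
    // queue named after their primary group (assumed to be a managed parent).
    conf.set("yarn.scheduler.capacity.queue-mappings",
        "u:%user:%primary_group.%user");
    System.out.println(conf.get("yarn.scheduler.capacity.queue-mappings"));
  }
}
{code}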



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-20 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112345#comment-17112345
 ] 

Gergely Pollak commented on YARN-10254:
---

[~pbacsko] thank you for the feedback; the additional logging is indeed a good 
idea, so I uploaded the next patch. I still expect some tests to fail, since 
YARN-10108, which we depend on, isn't merged yet, but I want to run a new round 
because on my machine I saw a few flaky tests and I want to know whether it is a 
local issue or I've introduced new issues with the logging (I doubt it, but it's 
better to be on the safe side).

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch, 
> YARN-10254.003.patch, YARN-10254.004.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had a negative impact on the way mapping works.
> In some cases incorrect PlacementContexts were created, where the full queue 
> path was passed as the leaf queue name. This affects how the yarn CLI app list 
> displays the queues.
> The u:%user:%primary_group.%user mapping fails with an incorrect validation 
> error when the %primary_group parent queue is a managed parent.
> Group-based rules in certain cases are mapped to root.[primary_group] rules, 
> losing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10269) SchedConfCLI and LogWebService should reuse util class WebServiceClient

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112367#comment-17112367
 ] 

Hadoop QA commented on YARN-10269:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
44s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
55s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
17s{color} | {color:red} hadoop-yarn-server-timelineservice in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  2m 
13s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  2m 13s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-yarn-server-timelineservice in the patch 
failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
36s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-timelineservice in the patch 
failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
5s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 19s{color} 
| {color:red} hadoop-yarn-server-timelineservice in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m 49s{color} 
| {color:red} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 15s{color} | 
{color:black} {color} |
\\
\\

[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112371#comment-17112371
 ] 

Manikandan R commented on YARN-6492:


[~jhung]  [~epayne] 

Attached the .009 patch based on our discussions:
 # Retain existing default Queue Metrics behaviour (after YARN-6467).
 # Partition Metrics
 # Partition * Queue Metrics
 # Partition * Queue * User Metrics (only if user metrics are enabled).

Please review and share your feedback.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-20 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112370#comment-17112370
 ] 

Bilwa S T commented on YARN-8047:
-

Hi [~sunilg]

We can configure external webapp classes like below
{code:xml}
<property>
  <name>yarn.http.rmwebapp.external.classes</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices,
  org.apache.hadoop.yarn.server.resourcemanager.webapp.DummyClass</value>
</property>
{code}

I will write up the description in detail and upload a patch. Thanks
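For illustration only, here is a minimal sketch of what an external class such as the DummyClass listed above might look like; the JAX-RS path and method are hypothetical, and the actual binding contract depends on how RMWebApp registers the configured classes in this patch.

{code:java}
// Hypothetical example of an externally plugged-in web service class.
// Everything below (path, method, payload) is illustrative, not the patch's API.
package org.apache.hadoop.yarn.server.resourcemanager.webapp;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import com.google.inject.Singleton;

@Singleton
@Path("/ws/v1/cluster/custom")
public class DummyClass {

  // A trivial endpoint; once the class is listed under
  // yarn.http.rmwebapp.external.classes, RMWebApp would bind it
  // alongside RMWebServices.
  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public String getInfo() {
    return "{\"status\":\"ok\"}";
  }
}
{code}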


> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch
>
>
> This Jira should make sure we are able to plug in web services and web pages 
> of the scheduler in the ResourceManager:
> * RMWebApp: allow binding external classes
> * RMController: allow plugging in scheduler classes



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.009.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112375#comment-17112375
 ] 

Hadoop QA commented on YARN-10108:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
43s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 
56s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26046/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10108 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13003522/YARN-10108.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 45a398f08cca 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 29b19cd5924 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/26046/testReport/ |
| Max. process+thread count | 871 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop

[jira] [Updated] (YARN-10269) SchedConfCLI and LogWebService should reuse util class WebServiceClient

2020-05-20 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10269:
-
Attachment: (was: YARN-10269.003.patch)

> SchedConfCLI and LogWebService should reuse util class WebServiceClient
> ---
>
> Key: YARN-10269
> URL: https://issues.apache.org/jira/browse/YARN-10269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-10269.001.patch, YARN-10269.002.patch
>
>
> WebServiceClient is for creating a client object based on the configuration 
> (i.e. http/https), so
> SchedConfCLI#createWebServiceClient and LogWebService#createTimelineWebClient 
> can use it for creating the client object.
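As a rough illustration of the idea only (a sketch; the helper and method names below are assumptions, not the actual WebServiceClient API from the patch):

{code:java}
// Sketch of a shared helper that SchedConfCLI and LogWebService could both
// call instead of each building their own Jersey client. Names are illustrative.
import com.sun.jersey.api.client.Client;
import org.apache.hadoop.conf.Configuration;

public final class WebServiceClientSketch {

  private WebServiceClientSketch() {
  }

  // Pick http or https from yarn.http.policy, mirroring the duplicated
  // scheme-selection logic in the two callers today.
  public static String getScheme(Configuration conf) {
    String policy = conf.get("yarn.http.policy", "HTTP_ONLY");
    return "HTTPS_ONLY".equals(policy) ? "https://" : "http://";
  }

  // One place to create the Jersey client; TLS/SSLFactory wiring for the
  // https case is omitted here for brevity.
  public static Client createClient() {
    return Client.create();
  }
}
{code}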



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10269) SchedConfCLI and LogWebService should reuse util class WebServiceClient

2020-05-20 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10269:
-
Attachment: YARN-10269.003.patch

> SchedConfCLI and LogWebService should reuse util class WebServiceClient
> ---
>
> Key: YARN-10269
> URL: https://issues.apache.org/jira/browse/YARN-10269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-10269.001.patch, YARN-10269.002.patch, 
> YARN-10269.003.patch
>
>
> WebServiceClient is for creating a client object based on the configuration 
> (i.e. http/https), so
> SchedConfCLI#createWebServiceClient and LogWebService#createTimelineWebClient 
> can use it for creating the client object.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10228) Yarn Service fails if am java opts contains ZK authentication file path

2020-05-20 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-10228:
-
Fix Version/s: 3.4.0
 Target Version/s: 3.4.0
Affects Version/s: 3.3.0

> Yarn Service fails if am java opts contains ZK authentication file path
> ---
>
> Key: YARN-10228
> URL: https://issues.apache.org/jira/browse/YARN-10228
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10228.001.patch
>
>
> If I configure 
> {code:java}
> yarn.service.am.java.opts=-Xmx768m 
> -Djava.security.auth.login.config=/opt/hadoop/etc/jaas-zk.conf
> {code}
> an invalid character error is printed.
> This is due to the JVM opts validation added in YARN-9718.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10228) Yarn Service fails if am java opts contains ZK authentication file path

2020-05-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112429#comment-17112429
 ] 

Hudson commented on YARN-10228:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18281 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18281/])
YARN-10228. Relax restriction of file path character in (eyang: rev 
726b8e324b6fc99aac5a26fbbc7edd26a3a25479)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/utils/ServiceApiUtil.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/utils/TestServiceApiUtil.java
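For context, here is a minimal sketch of the kind of relaxed character check this commit implies; the whitelist below is an assumption for illustration only, not the actual pattern in ServiceApiUtil.java.

{code:java}
// Illustrative only: a whitelist that admits the path characters needed by
// options such as -Djava.security.auth.login.config=/opt/hadoop/etc/jaas-zk.conf.
import java.util.regex.Pattern;

public final class JvmOptsCheckSketch {

  // Hypothetical relaxed whitelist: letters, digits, whitespace and the
  // punctuation commonly used in -D/-X style JVM options, including '/',
  // '.', '=' and ':'.
  private static final Pattern ALLOWED =
      Pattern.compile("[A-Za-z0-9\\s_\\-+=:,./]*");

  public static boolean isValid(String javaOpts) {
    return javaOpts == null || ALLOWED.matcher(javaOpts).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid(
        "-Xmx768m -Djava.security.auth.login.config=/opt/hadoop/etc/jaas-zk.conf"));
  }
}
{code}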


> Yarn Service fails if am java opts contains ZK authentication file path
> ---
>
> Key: YARN-10228
> URL: https://issues.apache.org/jira/browse/YARN-10228
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.3.0, 3.4.0, 3.3.1
>
> Attachments: YARN-10228.001.patch
>
>
> If I configure 
> {code:java}
> yarn.service.am.java.opts=-Xmx768m 
> -Djava.security.auth.login.config=/opt/hadoop/etc/jaas-zk.conf
> {code}
> an invalid character error is printed.
> This is due to the JVM opts validation added in YARN-9718.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112496#comment-17112496
 ] 

Hadoop QA commented on YARN-6492:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
45s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 49 new + 635 unchanged - 6 fixed = 684 total (was 641) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 75 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
31s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 1 new + 69 unchanged - 0 fixed = 70 total (was 69) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
48s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
33s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Dead store to metrics in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionQueueMetrics(String)
  At 
QueueMetrics.java:org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionQueueMetrics(String)
  At QueueMetrics.java:[line 317] |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.QUEUE_METR

[jira] [Commented] (YARN-10269) SchedConfCLI and LogWebService should reuse util class WebServiceClient

2020-05-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112517#comment-17112517
 ] 

Hadoop QA commented on YARN-10269:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
54s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
5s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
10s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
50s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
43s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 
11s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
58s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
http

[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
 ] 

Jonathan Hung commented on YARN-6492:
-

Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName"; can we change this to "partition + '.' + userName + '.' + queuePath"? 
In particular the queuePath + userName part could cause conflicts (e.g. a queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"); see the sketch after this list. I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* I don't think setAvailableResourcesToQueue is handled correctly. It appears 
to update partition metrics no matter which queue this method is invoked for. 
Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2, 
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2, 
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue 
if the queue is root?
* Delete {noformat}System.out.println(" final is " + 
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific 
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
be 
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The 
stuff removed seems to be non-exclusive node label functionality (default 
partition node heartbeating, and checking queue metrics are correct), so we 
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is 
only 50% of 10GB = 5GB?
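A short sketch of the key construction proposed in the first bullet, assuming QUEUE_METRICS is keyed by a plain String; method and constant names are illustrative, not the patch's actual code.

{code:java}
// Putting the partition first, then the user, then the queue path, with an
// explicit '.' delimiter, keeps queue "root.a" + user "user" distinct from
// queue "root.auser" with no user component.
public final class QueueMetricsKeySketch {

  private static final String DELIM = ".";

  static String metricsKey(String partition, String userName, String queuePath) {
    StringBuilder key = new StringBuilder(partition);
    if (userName != null) {
      key.append(DELIM).append(userName);
    }
    key.append(DELIM).append(queuePath);
    return key.toString();
  }

  public static void main(String[] args) {
    // With the old "partition + queuePath + userName" concatenation both of
    // these would collapse to "xroot.auser"; with the delimiter they differ.
    System.out.println(metricsKey("x", "user", "root.a"));   // x.user.root.a
    System.out.println(metricsKey("x", null, "root.auser")); // x.root.auser
  }
}
{code}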

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
 ] 

Jonathan Hung edited comment on YARN-6492 at 5/20/20, 10:57 PM:


Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matter which queue this 
method is invoked for. Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2, 
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2, 
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue 
if the queue is root?
* Delete {noformat}println System.out.println(" final is " + 
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific 
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
be 
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The 
stuff removed seems to be non-exclusive node label functionality (default 
partition node heartbeating, and checking queue metrics are correct), so we 
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is 
only 50% of 10GB = 5GB?


was (Author: jhung):
Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* I don't think setAvailableResourcesToQueue is handled correctly. It appears 
to update partition metrics no matter which queue this method is invoked for. 
Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
 ] 

Jonathan Hung edited comment on YARN-6492 at 5/20/20, 10:57 PM:


Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matter which queue this 
method is invoked for. Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2, 
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2, 
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue 
if the queue is root?
* Delete {noformat}System.out.println(" final is " + 
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific 
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
be 
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The 
stuff removed seems to be non-exclusive node label functionality (default 
partition node heartbeating, and checking queue metrics are correct), so we 
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is 
only 50% of 10GB = 5GB?


was (Author: jhung):
Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matter which queue this 
method is invoked for. Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partition

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
 ] 

Jonathan Hung edited comment on YARN-6492 at 5/20/20, 10:58 PM:


Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). Putting the user before the queue and adding the delimiter 
should prevent the user from being interpreted as part of the queue path. I see 
a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matter which queue this 
method is invoked for. Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2, 
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2, 
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue 
if the queue is root?
* Delete {noformat}System.out.println(" final is " + 
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific 
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
be 
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The 
stuff removed seems to be non-exclusive node label functionality (default 
partition node heartbeating, and checking queue metrics are correct), so we 
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is 
only 50% of 10GB = 5GB?


was (Author: jhung):
Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). I see a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matte

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112648#comment-17112648
 ] 

Jonathan Hung edited comment on YARN-6492 at 5/20/20, 10:59 PM:


Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). Putting the user before the queue and adding the delimiter 
should prevent the user from being interpreted as part of the queue path. I see 
a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the {{users}} map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the {{users}} map is 
not static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes needed?
* For partition metrics, I don't think setAvailableResourcesToQueue is handled 
correctly. It appears to update partition metrics no matter which queue this 
method is invoked for. Thus for example on line 87 of TestPartitionQueueMetrics:
{noformat}checkResources(partitionSource, 0, 0, 0, 100 * GB, 100, 2 * GB, 2, 
2);{noformat}
should be
{noformat}checkResources(partitionSource, 0, 0, 0, 200 * GB, 200, 2 * GB, 2, 
2);{noformat}
Perhaps we should only update partition metrics in setAvailableResourcesToQueue 
if the queue is root?
* Delete {noformat}System.out.println(" final is " + 
parentQueueSource_X.toString());{noformat}
* Same in TestQueueMetrics, there should not be capacity scheduler specific 
logic here, can we remove these changes?
* On line 2539 of TestNodeLabelContainerAllocation, should
{noformat}assertEquals(2 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
be 
{noformat}assertEquals(1.5 * GB, queueAUserMetrics.getAvailableMB(), 
delta);{noformat}
?
* Do we need the tests after line 2551 on TestNodeLabelContainerAllocation? The 
stuff removed seems to be non-exclusive node label functionality (default 
partition node heartbeating, and checking queue metrics are correct), so we 
probably want to keep these tests.
* On line 2566, how is node1 getting 8 containers if queue A's max capacity is 
only 50% of 10GB = 5GB?


was (Author: jhung):
Thank you [~maniraj...@gmail.com]. Looks fine at a high level. A few comments:
* We can change parentQueue in QueueMetrics.java to be Queue instead of 
AbstractCSQueue (to fix test cases)
* Right now we're concatenating QUEUE_METRICS keys as "partition + queuePath + 
userName", can we change this to "partition + '.' + userName + '.' + queuePath" 
? In particular the queuePath + userName part could cause conflicts (e.g. queue 
named "root.auser" could conflict with user metrics under queue "root.a" and 
username "user"). Putting the user before the queue and adding the delimiter 
should prevent the user from being interpreted as part of the queue path. I see 
a few places for this:
# PartitionQueueMetrics#constructor#parentMetricName
# PartitionQueueMetrics#getUserMetrics#metricName
# QueueMetrics#getUserMetrics#metricName
# QueueMetrics#getPartitionQueueMetrics#metricName
# Key for QueueMetrics#getPartitionMetrics could collide if the partition name 
is "root"
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add the metrics object to QUEUE_METRICS, since we're 
accessing user metrics via the user map (and not the QUEUE_METRICS map)
* In QueueMetrics#getUserMetrics and PartitionQueueMetrics#getUserMetrics, I 
don't think we need to add queue path to the key, since the users map is not 
static
* QueueMetrics#queueSource method does not seem to be used anywhere, can we 
delete it?
* How come we need a CSQueueMetrics#forQueue implementation? It looks the same 
as QueueMetrics#forQueue
* We shouldn't add capacity scheduler specific things in QueueInfo, are these 
changes ne