[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114298#comment-17114298
 ] 

Peter Bacsko commented on YARN-10283:
-

I uploaded a new repro test patch. I was able to prove my theory.

If there are no node labels, the same problem occurs if reservationsContinueLooking == 
false AND minimum-allocation-mb == 512.
However, if you increase minimum-allocation-mb to 1024, then we are saved because 
{{checkHeadroom()}} returns false and no allocation occurs.

This can be reproduced with the new test case.
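
For context, this is roughly the kind of check {{checkHeadroom()}} performs (a 
paraphrased sketch, not the exact Hadoop source): the pending ask, normalized up 
to minimum-allocation-mb, is compared against the queue's remaining headroom, so 
a 1024 MB minimum no longer fits while 512 MB still does.

{noformat}
// Paraphrased sketch of the headroom guard (not the exact source).
// 'required' is the per-container ask, normalized up to minimum-allocation-mb.
boolean checkHeadroom(Resource clusterResource, ResourceLimits limits,
    Resource required, String nodePartition) {
  Resource couldBeUnreserved =
      application.getAppAttemptResourceUsage().getReserved(nodePartition);
  if (!reservationsContinueLooking) {
    // Without continuous reservation looking we cannot count on unreserving
    // anything, so only the plain headroom is considered.
    couldBeUnreserved = Resources.none();
  }
  // Allocation is attempted only if headroom (plus whatever could be
  // unreserved) still covers the request. With minimum-allocation-mb=1024
  // this is false in the scenario above; with 512 it is still true.
  return Resources.greaterThanOrEqual(rc, clusterResource,
      Resources.add(limits.getHeadroom(), couldBeUnreserved), required);
}
{noformat}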

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, 
> YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Attachment: YARN-10283-ReproTest2.patch

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, 
> YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113536#comment-17113536
 ] 

Peter Bacsko commented on YARN-10283:
-

[~tarunparimi] It's very interesting what you're saying. In fact, I wanted to 
elaborate on this earlier but just forgot about it.

To me, it seems as though if "reservationsContinueLooking" is disabled, we can 
have the same starvation behaviour even without node labels. Because this will 
always be false:

{noformat}
if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

Not sure if I'm missing something, but if we pick the list of schedulable 
application attempts in the same order, then again, lower priority queues will 
be ignored completely. But this would be strange - this error probably should 
have been spotted much earlier. Anyway, it would be good to know if it's really 
an issue or not.
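
For reference, the switch in question is this capacity-scheduler.xml property 
(assuming the standard property name; it defaults to true):

{noformat}
<property>
  <!-- When false, the allocator does not "continue looking" past reservations,
       so the quoted condition above can never be satisfied. -->
  <name>yarn.scheduler.capacity.reservations-continue-look-all-nodes</name>
  <value>false</value>
</property>
{noformat}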


> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113294#comment-17113294
 ] 

Peter Bacsko commented on YARN-10283:
-

Added repro test patch. Took me a solid 5-6 hours to figure out how to manage 
{{MockNM}} heartbeats, allocations and scheduling; there were missing methods, 
etc.

Anyway, the test fails by default. Mock AM2 cannot start; it stays in the 
{{SCHEDULED}} state. However, after my proposed change is applied, it can start.
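
For anyone trying to follow the test, its structure is roughly the following (a 
simplified sketch, not the attached patch; registerNode()/nodeHeartbeat() are 
the usual {{MockRM}}/{{MockNM}} test helpers and the loop count is arbitrary):

{noformat}
MockRM rm = new MockRM(csConf);
rm.start();
// Two 5 GB / 8 vcore nodes that belong to the "shared" partition
MockNM nm1 = rm.registerNode("h1:1234", 5 * 1024, 8);
MockNM nm2 = rm.registerNode("h2:1234", 5 * 1024, 8);

// 1. Fill root.highprio up to its limit (AM + containers asking 4 vcores each).
// 2. Submit a second application (AM2) to root.lowprio.
// 3. Drive the scheduler with node heartbeats: every heartbeat generates a
//    node update event in the CapacityScheduler.
for (int i = 0; i < 10; i++) {
  nm1.nodeHeartbeat(true);
  nm2.nodeHeartbeat(true);
}

// Without the fix, AM2 never leaves SCHEDULED because each heartbeat is spent
// on the (ultimately rejected) highprio allocation attempt.
{noformat}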

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Attachment: YARN-10283-ReproTest.patch

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Description: 
Recently we've been investigating a scenario where applications submitted to a 
lower priority queue could not get scheduled because a higher priority queue in 
the same hierarchy could not satisfy the allocation request. Both queues 
belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the 
partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in 
total, violating the limit.

The problem is that we always try to assign container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we enter a different code path and succeed with the 
allocation if there's room for a container.
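
For completeness, the queue and partition setup in the example corresponds to 
capacity-scheduler.xml settings roughly like the ones below (property names 
taken from the CapacityScheduler node-label / queue-priority documentation; the 
exact capacities used in the original cluster are assumptions):

{noformat}
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>lowprio,highprio</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.ordering-policy</name>
  <value>priority-utilization</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.lowprio.priority</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.highprio.priority</name>
  <value>40</value>
</property>
<!-- Both queues can use the "shared" partition -->
<property>
  <name>yarn.scheduler.capacity.root.lowprio.accessible-node-labels</name>
  <value>shared</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.highprio.accessible-node-labels</name>
  <value>shared</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.lowprio.accessible-node-labels.shared.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.highprio.accessible-node-labels.shared.capacity</name>
  <value>50</value>
</property>
{noformat}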



  was:
Recently we've been investigating a scenario where applications submitted to a 
lower priority queue could not get scheduled because a higher priority queue in 
the same hierarchy could not satisfy the allocation request. Both queues 
belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the 
partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in 
total, violating the limit.

The problem is that we always try to assign container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we enter a different code path and succeed with the 
allocation if there's room for a container.




> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is 

[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113157#comment-17113157
 ] 

Peter Bacsko commented on YARN-10283:
-

[~prabhujoseph] yes, I'm working on it.

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113054#comment-17113054
 ] 

Peter Bacsko commented on YARN-10283:
-

Created a POC which _might_ work. I have no idea.

[~prabhujoseph] could you take a look at this?

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-21 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Attachment: YARN-10283-POC01.patch

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.






[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Description: 
Recently we've been investigating a scenario where applications submitted to a 
lower priority queue could not get scheduled because a higher priority queue in 
the same hierarchy could not satisfy the allocation request. Both queues 
belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the 
partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in 
total, violating the limit.

The problem is that we always try to assign container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we enter a different code path and succeed with the 
allocation if there's room for a container.



  was:
Recently we've been investigating a scenario where applications submitted to a 
lower priority queue could not get scheduled because a higher priority queue in 
the same hierarchy could not satisfy the allocation request. Both queues 
belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the 
partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in 
total, violating the limit.

The problem is that we always try to assign container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we succeed with the allocation if there's room for 
a container.




> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not 

[jira] [Updated] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10283:

Summary: Capacity Scheduler: starvation occurs if a higher priority queue 
is full and node labels are used  (was: Capacity Scheduler: starvation occurs 
if a higher priority queue is full a and node labels are used)

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we succeed with the allocation if there's room 
> for a container.






[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112051#comment-17112051
 ] 

Peter Bacsko edited comment on YARN-10283 at 5/20/20, 11:00 AM:


Quick workaround:
{noformat}
  [...]
  if (null == unreservedContainer) {
// Skip the locality request
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
ActivityLevel.NODE);
return ContainerAllocation.LOCALITY_SKIPPED;
  }
}
  }

  // 
  // Defends against container allocation
  // 
  if (!node.getLabels().isEmpty() && needToUnreserve) {
LOG.debug("Using label: {} - needed to unreserve container", 
node.getPartition());
return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
  pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
  [...]
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.
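
Something along these lines (hypothetical illustration only; the partition 
parameter does not exist on the current {{findNodeToUnreserve()}}, it is the 
extension suggested above):

{noformat}
// Instead of skipping whenever the node has labels, ask whether anything can
// actually be unreserved within this node's partition.
RMContainer toUnreserve = application.findNodeToUnreserve(
    node, schedulerKey, minimumUnreservedResource, node.getPartition());
if (toUnreserve == null) {
  // Nothing can be unreserved in this partition either, so skip instead of
  // allocating past the queue limit.
  return ContainerAllocation.LOCALITY_SKIPPED;
}
{noformat}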


was (Author: pbacsko):
Quick workaround:
{noformat}
  if (null == unreservedContainer) {
// Skip the locality request
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
ActivityLevel.NODE);
return ContainerAllocation.LOCALITY_SKIPPED;
  }
}
  }

  // 
  // Defends against container allocation
  // 
  if (!node.getLabels().isEmpty() && needToUnreserve) {
LOG.debug("Using label: {} - needed to unreserve container", 
node.getPartition());
return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
  pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}
A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> 

[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112051#comment-17112051
 ] 

Peter Bacsko edited comment on YARN-10283 at 5/20/20, 10:59 AM:


Quick workaround:
{noformat}
  if (null == unreservedContainer) {
// Skip the locality request
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
ActivityLevel.NODE);
return ContainerAllocation.LOCALITY_SKIPPED;
  }
}
  }

  // 
  // Defends against container allocation
  // 
  if (!node.getLabels().isEmpty() && needToUnreserve) {
LOG.debug("Using label: {} - needed to unreserve container", 
node.getPartition());
return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
  pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}
A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.


was (Author: pbacsko):
Quick workaround:

{noformat}
  if (null == unreservedContainer) {
// Skip the locality request
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
ActivityLevel.NODE);
return ContainerAllocation.LOCALITY_SKIPPED;
  }
}
  }

  // defends against container allocation
  if (!node.getLabels().isEmpty() && needToUnreserve) {
LOG.debug("Using label: {} - needed to unreserve container", 
node.getPartition());
return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
  pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> 

[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112051#comment-17112051
 ] 

Peter Bacsko commented on YARN-10283:
-

Quick workaround:

{noformat}
  if (null == unreservedContainer) {
// Skip the locality request
ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
activitiesManager, node, application, schedulerKey,
ActivityDiagnosticConstant.
NODE_CAN_NOT_FIND_CONTAINER_TO_BE_UNRESERVED_WHEN_NEEDED,
ActivityLevel.NODE);
return ContainerAllocation.LOCALITY_SKIPPED;
  }
}
  }

  // defends against container allocation
  if (!node.getLabels().isEmpty() && needToUnreserve) {
LOG.debug("Using label: {} - needed to unreserve container", 
node.getPartition());
return ContainerAllocation.LOCALITY_SKIPPED;
  }

  ContainerAllocation result = new ContainerAllocation(unreservedContainer,
  pendingAsk.getPerAllocationResource(), AllocationState.ALLOCATED);
  result.containerNodeType = type;
  result.setToKillContainers(toKillContainers);
  return result;
{noformat}

A better solution is probably to extend 
{{FiCaSchedulerApp.findNodeToUnreserve(FiCaSchedulerNode, SchedulerRequestKey, 
Resource)}} with the partition or create an entirely new method.

> Capacity Scheduler: starvation occurs if a higher priority queue is full a 
> and node labels are used
> ---
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. Queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we succeed with the allocation if there's room 
> for a container.






[jira] [Created] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used

2020-05-20 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10283:
---

 Summary: Capacity Scheduler: starvation occurs if a higher 
priority queue is full a and node labels are used
 Key: YARN-10283
 URL: https://issues.apache.org/jira/browse/YARN-10283
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Recently we've been investigating a scenario where applications submitted to a 
lower priority queue could not get scheduled because a higher priority queue in 
the same hierarchy could not satisfy the allocation request. Both queues 
belonged to the same partition.

If we disabled node labels, the problem disappeared.

The problem is that {{RegularContainerAllocator}} always allocated a container 
for the request, even if it should not have.

*Example:*
* Cluster total resources: 3 nodes, 15GB, 24 vcores
* Partition "shared" was created with 2 nodes
* "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added 
to the partition
* Both queues have a limit of 
* Using DominantResourceCalculator

Setup:
Submit distributed shell application to highprio with switches "-num_containers 
3 -container_vcores 4". The memory allocation is 512MB per container.

Chain of events:

1. Queue is filled with containers until it reaches usage 
2. A node update event is pushed to CS from a node which is part of the 
partition
3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
than the current limit resource 
4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
allocated container for 
5. But we can't commit the resource request because we would have 9 vcores in 
total, violating the limit.

The problem is that we always try to assign container for the same application 
in each heartbeat from "highprio". Applications in "lowprio" cannot make 
progress.

*Problem:*
{{RegularContainerAllocator.assignContainer()}} does not handle this case well. 
We only reject allocation if this condition is satisfied:

{noformat}
 if (rmContainer == null && reservationsContinueLooking
  && node.getLabels().isEmpty()) {
{noformat}

But if we have node labels, we succeed with the allocation if there's room for 
a container.








[jira] [Commented] (YARN-9863) Randomize List of Resources to Localize

2020-05-14 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107386#comment-17107386
 ] 

Peter Bacsko commented on YARN-9863:


I would involve active people in the YARN community like [~snemeth], 
[~wilfreds]. [~sunilg] or [~prabhujoseph] could also share their thoughts about 
this improvement.

> Randomize List of Resources to Localize
> ---
>
> Key: YARN-9863
> URL: https://issues.apache.org/jira/browse/YARN-9863
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: YARN-9863.1.patch, YARN-9863.2.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/LocalResourceBuilder.java
> Add a new parameter to {{LocalResourceBuilder}} that allows the list of 
> resources to be shuffled randomly.  This will allow the Localizer to spread 
> the load of requests so that not all of the NodeManagers are requesting to 
> localize the same files, in the same order, from the same DataNodes.
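
A minimal sketch of the idea (the flag and variable names are assumptions, not 
the attached patches):

{noformat}
// Shuffling the localization order spreads requests across DataNodes instead
// of every NodeManager fetching the same files in the same order.
List<LocalResource> resources = new ArrayList<>(localResources.values());
if (randomizeLocalizationOrder) {
  Collections.shuffle(resources);
}
{noformat}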






[jira] [Comment Edited] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106252#comment-17106252
 ] 

Peter Bacsko edited comment on YARN-10254 at 5/13/20, 12:16 PM:


Thanks for the latest patch [~shuzirra]. I don't really have complaints, except 
for one thing.

I do believe that we need to extend the current code and this patch with more 
logging. My ideas:

1. {{getContextForGroupParent()}} - log if {{groupQueue}} is not found
2. {{getPlacementContextWithParent()}} - log if {{parent}} is null, this should 
be at least a warning.
3. Under the comment "if the queue doesn't exit we return null" - log if 
{{queue}} is null
4. {{getPlacementContextNoParent()}} - log if {{queue}} is null
5. I can see extra messages in {{getPlacementForUser}} being potentially useful. 
For example, before each {{return}} statement, we could log stuff like:
{noformat}
} else if (mapping.getQueue().equals(CURRENT_USER_MAPPING)) {
  LOG.debug("Creating placement context based on current-user mapping");
  return getPlacementContext(mapping, user);
} else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
  LOG.debug("Creating placement context based on primary-group mapping");
  return getPlacementContext(mapping, getPrimaryGroup(user));
{noformat}

I think it's OK to have them on DEBUG level, with the exception of #2. But to 
me, even INFO sounds reasonable. This class has been changed substantially in 
the past months (15 commits since 2019 Oct), so I'd feel safer with extra 
printouts.
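
For idea #2, something like this is what I have in mind (a sketch only; the 
exact variable and accessor names are assumptions, not the patch):

{noformat}
if (parent == null) {
  // Warn instead of silently returning, so misconfigured mappings show up.
  LOG.warn("Mapped parent queue '{}' does not exist, cannot create placement "
      + "context for user '{}'", mapping.getParentQueue(), user);
  return null;
}
{noformat}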


was (Author: pbacsko):
Thanks for the latest patch [~shuzirra]. I don't really have complaints other 
than about logging.

I do believe that we need to extend the current code and this patch with more 
logging. My ideas:

1. {{getContextForGroupParent()}} - log if {{groupQueue}} is not found
2. {{getPlacementContextWithParent()}} - log if {{parent}} is null, this should 
be at least a warning.
3. Under the comment "if the queue doesn't exit we return null" - log if 
{{queue}} is null
4. {{getPlacementContextNoParent()}} - log if {{queue}} is null
5. I can see extra messages in {{getPlacementForUser}} potentially useful. For 
example, before each {{return}} statement, we could log stuff like:
{noformat}
} else if (mapping.getQueue().equals(CURRENT_USER_MAPPING)) {
  LOG.debug("Creating placement context based on current-user mapping");
  return getPlacementContext(mapping, user);
} else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
  LOG.debug("Creating placement context based on primary-group mapping");
  return getPlacementContext(mapping, getPrimaryGroup(user));
{noformat}

I think it's OK to have them on DEBUG level, with the exception of #2. But to 
me, even INFO sounds reasonable. This class has been changed substantially in 
the past months (15 commits since October 2019), so I'd feel safer with extra 
printouts.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch, 
> YARN-10254.003.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106252#comment-17106252
 ] 

Peter Bacsko commented on YARN-10254:
-

Thanks for the latest patch, [~shuzirra]. I don't really have complaints other than 
logging.

I do believe that we need to extend the current code and this patch with more 
logging. My ideas:

1. {{getContextForGroupParent()}} - log if {{groupQueue}} is not found
2. {{getPlacementContextWithParent()}} - log if {{parent}} is null, this should 
be at least a warning.
3. Under the comment "if the queue doesn't exit we return null" - log if 
{{queue}} is null
4. {{getPlacementContextNoParent()}} - log if {{queue}} is null
5. I can see extra messages in {{getPlacementForUser}} potentially useful. For 
example, before each {{return}} statement, we could log stuff like:
{noformat}
} else if (mapping.getQueue().equals(CURRENT_USER_MAPPING)) {
  LOG.debug("Creating placement context based on current-user mapping");
  return getPlacementContext(mapping, user);
} else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
  LOG.debug("Creating placement context based on primary-group mapping");
  return getPlacementContext(mapping, getPrimaryGroup(user));
{noformat}

I think it's OK to have them on DEBUG level, with the exception of #2. But to 
me, even INFO sounds reasonable. This class has been changed substantially in 
the past months (15 commits since October 2019), so I'd feel safer with extra 
printouts.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch, 
> YARN-10254.003.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-05-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106208#comment-17106208
 ] 

Peter Bacsko commented on YARN-10108:
-

Thanks for the patch [~shuzirra].

Just one comment - this part from the new testcase can be eliminated completely:

{noformat}
  // submit an app
  submitApp(mockRM, cs.getQueue(PARENT_QUEUE), USER0, USER0, 1, 1);

  // check preconditions
  List<ApplicationAttemptId> appsInC = cs.getAppsInQueue(PARENT_QUEUE);
  assertEquals(1, appsInC.size());
  assertNotNull(cs.getQueue(USER0));

  AutoCreatedLeafQueue autoCreatedLeafQueue =
  (AutoCreatedLeafQueue) cs.getQueue(USER0);
  ManagedParentQueue parentQueue = (ManagedParentQueue) cs.getQueue(
  PARENT_QUEUE);
  assertEquals(parentQueue, autoCreatedLeafQueue.getParent());

  Map<String, Float> expectedChildQueueAbsCapacity =
      populateExpectedAbsCapacityByLabelForParentQueue(1);
  validateInitialQueueEntitlement(parentQueue, USER0,
  expectedChildQueueAbsCapacity, accessibleNodeLabelsOnC);

  validateUserAndAppLimits(autoCreatedLeafQueue, 1000, 1000);
  validateContainerLimits(autoCreatedLeafQueue);

  assertTrue(autoCreatedLeafQueue
  .getOrderingPolicy() instanceof FairOrderingPolicy);
{noformat}

Other than that LGTM +1 (non-binding).

> FS-CS converter: nestedUserQueue with default rule results in invalid queue 
> mapping
> ---
>
> Key: YARN-10108
> URL: https://issues.apache.org/jira/browse/YARN-10108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10108.001.patch, YARN-10108.002.patch
>
>
> FS Queue Placement Policy
> {code:java}
> 
> 
> 
> 
> 
>  {code}
> gets mapped to an invalid CS queue mapping "u:%user:root.users.%user"
> RM fails to start with above queue mapping in CS
> {code:java}
> 2020-01-28 00:19:12,889 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue [%user] and invalid parent queue 
> [root.users]
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:829)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1247)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue 
> [%user] and invalid parent queue [root.users]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:48)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:712)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:753)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:361)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:426)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 7 more
> {code}
> QueuePlacementConverter#handleNestedRule has to be fixed.
> {code:java}
> else if (pr instanceof DefaultPlacementRule) {
>   DefaultPlacementRule defaultRule = (DefaultPlacementRule) pr;
>   mapping.append("u:" + USER + ":")
>     .append(defaultRule.defaultQueueName)
>     .append("." + USER);
> }
> {code}

[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106183#comment-17106183
 ] 

Peter Bacsko commented on YARN-9930:


[~cane] any updates here? Do you have a patch?

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> In FairScheduler, there is a limit on the maximum number of running 
> applications, which lets excess applications stay pending.
> But CapacityScheduler has no such max-running-app feature. It only has a 
> maximum application count, and additional jobs are rejected directly on the 
> client side.
> In this JIRA I want to implement the same semantics for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-10158) FS-CS converter: convert property yarn.scheduler.fair.update-interval-ms

2020-05-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YARN-10158.
-
Resolution: Won't Do

> FS-CS converter: convert property yarn.scheduler.fair.update-interval-ms
> 
>
> Key: YARN-10158
> URL: https://issues.apache.org/jira/browse/YARN-10158
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10158) FS-CS converter: convert property yarn.scheduler.fair.update-interval-ms

2020-05-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106045#comment-17106045
 ] 

Peter Bacsko commented on YARN-10158:
-

As discussed offline with [~leftnoteasy], it's a very low-level property. 
Preemption at this level works differently in FS and CS, so it's fine to ignore 
the conversion of such settings.

> FS-CS converter: convert property yarn.scheduler.fair.update-interval-ms
> 
>
> Key: YARN-10158
> URL: https://issues.apache.org/jira/browse/YARN-10158
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-08 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102597#comment-17102597
 ] 

Peter Bacsko edited comment on YARN-10254 at 5/8/20, 1:42 PM:
--

Regarding #2 I'm voting for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect SLAs and job running times. The users have to realize 
that their applications have been placed into the wrong queue, and that takes time. 
These kinds of problems can be enormously frustrating.

A counter-argument could be that a slower and completed application is still 
better than having no running application at all, especially if the jobs are 
scheduled. 

[~sunilg] thoughts? 


was (Author: pbacsko):
Regarding #2 I'm voting for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect SLAs and job running times. The users have to realize 
that their applications have been placed into the wrong queue, and that takes time. 
These kinds of problems can be enormously frustrating.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-08 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102597#comment-17102597
 ] 

Peter Bacsko edited comment on YARN-10254 at 5/8/20, 1:39 PM:
--

Regarding #2 I'm voting for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect SLAs and job running times. The users have to realize 
that their applications have been placed into the wrong queue, and that takes time. 
These kinds of problems can be enormously frustrating.


was (Author: pbacsko):
Regarding #2 I'm voting for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect the SLAs and job running times. The users have to 
realize that their applications have been placed into the wrong queue, and that 
takes time. These kinds of problems can be enormously frustrating.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-08 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102597#comment-17102597
 ] 

Peter Bacsko commented on YARN-10254:
-

Regarding #2, I would vote for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect the SLAs and job running times. The users have to 
realize that their applications have been placed into the wrong queue, and that 
takes time. These kinds of problems can be enormously frustrating.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-08 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102597#comment-17102597
 ] 

Peter Bacsko edited comment on YARN-10254 at 5/8/20, 1:38 PM:
--

Regarding #2 I'm voting for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect the SLAs and job running times. The users have to 
realize that their applications have been placed into the wrong queue, and that 
takes time. These kinds of problems can be enormously frustrating.


was (Author: pbacsko):
Regarding #2, I would vote for throwing an exception.

For example, I use the mapping "u:%user:root.grp.%primary_group", but the 
administrator changes the queue structure to "root.groups" from "root.grp". The 
mapping becomes invalid and all (or some) applications end up running in 
"root.default". In situations like that, I prefer seeing a failure immediately 
so I can act ASAP. Having a running app might be preferable in some scenarios, 
but it will likely affect the SLAs and job running times. The users have to 
realize that their applications have been placed into the wrong queue, and that 
takes time. These kinds of problems can be enormously frustrating.

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10254) CapacityScheduler incorrect User Group Mapping after leaf queue change

2020-05-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101749#comment-17101749
 ] 

Peter Bacsko commented on YARN-10254:
-

Thanks for the fix [~shuzirra]. Some comments:

#1
{noformat}
  private ApplicationPlacementContext getContextForGroup(
  String group,
  QueueMapping mapping) throws IOException {
return getPlacementContext(mapping, group);
  }
{noformat}
This method is so tiny, do we need this? It's called twice from 
{{getPlacementForUser()}}.

#2
{noformat}
  private ApplicationPlacementContext getPlacementContextWithParent(
  QueueMapping mapping,
  String leafQueueName) {
CSQueue parent = queueManager.getQueue(mapping.getParentQueue());
//we don't find the specified parent, so the placement rule is invalid
//for this case
if (parent == null) {
  return null;
}
{noformat}
Here, if the parent {{CSQueue}} object doesn't exist, we return null. AFAIK the 
application will then be placed into "root.default". Shouldn't we throw an 
exception instead? (A sketch of what I mean is at the end of this comment, after #6.)

#3
 {{if ( groupQueue != null) {}}
 Nit: unnecessary whitespace

#4
For readability purposes, the method {{private ApplicationPlacementContext 
getPlacementContext(QueueMapping mapping,  String leafQueueName) throws 
IOException}} should be placed above {{getPlacementContextNoParent()}} and 
{{getPlacementContextWithParent()}}.

#5
This part of the code is interesting inside {{getPlacementContextWithParent()}}:
{noformat}
if (!(parent instanceof ManagedParentQueue)) {
  CSQueue queue = queueManager.getQueue(
  mapping.getParentQueue() + "." + leafQueueName);
{noformat}

No matter how we define the parent in the mapping ("users.%primary_group" vs 
"root.users.%primary_group"), we always rely on {{mapping.getParentQueue()}}. 
But this could be either "users" or "root.users". I have a feeling that a 
normalization step with {{alterMapping()}} is necessary (that step is only used 
inside {{getContextForGroupParent()}}).

#6
The method name {{alterMapping()}} is not expressive enough; what about 
{{normalizeMapping()}} or {{resolveMapping()}}, with a short comment like 
"Creates a new mapping from the original by replacing certain values that were 
not known before the original was evaluated, like the username or the full path of 
the parent"? Just an idea.
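
Coming back to #2, this is roughly the exception-throwing variant I'd find preferable. The exception type and message are just my illustration, not something from the patch:
{noformat}
// getPlacementContextWithParent(), illustrative alternative to returning null
CSQueue parent = queueManager.getQueue(mapping.getParentQueue());
if (parent == null) {
  throw new YarnRuntimeException("Mapping " + mapping
      + " refers to a parent queue that does not exist: "
      + mapping.getParentQueue());
}
{noformat}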
 

> CapacityScheduler incorrect User Group Mapping after leaf queue change
> --
>
> Key: YARN-10254
> URL: https://issues.apache.org/jira/browse/YARN-10254
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10254.001.patch, YARN-10254.002.patch
>
>
> YARN-9879 and YARN-10198 introduced some major changes to user group mapping, 
> and some of them unfortunately had some negative impact on the way mapping 
> works.
> In some cases incorrect PlacementContexts were created, where full queue path 
> was passed as leaf queue name. This affects how the yarn cli app list 
> displays the queues.
> u:%user:%primary_group.%user mapping fails with an incorrect validation error 
> when the %primary_group parent queue was a managed parent.
> Group based rules in certain cases are mapped to root.[primary_group] rules, 
> loosing the ability to create deeper structures.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-05 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099767#comment-17099767
 ] 

Peter Bacsko commented on YARN-10257:
-

[~adam.antal]

1. "Do you plan to do something with regards to the first item from the 
description? (I mean the increment-allocation properties)."

Well, those properties are not needed during the conversion - so I simply 
removed all references to them (the original FS-CS document made me believe 
that those are necessary to convert, but in fact, they're not).

2. system-rules artifact --> thanks, it looks like a better solution; I 
switched to it.
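
For anyone not familiar with it, this is roughly how the system-rules {{ExpectedSystemExit}} rule is used; an illustrative JUnit 4 snippet with a made-up test class name, not the exact test code from the patch:
{code:java}
import org.junit.Rule;
import org.junit.Test;
import org.junit.contrib.java.lang.system.ExpectedSystemExit;

// Hypothetical test class, for illustration only.
public class SystemExitExampleTest {
  @Rule
  public final ExpectedSystemExit exit = ExpectedSystemExit.none();

  @Test
  public void testExitStatusIsAsserted() {
    // the rule intercepts System.exit() and verifies the status code
    exit.expectSystemExitWithStatus(-1);
    // stand-in for the code path that terminates the JVM
    System.exit(-1);
  }
}
{code}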

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10257-001.patch, YARN-10257-002.patch
>
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-05 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10257:

Attachment: YARN-10257-002.patch

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10257-001.patch, YARN-10257-002.patch
>
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099218#comment-17099218
 ] 

Peter Bacsko commented on YARN-10257:
-

Note: I modified how we call {{System.exit()}} from the tests. The previous 
method wasn't good at all.

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10257-001.patch
>
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099216#comment-17099216
 ] 

Peter Bacsko commented on YARN-10257:
-

Checkstyle is not relevant. [~snemeth] please review.

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10257-001.patch
>
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10257:

Attachment: YARN-10257-001.patch

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10257-001.patch
>
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10257:

Description: 
Two issues have been discovered during fs2cs testing:

1. The conversion of allocation increment properties is not needed:

{{yarn.scheduler.increment-allocation-mb}}
{{yarn.scheduler.increment-allocation-vcores}}
{{yarn.resource-types.memory-mb.increment-allocation}}
{{yarn.resource-types.vcores.increment-allocation}}

2. The following piece of code is incorrect - the default scheduling policy can 
be different from DRF, which is a problem if DRF is used everywhere else:

{code}
  private boolean isDrfUsed(FairScheduler fs) {
FSQueue rootQueue = fs.getQueueManager().getRootQueue();
AllocationConfiguration allocConf = fs.getAllocationConfiguration();

String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();

if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
  return true;
} else {
  return isDrfUsedOnQueueLevel(rootQueue);
}
  }
{code}

  was:
Two issues have been discovered during fs2cs testing:

1. The values of two properties are not checked:

{{yarn.scheduler.increment-allocation-mb}}
{{yarn.scheduler.increment-allocation-vcores}}

Although these two are marked as deprecated, they're still in use and must be 
handled.

2. The following piece of code is incorrect - the default scheduling policy can 
be different from DRF, which is a problem if DRF is used everywhere else:

{code}
  private boolean isDrfUsed(FairScheduler fs) {
FSQueue rootQueue = fs.getQueueManager().getRootQueue();
AllocationConfiguration allocConf = fs.getAllocationConfiguration();

String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();

if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
  return true;
} else {
  return isDrfUsedOnQueueLevel(rootQueue);
}
  }
{code}


> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Two issues have been discovered during fs2cs testing:
> 1. The conversion of allocation increment properties is not needed:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> {{yarn.resource-types.memory-mb.increment-allocation}}
> {{yarn.resource-types.vcores.increment-allocation}}
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10257) FS-CS converter: skip increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10257:

Summary: FS-CS converter: skip increment properties for mem/vcores and fix 
DRF check  (was: FS-CS converter: check deprecated increment properties for 
mem/vcores and fix DRF check)

> FS-CS converter: skip increment properties for mem/vcores and fix DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Two issues have been discovered during fs2cs testing:
> 1. The values of two properties are not checked:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> Although these two are marked as deprecated, they're still in use and must be 
> handled.
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10257) FS-CS converter: check deprecated increment properties for mem/vcores and fix DRF check

2020-05-04 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10257:

Description: 
Two issues have been discovered during fs2cs testing:

1. The values of two properties are not checked:

{{yarn.scheduler.increment-allocation-mb}}
{{yarn.scheduler.increment-allocation-vcores}}

Although these two are marked as deprecated, they're still in use and must be 
handled.

2. The following piece of code is incorrect - the default scheduling policy can 
be different from DRF, which is a problem if DRF is used everywhere else:

{code}
  private boolean isDrfUsed(FairScheduler fs) {
FSQueue rootQueue = fs.getQueueManager().getRootQueue();
AllocationConfiguration allocConf = fs.getAllocationConfiguration();

String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();

if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
  return true;
} else {
  return isDrfUsedOnQueueLevel(rootQueue);
}
  }
{code}

  was:
Two issues have been discovered during fs2cs testing:

1. The values of two properties are not checked:

{{yarn.scheduler.increment-allocation-mb}}
{{yarn.scheduler.increment-allocation-vcores}}

Although these two are marked as deprecated, they're still in use and must be 
handled.

2. The following piece of code is incorrect - the default scheduling policy can 
be different from DRF, which is a problem if DRF is used everywhere else:

{code}
  private boolean isDrfUsed(FairScheduler fs) {
FSQueue rootQueue = fs.getQueueManager().getRootQueue();
AllocationConfiguration allocConf = fs.getAllocationConfiguration();

String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();

if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
  return true;
} else {
  return isDrfUsedOnQueueLevel(rootQueue);
}
  }
{code}


> FS-CS converter: check deprecated increment properties for mem/vcores and fix 
> DRF check
> ---
>
> Key: YARN-10257
> URL: https://issues.apache.org/jira/browse/YARN-10257
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Two issues have been discovered during fs2cs testing:
> 1. The values of two properties are not checked:
> {{yarn.scheduler.increment-allocation-mb}}
> {{yarn.scheduler.increment-allocation-vcores}}
> Although these two are marked as deprecated, they're still in use and must be 
> handled.
> 2. The following piece of code is incorrect - the default scheduling policy 
> can be different from DRF, which is a problem if DRF is used everywhere else:
> {code}
>   private boolean isDrfUsed(FairScheduler fs) {
> FSQueue rootQueue = fs.getQueueManager().getRootQueue();
> AllocationConfiguration allocConf = fs.getAllocationConfiguration();
> String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();
> if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
>   return true;
> } else {
>   return isDrfUsedOnQueueLevel(rootQueue);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10257) FS-CS converter: check deprecated increment properties for mem/vcores and fix DRF check

2020-05-03 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10257:
---

 Summary: FS-CS converter: check deprecated increment properties 
for mem/vcores and fix DRF check
 Key: YARN-10257
 URL: https://issues.apache.org/jira/browse/YARN-10257
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Two issues have been discovered during fs2cs testing:

1. The values of two properties are not checked:

{{yarn.scheduler.increment-allocation-mb}}
{{yarn.scheduler.increment-allocation-vcores}}

Although these two are marked as deprecated, they're still in use and must be 
handled.

2. The following piece of code is incorrect - the default scheduling policy can 
be different from DRF, which is a problem if DRF is used everywhere else:

{code}
  private boolean isDrfUsed(FairScheduler fs) {
FSQueue rootQueue = fs.getQueueManager().getRootQueue();
AllocationConfiguration allocConf = fs.getAllocationConfiguration();

String defaultPolicy = allocConf.getDefaultSchedulingPolicy().getName();

if (DominantResourceFairnessPolicy.NAME.equals(defaultPolicy)) {
  return true;
} else {
  return isDrfUsedOnQueueLevel(rootQueue);
}
  }
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-04-28 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reassigned YARN-10108:
---

Assignee: Gergely Pollak  (was: Peter Bacsko)

> FS-CS converter: nestedUserQueue with default rule results in invalid queue 
> mapping
> ---
>
> Key: YARN-10108
> URL: https://issues.apache.org/jira/browse/YARN-10108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
>
> FS Queue Placement Policy
> {code:java}
> 
> 
> 
> 
> 
>  {code}
> gets mapped to an invalid CS queue mapping "u:%user:root.users.%user"
> RM fails to start with above queue mapping in CS
> {code:java}
> 2020-01-28 00:19:12,889 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue [%user] and invalid parent queue 
> [root.users]
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:829)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1247)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue 
> [%user] and invalid parent queue [root.users]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:48)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:712)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:753)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:361)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:426)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 7 more
> {code}
> QueuePlacementConverter#handleNestedRule has to be fixed.
> {code:java}
> else if (pr instanceof DefaultPlacementRule) {
>   DefaultPlacementRule defaultRule = (DefaultPlacementRule) pr;
>   mapping.append("u:" + USER + ":")
> .append(defaultRule.defaultQueueName)
> .append("." + USER);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10108) FS-CS converter: nestedUserQueue with default rule results in invalid queue mapping

2020-04-24 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091757#comment-17091757
 ] 

Peter Bacsko commented on YARN-10108:
-

Update: after YARN-9879, the issue still persists. Will have a discussion with 
[~shuzirra] and [~snemeth] about the details.

> FS-CS converter: nestedUserQueue with default rule results in invalid queue 
> mapping
> ---
>
> Key: YARN-10108
> URL: https://issues.apache.org/jira/browse/YARN-10108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
>
> FS Queue Placement Policy
> {code:java}
> 
> 
> 
> 
> 
>  {code}
> gets mapped to an invalid CS queue mapping "u:%user:root.users.%user"
> RM fails to start with above queue mapping in CS
> {code:java}
> 2020-01-28 00:19:12,889 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: mapping 
> contains invalid or non-leaf queue [%user] and invalid parent queue 
> [root.users]
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:829)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1247)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:324)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
> Caused by: java.io.IOException: mapping contains invalid or non-leaf queue 
> [%user] and invalid parent queue [root.users]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:48)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:712)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:753)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:361)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:426)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   ... 7 more
> {code}
> QueuePlacementConverter#handleNestedRule has to be fixed.
> {code:java}
> else if (pr instanceof DefaultPlacementRule) {
>   DefaultPlacementRule defaultRule = (DefaultPlacementRule) pr;
>   mapping.append("u:" + USER + ":")
> .append(defaultRule.defaultQueueName)
> .append("." + USER);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser

2020-04-23 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090622#comment-17090622
 ] 

Peter Bacsko commented on YARN-10199:
-

[~gandras] in the meantime, YARN-10226 went in because there was a bug in the 
code.

Please rebase your patch onto that code, because it doesn't seem to contain that fix yet.

> Simplify UserGroupMappingPlacementRule#getPlacementForUser
> --
>
> Key: YARN-10199
> URL: https://issues.apache.org/jira/browse/YARN-10199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10199.001.patch, YARN-10199.002.patch, 
> YARN-10199.003.patch, YARN-10199.004.patch, YARN-10199.005.patch
>
>
> The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly 
> responsible for queue naming, contains deeply nested branches. In order to 
> provide an extendable mapping logic, the branches could be flattened and 
> simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-04-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084989#comment-17084989
 ] 

Peter Bacsko edited comment on YARN-10102 at 4/16/20, 3:33 PM:
---

So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and group 
("g"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings us to the following scenarios:
 # Submitted queue string "alice". Parent is not known, it has to be looked up.
 # Submitted queue string is "root.admins.alice". We know immediately that the 
parent is "root.admins".

After we retrieved the parent, there are still two possibilities:
 # Parent is managed parent (instance of {{ManagedParentQueue}}). In this case, 
just return {{getPlacementContext(mapping, queueName)}} because the queue will 
be created if it doesn't exist.
 # Parent is not managed. In this case, you have to check if the full path 
actually exists. If it does, return {{getPlacementContext(mapping, queueName)}} 
otherwise return "null" because the queue cannot be created.
 

So I've been thinking about something like this (if the mapping type is "user"):
{noformat}
  
 // Need to pass queue from ApplicationSubmissionContext, see 
getPlacementForApp()
 private ApplicationPlacementContext getPlacementForUser(String user, 
String targetQueue)
   [...]
  } else if (mapping.getQueue().equals(SECONDARY_GROUP_MAPPING)) {
return getContextForSecondaryGroup(user, mapping);
  } else if (mapping.getQueue().equals(SPECIFIED_MAPPING)) {  <-- new 
mapping
return getContextForSpecified(targetQueue, mapping);
  } else {
return getPlacementContext(mapping);
  }
 [...]
  
   private ApplicationPlacementContext getContextForSpecified(String 
targetQueue,
QueueMapping mapping) throws IOException {

  String parentQueueStr = null;
  CSQueue csParentQueue = null;
  CSQueue csTargetQueue = null;

    if (targetQueue.startsWith("root")) {
      // full path
      parentQueueStr = getParentFromString(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      csTargetQueue = queueManager.getQueue(targetQueue);
    } else {
      parentQueueStr = getParentFromLeafName(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      // this method should work for short name too
      csTargetQueue = queueManager.getQueue(targetQueue);
    }

    // Managed parent: just return whatever is defined in the submission context
    if (csParentQueue instanceof ManagedParentQueue) {
      return getPlacementContext(mapping, targetQueue);
    } else {
      // Otherwise we have to make sure that it exists
      if (csTargetQueue != null) {
        return getPlacementContext(mapping, targetQueue);
      } else {
        // Queue doesn't exist and cannot be created
        return null;
      }
    }
  }
{noformat}

I haven't tested this at all, but *in theory* this is what we need. It's a bit 
more complicated but I believe this is the correct approach.
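For what it's worth, the {{getParentFromString()}} helper marked "implement this" 
above could be as trivial as the sketch below (hypothetical code, simply splitting 
the full path on the last dot; the leaf-name variant would need a queue manager 
lookup instead):
{noformat}
  // Hypothetical helper, not existing code: returns the parent portion of a
  // full queue path, e.g. "root.admins.alice" -> "root.admins".
  private String getParentFromString(String fullQueuePath) {
    int lastDot = fullQueuePath.lastIndexOf('.');
    return lastDot > 0 ? fullQueuePath.substring(0, lastDot) : fullQueuePath;
  }
{noformat}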

cc [~prabhujoseph] [~maniraj...@gmail.com]

 


was (Author: pbacsko):
So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and group 
("g"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings to the following scenarios:
 # Submitted queue string "alice". Parent is not known it has to 

[jira] [Comment Edited] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-04-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084989#comment-17084989
 ] 

Peter Bacsko edited comment on YARN-10102 at 4/16/20, 3:32 PM:
---

So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and group 
("g"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings us to the following scenarios:
 # Submitted queue string "alice". Parent is not known, it has to be looked up.
 # Submitted queue string is "root.admins.alice". We know immediately that the 
parent is "root.admins".

After we retrieved the parent, there are still two possibilities:
 # Parent is managed parent (instance of {{ManagedParentQueue}}). In this case, 
just return {{getPlacementContext(mapping, queueName)}} because the queue will 
be created if it doesn't exist.
 # Parent is not managed. In this case, you have to check if the full path 
actually exists. If it does, return {{getPlacementContext(mapping, queueName)}} 
otherwise return "null" because the queue cannot be created.
 

So I've been thinking about something like this (if the mapping type is "user"):
{noformat}
  
 // Need to pass queue from ApplicationSubmissionContext, see 
getPlacementForApp()
 private ApplicationPlacementContext getPlacementForUser(String user, 
String targetQueue)
   [...]
  } else if (mapping.getQueue().equals(SECONDARY_GROUP_MAPPING)) {
return getContextForSecondaryGroup(user, mapping);
  } else if (mapping.getQueue().equals(SPECIFIED_MAPPING)) {  <-- new 
mapping
return getContextForSpecified(targetQueue, mapping);
  } else {
return getPlacementContext(mapping);
  }
 [...]
  
   private ApplicationPlacementContext getContextForSpecified(String 
targetQueue,
QueueMapping mapping) throws IOException {

  String parentQueueStr = null;
  CSQueue csParentQueue = null;
  CSQueue csTargetQueue = null;

    if (targetQueue.startsWith("root")) {
      // full path
      parentQueueStr = getParentFromString(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      csTargetQueue = queueManager.getQueue(targetQueue);
    } else {
      parentQueueStr = getParentFromLeafName(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      // this method should work for short name too
      csTargetQueue = queueManager.getQueue(targetQueue);
    }

    // Managed parent: just return whatever is defined in the submission context
    if (csParentQueue instanceof ManagedParentQueue) {
      return getPlacementContext(mapping, targetQueue);
    } else {
      // Otherwise we have to make sure that it exists
      if (csTargetQueue != null) {
        return getPlacementContext(mapping, targetQueue);
      } else {
        // Queue doesn't exist and cannot be created
        return null;
      }
    }
  }
{noformat}

I haven't tested this at all, but *in theory* this is what we need. It's a bit 
more complicated but I believe this is the correct approach.

cc [~prabhujoseph] [~maniraj...@gmail.com]

 


was (Author: pbacsko):
So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and group 
("g"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings to the following scenarios:
 # Submitted queue string "alice". Parent is not known it has to 

[jira] [Comment Edited] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-04-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084989#comment-17084989
 ] 

Peter Bacsko edited comment on YARN-10102 at 4/16/20, 3:31 PM:
---

So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and group 
("g"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings us to the following scenarios:
 # Submitted queue string "alice". Parent is not known, it has to be looked up.
 # Submitted queue string is "root.admins.alice". We know immediately that the 
parent is "root.admins".

After we retrieved the parent, there are still two possibilities:
 # Parent is managed parent (instance of {{ManagedParentQueue}}). In this case, 
just return {{getPlacementContext(mapping, queueName)}} because the queue will 
be created if it doesn't exist.
 # Parent is not managed. In this case, you have to check if the full path 
actually exists. If it does, return {{getPlacementContext(mapping, queueName)}} 
otherwise return "null" because the queue cannot be created.
 

So I've been thinking about something like this (if the mapping type is "user"):
{noformat}
  
 // Need to pass queue from ApplicationSubmissionContext, see 
getPlacementForApp()
 private ApplicationPlacementContext getPlacementForUser(String user, 
String targetQueue)
   [...]
  } else if (mapping.getQueue().equals(SECONDARY_GROUP_MAPPING)) {
return getContextForSecondaryGroup(user, mapping);
  } else if (mapping.getQueue().equals(SPECIFIED_MAPPING)) {  <-- new 
mapping
return getContextForSpecified(targetQueue, mapping);
  } else {
return getPlacementContext(mapping);
  }
 [...]
  
   private ApplicationPlacementContext getContextForSpecified(String 
targetQueue,
QueueMapping mapping) throws IOException {

  String parentQueueStr = null;
  CSQueue csParentQueue = null;
  CSQueue csTargetQueue = null;

    if (targetQueue.startsWith("root")) {
      // full path
      parentQueueStr = getParentFromString(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      csTargetQueue = queueManager.getQueue(targetQueue);
    } else {
      parentQueueStr = getParentFromLeafName(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      // this method should work for short name too
      csTargetQueue = queueManager.getQueue(targetQueue);
    }

    // Managed parent: just return whatever is defined in the submission context
    if (csParentQueue instanceof ManagedParentQueue) {
      return getPlacementContext(mapping, targetQueue);
    } else {
      // Otherwise we have to make sure that it exists
      if (csTargetQueue != null) {
        return getPlacementContext(mapping, targetQueue);
      } else {
        // Queue doesn't exist and cannot be created
        return null;
      }
    }
  }
{noformat}

I haven't tested this at all, but *in theory* this is what we need. It's a bit 
more complicated but I believe this is the correct approach.

cc [~prabhujoseph] [~maniraj...@gmail.com]

 


was (Author: pbacsko):
So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and 
("group"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings to the following scenarios:
 # Submitted queue string "alice". Parent is not known it has to be 

[jira] [Commented] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-04-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084989#comment-17084989
 ] 

Peter Bacsko commented on YARN-10102:
-

So, did review the patch, some comments.

1. I can see that this is a new kind of mapping besides user ("u") and 
("group"). It would be more flexible if we had it like this:

{{u:%user%:%specified}}
{{g:group1:%specified}}

So instead of introducing a new mapping type, I'd prefer to have this as a 
placeholder.

2. Recently a large enhancement has been submitted to trunk (YARN-9879), that 
is, you can use multiple leaf queues with the same name (eg. "root.users.alice" 
and "root.admins.alice" are both valid, which was not the case before).

But there's also backward compatibility, so you can still reference a queue 
with its leaf name only (as long as it's unique). Also, queues have parents, 
which can be normal parent queues or managed parents.

This brings us to the following scenarios:
 # Submitted queue string "alice". Parent is not known, it has to be looked up.
 # Submitted queue string is "root.admins.alice". We know immediately that the 
parent is "root.admins".

After we retrieved the parent, there are still two possibilities:
 # Parent is managed parent (instance of {{ManagedParentQueue}}). In this case, 
just return {{getPlacementContext(mapping, queueName)}} because the queue will 
be created if it doesn't exist.
 # Parent is not managed. In this case, you have to check if the full path 
actually exists. If it does, return {{getPlacementContext(mapping, queueName)}} 
otherwise return "null" because the queue cannot be created.
 

So I've been thinking about something like this (if the mapping type is "user"):
{noformat}
  
 // Need to pass queue from ApplicationSubmissionContext, see 
getPlacementForApp()
 private ApplicationPlacementContext getPlacementForUser(String user, 
String targetQueue)
   [...]
  } else if (mapping.getQueue().equals(SECONDARY_GROUP_MAPPING)) {
return getContextForSecondaryGroup(user, mapping);
  } else if (mapping.getQueue().equals(SPECIFIED_MAPPING)) {  <-- new 
mapping
return getContextForSpecified(targetQueue, mapping);
  } else {
return getPlacementContext(mapping);
  }
 [...]
  
   private ApplicationPlacementContext getContextForSpecified(String 
targetQueue,
QueueMapping mapping) throws IOException {

  String parentQueueStr = null;
  CSQueue csParentQueue = null;
  CSQueue csTargetQueue = null;

    if (targetQueue.startsWith("root")) {
      // full path
      parentQueueStr = getParentFromString(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      csTargetQueue = queueManager.getQueue(targetQueue);
    } else {
      parentQueueStr = getParentFromLeafName(targetQueue); // implement this
      csParentQueue = queueManager.getQueue(parentQueueStr);
      // this method should work for short name too
      csTargetQueue = queueManager.getQueue(targetQueue);
    }

    // Managed parent: just return whatever is defined in the submission context
    if (csParentQueue instanceof ManagedParentQueue) {
      return getPlacementContext(mapping, targetQueue);
    } else {
      // Otherwise we have to make sure that it exists
      if (csTargetQueue != null) {
        return getPlacementContext(mapping, targetQueue);
      } else {
        // Queue doesn't exist and cannot be created
        return null;
      }
    }
  }
{noformat}

I haven't tested this at all, but *in theory* this is what we need. It's a bit 
more complicated but I believe this is the correct approach.

cc [~prabhujoseph] [~maniraj...@gmail.com]

 

> Capacity scheduler: add support for %specified mapping
> --
>
> Key: YARN-10102
> URL: https://issues.apache.org/jira/browse/YARN-10102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10102-001.patch
>
>
> To reduce the gap between Fair Scheduler and Capacity Scheduler, it's 
> reasonable to have a {{%specified}} mapping. This would be equivalent to the 
> {{specified}} placement rule in FS, that is, use the queue that comes in 
> with the application submission context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-04-15 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084080#comment-17084080
 ] 

Peter Bacsko commented on YARN-10102:
-

Thanks for the patch [~tanu.ajmera], I'll review it as soon as I have some time.

> Capacity scheduler: add support for %specified mapping
> --
>
> Key: YARN-10102
> URL: https://issues.apache.org/jira/browse/YARN-10102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10102-001.patch
>
>
> To reduce the gap between Fair Scheduler and Capacity Scheduler, it's 
> reasonable to have a {{%specified}} mapping. This would be equivalent to the 
> {{specified}} placement rule in FS, that is, use the queue that comes in 
> with the application submission context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10234) FS-CS converter: don't enable auto-create queue property for root

2020-04-14 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083594#comment-17083594
 ] 

Peter Bacsko commented on YARN-10234:
-

[~sunilg] [~snemeth] please review & commit.

> FS-CS converter: don't enable auto-create queue property for root
> -
>
> Key: YARN-10234
> URL: https://issues.apache.org/jira/browse/YARN-10234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10234-001.patch, YARN-10234-002.patch
>
>
> The auto-create-child-queue property should not be enabled for root, 
> otherwise it creates an exception inside capacity scheduler.
> {noformat}
> 2020-04-14 09:48:54,117 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2020-04-14 09:48:54,117 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED, caused by failure to 
> refresh configuration settings: org.apache.hadoop.ha.ServiceFailedException: 
> RefreshAll operation failed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:772)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: java.io.IOException: Failed to re-init queues : null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:467)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:489)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:761)
> ... 6 more
> Caused by: java.lang.ClassCastException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10234) FS-CS converter: don't enable auto-create queue property for root

2020-04-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10234:

Attachment: YARN-10234-002.patch

> FS-CS converter: don't enable auto-create queue property for root
> -
>
> Key: YARN-10234
> URL: https://issues.apache.org/jira/browse/YARN-10234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10234-001.patch, YARN-10234-002.patch
>
>
> The auto-create-child-queue property should not be enabled for root, 
> otherwise it creates an exception inside capacity scheduler.
> {noformat}
> 2020-04-14 09:48:54,117 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2020-04-14 09:48:54,117 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED, caused by failure to 
> refresh configuration settings: org.apache.hadoop.ha.ServiceFailedException: 
> RefreshAll operation failed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:772)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: java.io.IOException: Failed to re-init queues : null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:467)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:489)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:761)
> ... 6 more
> Caused by: java.lang.ClassCastException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10226) NPE in Capacity Scheduler while using %primary_group queue mapping

2020-04-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10226:

Attachment: YARN-10234-002.patch

> NPE in Capacity Scheduler while using %primary_group queue mapping
> --
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get a NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10226) NPE in Capacity Scheduler while using %primary_group queue mapping

2020-04-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10226:

Attachment: (was: YARN-10234-002.patch)

> NPE in Capacity Scheduler while using %primary_group queue mapping
> --
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Fix For: 3.3.0, 3.4.0
>
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get a NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10234) FS-CS converter: don't enable auto-create queue property for root

2020-04-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10234:

Attachment: YARN-10234-001.patch

> FS-CS converter: don't enable auto-create queue property for root
> -
>
> Key: YARN-10234
> URL: https://issues.apache.org/jira/browse/YARN-10234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10234-001.patch
>
>
> The auto-create-child-queue property should not be enabled for root, 
> otherwise it creates an exception inside capacity scheduler.
> {noformat}
> 2020-04-14 09:48:54,117 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2020-04-14 09:48:54,117 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED, caused by failure to 
> refresh configuration settings: org.apache.hadoop.ha.ServiceFailedException: 
> RefreshAll operation failed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:772)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: java.io.IOException: Failed to re-init queues : null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:467)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:489)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:761)
> ... 6 more
> Caused by: java.lang.ClassCastException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10234) FS-CS converter: don't enable auto-create queue property for root

2020-04-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10234:

Summary: FS-CS converter: don't enable auto-create queue property for root  
(was: FS-CS converter: don't enale auto-create queue property for root)

> FS-CS converter: don't enable auto-create queue property for root
> -
>
> Key: YARN-10234
> URL: https://issues.apache.org/jira/browse/YARN-10234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>
> The auto-create-child-queue property should not be enabled for root, 
> otherwise it creates an exception inside capacity scheduler.
> {noformat}
> 2020-04-14 09:48:54,117 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
> Trying to re-establish ZK session
> 2020-04-14 09:48:54,117 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED, caused by failure to 
> refresh configuration settings: org.apache.hadoop.ha.ServiceFailedException: 
> RefreshAll operation failed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:772)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: java.io.IOException: Failed to re-init queues : null
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:467)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:489)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:761)
> ... 6 more
> Caused by: java.lang.ClassCastException
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10234) FS-CS converter: don't enale auto-create queue property for root

2020-04-14 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10234:
---

 Summary: FS-CS converter: don't enale auto-create queue property 
for root
 Key: YARN-10234
 URL: https://issues.apache.org/jira/browse/YARN-10234
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko


The auto-create-child-queue property should not be enabled for root, otherwise 
it creates an exception inside capacity scheduler.

{noformat}
2020-04-14 09:48:54,117 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying 
to re-establish ZK session
2020-04-14 09:48:54,117 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
RMFatalEvent of type TRANSITION_TO_ACTIVE_FAILED, caused by failure to refresh 
configuration settings: org.apache.hadoop.ha.ServiceFailedException: RefreshAll 
operation failed
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:772)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:636)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.io.IOException: Failed to re-init queues : null
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:467)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:489)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:761)
... 6 more
Caused by: java.lang.ClassCastException
{noformat}
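For context, the property that must not be emitted for the root queue is 
presumably the per-queue auto-creation flag, i.e. an entry along these lines 
(illustrative, using the standard Capacity Scheduler per-queue property naming):
{noformat}
yarn.scheduler.capacity.root.auto-create-child-queue.enabled = true
{noformat}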



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10226) NPE when using %primary_group queue mapping

2020-04-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079057#comment-17079057
 ] 

Peter Bacsko commented on YARN-10226:
-

[~sunilg] please review & commit this fix, thanks.

> NPE when using %primary_group queue mapping
> ---
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get a NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10226) NPE when using %primary_group queue mapping

2020-04-08 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10226:

Attachment: YARN-10226-001.patch

> NPE when using %primary_group queue mapping
> ---
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get a NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10226) NPE when using %primary_group queue mapping

2020-04-08 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10226:
---

 Summary: NPE when using %primary_group queue mapping
 Key: YARN-10226
 URL: https://issues.apache.org/jira/browse/YARN-10226
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


If we use the following queue mapping:

{{u:%user:%primary_group}}

then we get a NPE inside ResourceManager:

{noformat}
2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(881)) - Failed to load/recover state
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
at 
org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
at 
org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
...
{noformat}

We need to check if parent queue is null in 
{{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.
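The guard is presumably something along these lines (an illustrative sketch only, 
with made-up variable names; the actual change is in the attached patch):
{noformat}
  // Sketch: give up when the parent queue referenced by the mapping does not
  // exist, instead of letting a null reference propagate further.
  CSQueue parentQueue = queueManager.getQueue(mapping.getParentQueue());
  if (parentQueue == null) {
    return null;
  }
{noformat}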




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-03-27 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068678#comment-17068678
 ] 

Peter Bacsko commented on YARN-10049:
-

Just minor nits:

1. {{if(res == 0)}} whitespace after "if"
2. Use {{Assert.assertEquals()}} everywhere to be consistent with existing code 
OR replace everything with {{assertThat()}}. Current code is a bit all over the 
place, assertj is mixed with junit. A small cleanup woudln't harm.
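For reference, the two styles currently mixed look roughly like this (illustrative 
snippet, not taken from the patch):
{noformat}
// JUnit style
Assert.assertEquals(expected, actual);

// AssertJ style
assertThat(actual).isEqualTo(expected);
{noformat}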

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser

2020-03-27 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068568#comment-17068568
 ] 

Peter Bacsko edited comment on YARN-10199 at 3/27/20, 11:07 AM:


[~gandras] some minor nits:

1.  in {{getPlacementForUser()}}: {{mapping.getSource()}} - store this in a 
local String to avoid repeated call of this method and increase readability

2. This piece of code:
{noformat}
  private boolean isCurrentUserToGroupParentMapping(QueueMapping 
queueMapping) {
return queueMapping.getSource().equals(CURRENT_USER_MAPPING)
&& queueMapping.getParentQueue() != null
&& (queueMapping.getParentQueue().equals(PRIMARY_GROUP_MAPPING)
|| 
queueMapping.getParentQueue().equals(SECONDARY_GROUP_MAPPING))
&& queueMapping.getQueue().equals(CURRENT_USER_MAPPING);
  }
{noformat}
Is there any way to restructure this? Or maybe add a short comment above it 
which describes when it returns true (one possible shape is sketched at the end 
of this comment).

3. Change {{this.queueManager}} to just {{queueManager}} (it's mixed right now).

4. Fix checkstyle issues.

I don't see how the readability could be enhanced further. I think it would 
require introducing new classes just like in FS, i.e. separate classes for every 
type of mapping (CurrentUserMapping, UserMapping, GroupMapping, whatever), 
store them in a list, etc. But I think it's not worth the effort, at least not 
now.
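For point 2, one possible shape, purely as an illustration (names taken from the 
snippet above; not a tested refactor):
{noformat}
  // Returns true for mappings of the form u:%user:%primary_group.%user or
  // u:%user:%secondary_group.%user, i.e. the current user is placed under a
  // parent queue derived from one of the user's groups.
  private boolean isCurrentUserToGroupParentMapping(QueueMapping queueMapping) {
    if (!CURRENT_USER_MAPPING.equals(queueMapping.getSource())
        || !CURRENT_USER_MAPPING.equals(queueMapping.getQueue())) {
      return false;
    }
    String parent = queueMapping.getParentQueue();
    return PRIMARY_GROUP_MAPPING.equals(parent)
        || SECONDARY_GROUP_MAPPING.equals(parent);
  }
{noformat}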



was (Author: pbacsko):
[~gandras] some minor nits:

1.  in {{getPlacementForUser()}}: {{mapping.getSource()}} - store this in a 
local String to avoid repeated call of this method and increase readability

2. This piece of code:
  private boolean isCurrentUserToGroupParentMapping(QueueMapping 
queueMapping) {
return queueMapping.getSource().equals(CURRENT_USER_MAPPING)
&& queueMapping.getParentQueue() != null
&& (queueMapping.getParentQueue().equals(PRIMARY_GROUP_MAPPING)
|| 
queueMapping.getParentQueue().equals(SECONDARY_GROUP_MAPPING))
&& queueMapping.getQueue().equals(CURRENT_USER_MAPPING);
  }
Is there any way to restructure this? Or maybe add a short comment above it 
which describes when it returns true.

3. Change {{this.queueManager}} to just {{queueManager}} (it's mixed right now).

4. Fix checkstyle issues.

I don't see how the readability could be enhanced further. I think it would 
require introducing new classes just like in FS, i.e. separate classes for every 
type of mapping (CurrentUserMapping, UserMapping, GroupMapping, whatever), 
store them in a list, etc. But I think it's not worth the effort, at least not 
now.


> Simplify UserGroupMappingPlacementRule#getPlacementForUser
> --
>
> Key: YARN-10199
> URL: https://issues.apache.org/jira/browse/YARN-10199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10199.001.patch, YARN-10199.002.patch, 
> YARN-10199.003.patch, YARN-10199.004.patch
>
>
> The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly 
> responsible for queue naming, contains deeply nested branches. In order to 
> provide an extendable mapping logic, the branches could be flattened and 
> simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser

2020-03-27 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068568#comment-17068568
 ] 

Peter Bacsko commented on YARN-10199:
-

[~gandras] some minor nits:

1.  in {{getPlacementForUser()}}: {{mapping.getSource()}} - store this in a 
local String to avoid repeated call of this method and increase readability

2. This piece of code:
  private boolean isCurrentUserToGroupParentMapping(QueueMapping 
queueMapping) {
return queueMapping.getSource().equals(CURRENT_USER_MAPPING)
&& queueMapping.getParentQueue() != null
&& (queueMapping.getParentQueue().equals(PRIMARY_GROUP_MAPPING)
|| 
queueMapping.getParentQueue().equals(SECONDARY_GROUP_MAPPING))
&& queueMapping.getQueue().equals(CURRENT_USER_MAPPING);
  }
Is there any way to restructure this? Or maybe add a short comment above it 
which describes when it returns true.

3. Change {{this.queueManager}} to just {{queueManager}} (it's mixed right now).

4. Fix checkstyle issues.

I don't see how the readability could be enhanced further. I think it would 
require introducing new classes just like in FS, i.e. separate classes for every 
type of mapping (CurrentUserMapping, UserMapping, GroupMapping, whatever), 
store them in a list, etc. But I think it's not worth the effort, at least not 
now.


> Simplify UserGroupMappingPlacementRule#getPlacementForUser
> --
>
> Key: YARN-10199
> URL: https://issues.apache.org/jira/browse/YARN-10199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10199.001.patch, YARN-10199.002.patch, 
> YARN-10199.003.patch, YARN-10199.004.patch
>
>
> The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly 
> responsible for queue naming, contains deeply nested branches. In order to 
> provide an extendable mapping logic, the branches could be flattened and 
> simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-23 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064750#comment-17064750
 ] 

Peter Bacsko commented on YARN-10198:
-

Due to infra issues, we can't see the comment about the Precommit builds.

Here's a build about patch v3: 
https://builds.apache.org/job/PreCommit-YARN-Build/25730/console

{noformat}
+1 overall

| Vote |Subsystem |  Runtime   | Comment

|   0  |  reexec  |   0m 48s   | Docker mode activated. 
|  |  || Prechecks 
|  +1  | @author  |   0m  0s   | The patch does not contain any @author 
|  |  || tags.
|  +1  |  test4tests  |   0m  0s   | The patch appears to include 1 new or 
|  |  || modified test files.
|  |  || trunk Compile Tests 
|  +1  |  mvninstall  |  20m 57s   | trunk passed 
|  +1  | compile  |   0m 43s   | trunk passed 
|  +1  |  checkstyle  |   0m 34s   | trunk passed 
|  +1  | mvnsite  |   0m 46s   | trunk passed 
|  +1  |shadedclient  |  15m 27s   | branch has no errors when building and 
|  |  || testing our client artifacts.
|  +1  |findbugs  |   1m 34s   | trunk passed 
|  +1  | javadoc  |   0m 29s   | trunk passed 
|  |  || Patch Compile Tests 
|  +1  |  mvninstall  |   0m 42s   | the patch passed 
|  +1  | compile  |   0m 37s   | the patch passed 
|  +1  |   javac  |   0m 37s   | the patch passed 
|  +1  |  checkstyle  |   0m 28s   | the patch passed 
|  +1  | mvnsite  |   0m 41s   | the patch passed 
|  +1  |  whitespace  |   0m  0s   | The patch has no whitespace issues. 
|  +1  |shadedclient  |  14m  2s   | patch has no errors when building and 
|  |  || testing our client artifacts.
|  +1  |findbugs  |   1m 37s   | the patch passed 
|  +1  | javadoc  |   0m 28s   | the patch passed 
|  |  || Other Tests 
|  +1  |unit  |  89m  8s   | hadoop-yarn-server-resourcemanager in 
|  |  || the patch passed.
|  +1  |  asflicense  |   0m 26s   | The patch does not generate ASF 
|  |  || License warnings.
|  |  | 149m 17s   | 
{noformat}

[~sunilg] [~prabhujoseph] could you guys commit this change?


> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch, 
> YARN-10198-003.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-20 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063433#comment-17063433
 ] 

Peter Bacsko commented on YARN-10198:
-

[~maniraj...@gmail.com] I was thinking about adding a comment about the null 
check. In short: it's not necessary, because instanceof simply returns false if 
the object is null.
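For the record, a one-liner illustrating the point (plain Java semantics, not 
project code):
{noformat}
CSQueue parentQueue = null;
// instanceof never throws on a null reference, it simply evaluates to false:
boolean managed = parentQueue instanceof ManagedParentQueue;  // false, no NPE
{noformat}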

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch, 
> YARN-10198-003.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-20 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10198:

Attachment: YARN-10198-003.patch

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch, 
> YARN-10198-003.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062531#comment-17062531
 ] 

Peter Bacsko edited comment on YARN-10198 at 3/19/20, 12:39 PM:


I uploaded patch v2.

Couple of things:
1. Most importantly: I was reasoning about the validation in the case of 
{{%secondary_group}}, and based on the existing code, it cannot have a managed 
parent. The queue has to exist, see {{getSecondaryGroup()}}. This also seems to 
be in line with Fair Scheduler, where this placement rule is called 
"SecondaryGroupExistingPlacementRule" (an illustrative mapping is shown after 
this list). Please confirm this [~maniraj...@gmail.com], [~prabhujoseph].

2. I had to do some refactoring in {{UserGroupMappingPlacementRule}} because 
readability is becoming more of a concern with the added features. Hopefully 
this will be addressed by YARN-10199.

3. Added extra unit tests. Existing tests are not broken by this change (at 
least not the ones in {{TestUserGroupMappingPlacementRule}}).


was (Author: pbacsko):
I uploaded patch v2.

Couple of things:
1. I was reasoning about the validation in case of {{%secondary_group}} and 
based on the existing code, this cannot have a managed parent. The queue has to 
exist, see {{getSecondaryGroup()}}. This also seems to be in line with Fair 
Scheduler, where this placement rule is called 
"SecondaryGroupExistingPlacementRule".

2. I had to do some refactor in {{UserGroupMappingPlacementRule}} because 
readability is becoming more of a concern with the added features. It will be 
addressed by YARN-10199 hopefully.

3. Added extra unit tests. Existing tests are not broken by this change (at 
least not the ones in {{TestUserGroupMappingPlacementRule}}).

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062531#comment-17062531
 ] 

Peter Bacsko commented on YARN-10198:
-

I uploaded patch v2.

Couple of things:
1. I was reasoning about the validation in case of {{%secondary_group}} and 
based on the existing code, this cannot have a managed parent. The queue has to 
exist, see {{getSecondaryGroup()}}. This also seems to be in line with Fair 
Scheduler, where this placement rule is called 
"SecondaryGroupExistingPlacementRule".

2. I had to do some refactor in {{UserGroupMappingPlacementRule}} because 
readability is becoming more of a concern with the added features. It will be 
addressed by YARN-10199 hopefully.

3. Added extra unit tests. Existing tests are not broken by this change (at 
least not the ones in {{TestUserGroupMappingPlacementRule}}).

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10198:

Attachment: YARN-10198-002.patch

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062414#comment-17062414
 ] 

Peter Bacsko commented on YARN-10198:
-

[~sunilg] the patch needs to be extended. In its current form, it's not enough. 
I'm going to attach a new version today.

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060958#comment-17060958
 ] 

Peter Bacsko commented on YARN-10198:
-

Test failure is unrelated.

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser

2020-03-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060876#comment-17060876
 ] 

Peter Bacsko edited comment on YARN-10199 at 3/17/20, 12:26 PM:


[~gandras] I'm wondering whether it makes sense to create a completely new 
class for this like {{CurrentUserMappingFactory}}. Then the whole logic becomes 
more testable and we can also expand test coverage.
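
A very rough sketch of the idea (all names are hypothetical, there is no such class 
in the codebase at the moment):

{code:java}
import java.io.IOException;

/**
 * Hypothetical extraction: the queue-naming logic of
 * UserGroupMappingPlacementRule hidden behind a small interface so it can be
 * unit-tested without building a full scheduler context.
 */
public interface UserQueueMappingResolver {

  /** Returns the target queue path for the given user, or null if no rule matches. */
  String resolve(String user) throws IOException;
}
{code}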


was (Author: pbacsko):
[~gandras] I'm wondering whether it makes sense to create a completely new 
class for this. Then the whole logic becomes more testable.

> Simplify UserGroupMappingPlacementRule#getPlacementForUser
> --
>
> Key: YARN-10199
> URL: https://issues.apache.org/jira/browse/YARN-10199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10199.001.patch
>
>
> The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly 
> responsible for queue naming, contains deeply nested branches. In order to 
> provide an extendable mapping logic, the branches could be flattened and 
> simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10199) Simplify UserGroupMappingPlacementRule#getPlacementForUser

2020-03-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060876#comment-17060876
 ] 

Peter Bacsko commented on YARN-10199:
-

[~gandras] I'm wondering whether it makes sense to create a completely new 
class for this. Then the whole logic becomes more testable.

> Simplify UserGroupMappingPlacementRule#getPlacementForUser
> --
>
> Key: YARN-10199
> URL: https://issues.apache.org/jira/browse/YARN-10199
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10199.001.patch
>
>
> The UserGroupMappingPlacementRule#getPlacementForUser method, which is mainly 
> responsible for queue naming, contains deeply nested branches. In order to 
> provide an extendable mapping logic, the branches could be flattened and 
> simplified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060874#comment-17060874
 ] 

Peter Bacsko commented on YARN-10197:
-

[~prabhujoseph] could you review & commit this patch? 

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch, YARN-10197-002.patch, 
> YARN-10197-003.patch
>
>
> There are three problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.
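
Regarding point 2 of the description above, a hedged sketch of the intended 
handling (standalone demo code, not the actual converter; the method and constant 
names are made up):

{code:java}
public class MaxAmShareConversionSketch {

  // In FS, -1.0 means the AM share check is disabled.
  private static final float DISABLED = -1.0f;

  // Maps an FS maxAMShare value to a CS maximum-am-resource-percent string.
  // Returns null when the queue should simply inherit the converted default.
  static String toMaxAmResourcePercent(float queueMaxAmShare, float globalDefault) {
    if (queueMaxAmShare == DISABLED) {
      return "1.0"; // disabled check -> allow as many AMs as possible
    }
    if (queueMaxAmShare != globalDefault) {
      return String.valueOf(queueMaxAmShare); // differs from the global setting
    }
    return null; // same as the global default, nothing to emit per queue
  }

  public static void main(String[] args) {
    System.out.println(toMaxAmResourcePercent(-1.0f, 0.5f)); // 1.0
    System.out.println(toMaxAmResourcePercent(0.5f, 0.5f));  // null
    System.out.println(toMaxAmResourcePercent(0.3f, 0.5f));  // 0.3
  }
}
{code}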



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-17 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060857#comment-17060857
 ] 

Peter Bacsko commented on YARN-10198:
-

Thanks [~akhilpb]. 

[~prabhujoseph] could you please review patch 001?

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-17 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10198:

Attachment: YARN-10198-001.patch

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060409#comment-17060409
 ] 

Peter Bacsko commented on YARN-10198:
-

[~maniraj...@gmail.com] I think so. YARN-9868 introduced a regression. With the 
new changes, a leaf queue matching {{%primary_group}} has to exist, which was not 
the case before: if {{managedQueue}} existed with 
{{auto-create-child-queue=true}}, the leaf queue was created dynamically. Since 
existing behavior changed, that part of the patch should be reverted. 
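
For context, a configuration along these lines is what the extra check breaks 
(the queue paths are illustrative): the mapping targets a managed parent, so the 
per-group leaf queue is expected to be created on demand rather than pre-defined.

{noformat}
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <value>u:%user:managedQueue.%primary_group</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.managedQueue.auto-create-child-queue.enabled</name>
  <value>true</value>
</property>
{noformat}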

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-16 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Attachment: YARN-10197-003.patch

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch, YARN-10197-002.patch, 
> YARN-10197-003.patch
>
>
> There are three problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10198:

Summary: [managedParent].%primary_group mapping rule doesn't work after 
YARN-9868  (was: [managedParent].%primary_group placement doesn't work after 
YARN-9868)

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group placement doesn't work after YARN-9868

2020-03-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058808#comment-17058808
 ] 

Peter Bacsko commented on YARN-10198:
-

cc [~prabhujoseph] [~sunilg]

> [managedParent].%primary_group placement doesn't work after YARN-9868
> -
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10198) [managedParent].%primary_group placement doesn't work after YARN-9868

2020-03-13 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10198:
---

 Summary: [managedParent].%primary_group placement doesn't work 
after YARN-9868
 Key: YARN-10198
 URL: https://issues.apache.org/jira/browse/YARN-10198
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


YARN-9868 introduced an unnecessary check if we have the following placement 
rule:

[managedParentQueue].%primary_group

Here, {{%primary_group}} is expected to be created if it doesn't exist. 
However, there is this validation code which is not necessary:

{noformat}
  } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
if (this.queueManager
.getQueue(groups.getGroups(user).get(0)) != null) {
  return getPlacementContext(mapping,
  groups.getGroups(user).get(0));
} else {
  return null;
}
{noformat}

We should revert this part to the original version:
{noformat}
  } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
return getPlacementContext(mapping, groups.getGroups(user).get(0));
}
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Description: 
There are three problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should also 
check if the current value differs from the global setting and only then output 
"1.0" for a given queue.

3) The multi-leaf queue check is no longer necessary and it doesn't work 
anyway. The CS instance catches this kind of error during the verification step.

  was:
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should also 
check if the current value differs from the global setting and only then output 
"1.0" for a given queue.

3) The multi-leaf queue check is no longer necessary and it doesn't work 
anyway. The CS instance catches this kind of error during the verification step.


> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch, YARN-10197-002.patch
>
>
> There are three problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Attachment: YARN-10197-002.patch

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch, YARN-10197-002.patch
>
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058767#comment-17058767
 ] 

Peter Bacsko commented on YARN-10197:
-

Note: patch v1 will be extended later with some more unit tests regarding max 
AM share.

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch
>
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Attachment: YARN-10197-001.patch

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10197-001.patch
>
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Description: 
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should also 
check if the current value differs from the global setting and only then output 
"1.0" for a given queue.

3) The multi-leaf queue check is no longer necessary and it doesn't work 
anyway. The CS instance catches this kind of error during the verification step.

  was:
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should also 
check if the current value differs from the global setting and only then output 
"1.0" for a given queue.


> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.
> 3) The multi-leaf queue check is no longer necessary and it doesn't work 
> anyway. The CS instance catches this kind of error during the verification 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Description: 
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should also 
check if the current value differs from the global setting and only then output 
"1.0" for a given queue.

  was:
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible.


> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible. In {{FSQueueConverter.emitMaxAMShare()}}, we should 
> also check if the current value differs from the global setting and only then 
> output "1.0" for a given queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Description: 
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible.

  was:
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo". not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible.


> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo", not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10197:

Description: 
There are two problems that have to be addressed in the converter:

1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
should be "fifo". not "FIFO". The reason we generate "FIFO" is because 
{{FifoPolicy.NAME}} consists of uppercase letters.

2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
FS, you can globally disable it with 
{{-1.0}}. However this case is 
not handled properly in 
{{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max AM 
check is disabled, therefore we have to generate the value "1.0" to allow as 
many AMs as possible.

> FS-CS converter: fix emitted ordering policy string and max-am-resource 
> percent value
> -
>
> Key: YARN-10197
> URL: https://issues.apache.org/jira/browse/YARN-10197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> There are two problems that have to be addressed in the converter:
> 1) For {{yarn.scheduler.capacity..ordering-policy}}, the emitted value 
> should be "fifo". not "FIFO". The reason we generate "FIFO" is because 
> {{FifoPolicy.NAME}} consists of uppercase letters.
> 2) {{maximum-am-resource-percent}} calculation is faulty too. For example, in 
> FS, you can globally disable it with 
> {{-1.0}}. However this case 
> is not handled properly in 
> {{FSConfigToCSConfigConverter.emitDefaultMaxAMShare()}}. -1.0 means that max 
> AM check is disabled, therefore we have to generate the value "1.0" to allow 
> as many AMs as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10197:
---

 Summary: FS-CS converter: fix emitted ordering policy string and 
max-am-resource percent value
 Key: YARN-10197
 URL: https://issues.apache.org/jira/browse/YARN-10197
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10193) FS-CS converter: fix incorrect capacity conversion

2020-03-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10193:

Attachment: YARN-10193-001.patch

> FS-CS converter: fix incorrect capacity conversion
> --
>
> Key: YARN-10193
> URL: https://issues.apache.org/jira/browse/YARN-10193
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10193-001.patch
>
>
> Conversion of capacities is incorrect if the total doesn't add up exactly to 
> 100.00%.
> The loop condition must be fixed:
> {noformat}
>  for (int i = 0; i < children.size() - 2; i++) {
> {noformat}
> The testcase needs to be fixed too:
> {noformat}
> assertEquals("root.default capacity", "33.333",
> csConfig.get(PREFIX + "root.default.capacity"));
> assertEquals("root.admins capacity", "33.333",
> csConfig.get(PREFIX + "root.admins.capacity"));
> assertEquals("root.users capacity", "66.667",
> csConfig.get(PREFIX + "root.users.capacity"));
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10193) FS-CS converter: fix incorrect capacity conversion

2020-03-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10193:
---

 Summary: FS-CS converter: fix incorrect capacity conversion
 Key: YARN-10193
 URL: https://issues.apache.org/jira/browse/YARN-10193
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Conversion of capacities is incorrect if the total doesn't add up exactly to 
100.00%.

The loop condition must be fixed:
{noformat}
 for (int i = 0; i < children.size() - 2; i++) {
{noformat}

The testcase needs to be fixed too:
{noformat}
assertEquals("root.default capacity", "33.333",
csConfig.get(PREFIX + "root.default.capacity"));
assertEquals("root.admins capacity", "33.333",
csConfig.get(PREFIX + "root.admins.capacity"));
assertEquals("root.users capacity", "66.667",
csConfig.get(PREFIX + "root.users.capacity"));
{noformat}
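
To illustrate the underlying idea (weights converted to percentages, with the last 
sibling absorbing the rounding remainder so the result adds up to exactly 100), 
here is a hedged, standalone sketch; it is not the converter code and the queue 
names and weights are made up:

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.LinkedHashMap;
import java.util.Map;

public class CapacityConversionSketch {

  // Converts relative weights to percentage strings that add up to exactly
  // 100.000 by assigning the rounding remainder to the last queue.
  static Map<String, String> toPercentages(Map<String, Double> weights) {
    double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
    Map<String, String> result = new LinkedHashMap<>();
    BigDecimal remaining = new BigDecimal("100.000");
    int i = 0;
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      BigDecimal pct;
      if (i == weights.size() - 1) {
        pct = remaining; // the last queue gets whatever is left
      } else {
        pct = BigDecimal.valueOf(e.getValue() * 100.0 / total)
            .setScale(3, RoundingMode.HALF_UP);
        remaining = remaining.subtract(pct);
      }
      result.put(e.getKey(), pct.toPlainString());
      i++;
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("root.default", 1.0);
    weights.put("root.admins", 1.0);
    weights.put("root.users", 1.0);
    // Prints 33.333, 33.333 and 33.334 -- the total is exactly 100.000.
    System.out.println(toPercentages(weights));
  }
}
{code}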



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-11 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057009#comment-17057009
 ] 

Peter Bacsko commented on YARN-10191:
-

Checkstyle can be ignored; it's just test code.

[~snemeth] could you please review & commit?

> FS-CS converter: call System.exit() for every code path in main()
> -
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
> try {
>   FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>   new FSConfigToCSConfigArgumentHandler();
>   int exitCode =
>   fsConfigConversionArgumentHandler.parseAndConvert(args);
>   if (exitCode != 0) {
> LOG.error(FATAL,
> "Error while starting FS configuration conversion, " +
> "see previous error messages for details!");
> System.exit(exitCode);
>   }
> } catch (Throwable t) {
>   LOG.error(FATAL,
>   "Error while starting FS configuration conversion!", t);
>   System.exit(-1);
> }
>   }
>  {code}
> This is a mistake. If there's any non-daemon thread hanging around which was 
> started by either FS or CS, the tool will never terminate. We must call 
> {{System.exit()}} in every occasion to make sure that it never blocks at the 
> end.
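
A hedged sketch of the fix described above (it reuses the identifiers from the 
snippet; this shows the idea, not necessarily the exact committed patch): compute 
the exit code on every path and terminate explicitly at the end.

{code:java}
  public static void main(String[] args) {
    int exitCode = 0;
    try {
      FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
          new FSConfigToCSConfigArgumentHandler();
      exitCode = fsConfigConversionArgumentHandler.parseAndConvert(args);
      if (exitCode != 0) {
        LOG.error(FATAL,
            "Error while starting FS configuration conversion, " +
            "see previous error messages for details!");
      }
    } catch (Throwable t) {
      LOG.error(FATAL,
          "Error while starting FS configuration conversion!", t);
      exitCode = -1;
    }
    // Always exit explicitly: leftover non-daemon threads started by FS or CS
    // could otherwise keep the JVM alive even on the happy path.
    System.exit(exitCode);
  }
{code}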



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10191:

Attachment: YARN-10191-001.patch

> FS-CS converter: call System.exit() for every code path in main()
> -
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
> try {
>   FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>   new FSConfigToCSConfigArgumentHandler();
>   int exitCode =
>   fsConfigConversionArgumentHandler.parseAndConvert(args);
>   if (exitCode != 0) {
> LOG.error(FATAL,
> "Error while starting FS configuration conversion, " +
> "see previous error messages for details!");
> System.exit(exitCode);
>   }
> } catch (Throwable t) {
>   LOG.error(FATAL,
>   "Error while starting FS configuration conversion!", t);
>   System.exit(-1);
> }
>   }
>  {code}
> This is a mistake. If there's any non-daemon thread hanging around which was 
> started by either FS or CS, the tool will never terminate. We must call 
> {{System.exit()}} in every occasion to make sure that it never blocks at the 
> end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-11 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10191:

Description: 
Note that we don't call {{System.exit()}} on the happy path scenario in the 
converter:
{code:java}
  public static void main(String[] args) {
try {
  FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
  new FSConfigToCSConfigArgumentHandler();
  int exitCode =
  fsConfigConversionArgumentHandler.parseAndConvert(args);
  if (exitCode != 0) {
LOG.error(FATAL,
"Error while starting FS configuration conversion, " +
"see previous error messages for details!");
System.exit(exitCode);
  }
} catch (Throwable t) {
  LOG.error(FATAL,
  "Error while starting FS configuration conversion!", t);
  System.exit(-1);
}
  }
 {code}
This is a mistake. If there's any non-daemon thread hanging around which was 
started by either FS or CS, the tool will never terminate. We must call 
{{System.exit()}} in every case to make sure that the tool never blocks at 
the end.

  was:
Note that we don't always call {{System.exit()}} on the happy path scenario in 
the converter:
{code:java}
  public static void main(String[] args) {
try {
  FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
  new FSConfigToCSConfigArgumentHandler();
  int exitCode =
  fsConfigConversionArgumentHandler.parseAndConvert(args);
  if (exitCode != 0) {
LOG.error(FATAL,
"Error while starting FS configuration conversion, " +
"see previous error messages for details!");
System.exit(exitCode);
  }
} catch (Throwable t) {
  LOG.error(FATAL,
  "Error while starting FS configuration conversion!", t);
  System.exit(-1);
}
  }
 {code}
This is a mistake. If there's any non-daemon thread hanging around which was 
started by either FS or CS, the tool will never terminate. We must call 
{{System.exit()}} in every case to make sure that the tool never blocks at 
the end.


> FS-CS converter: call System.exit() for every code path in main()
> -
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
> try {
>   FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>   new FSConfigToCSConfigArgumentHandler();
>   int exitCode =
>   fsConfigConversionArgumentHandler.parseAndConvert(args);
>   if (exitCode != 0) {
> LOG.error(FATAL,
> "Error while starting FS configuration conversion, " +
> "see previous error messages for details!");
> System.exit(exitCode);
>   }
> } catch (Throwable t) {
>   LOG.error(FATAL,
>   "Error while starting FS configuration conversion!", t);
>   System.exit(-1);
> }
>   }
>  {code}
> This is a mistake. If there's any non-daemon thread hanging around which was 
> started by either FS or CS, the tool will never terminate. We must call 
> {{System.exit()}} in every case to make sure that the tool never blocks at 
> the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-11 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10191:
---

 Summary: FS-CS converter: call System.exit() for every code path 
in main()
 Key: YARN-10191
 URL: https://issues.apache.org/jira/browse/YARN-10191
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Note that we don't always call {{System.exit()}} on the happy path scenario in 
the converter:
{code:java}
  public static void main(String[] args) {
try {
  FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
  new FSConfigToCSConfigArgumentHandler();
  int exitCode =
  fsConfigConversionArgumentHandler.parseAndConvert(args);
  if (exitCode != 0) {
LOG.error(FATAL,
"Error while starting FS configuration conversion, " +
"see previous error messages for details!");
System.exit(exitCode);
  }
} catch (Throwable t) {
  LOG.error(FATAL,
  "Error while starting FS configuration conversion!", t);
  System.exit(-1);
}
  }
 {code}
This is a mistake. If there's any non-daemon thread hanging around which was 
started by either FS or CS, the tool will never terminate. We must call 
{{System.exit()}} in every case to make sure that the tool never blocks at 
the end.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055691#comment-17055691
 ] 

Peter Bacsko commented on YARN-10168:
-

[~bteke] good catch, removed that.

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch, YARN-10168-004.patch, YARN-10168-005.patch
>
>
> Trying to understand the logic of converting min and max resources from FS to 
> CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to either specify a maximum percentage for queues 
> against cluster resources, or to specify an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and 
> applies the percentages to that. This is not correct, since the 
> percentage-based value gets lost, and in the future when cluster resources go 
> up and down, the maximum resource cannot be changed.
> 2)
> The logic to deal with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and 
> could set both on a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify both percentage-based and 
> absolute-resource-based configs (similar to how FS deals with maximum 
> Resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed in YARN-5881; see "Should we support specifying a 
> mix of percentage ..."
> The existing fs2cs doesn't handle this issue and could set mixed 
> absolute-resource and percentage-based resources.
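
To make point 1) concrete, here is a small illustrative sketch of what converting 
a percentage-based maximum into an absolute value amounts to. The class and 
variable names are hypothetical (this is not the actual FSQueueConverter code); it 
only assumes the standard {{Resource}} API:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public class MaxCapacityConversionSketch {
  public static void main(String[] args) {
    // "Global" cluster resource passed to the fs2cs CLI, e.g. 100 GB / 100 vcores
    Resource clusterResource = Resource.newInstance(100 * 1024, 100);

    // FS queue configured with a percentage-based maximum, e.g. 50%
    double fsMaxPercentage = 0.5;

    // The converter effectively freezes the percentage into absolute numbers:
    long maxMemoryMb = (long) (clusterResource.getMemorySize() * fsMaxPercentage);
    int maxVcores = (int) (clusterResource.getVirtualCores() * fsMaxPercentage);
    System.out.println("maximum-capacity = [memory=" + maxMemoryMb
        + ", vcores=" + maxVcores + "]");

    // If the cluster later grows to 200 GB, the queue stays capped at 50 GB:
    // the original "50%" semantics are lost in the generated CS config.
  }
}
{code}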



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-09 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10168:

Attachment: YARN-10168-005.patch

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch, YARN-10168-004.patch, YARN-10168-005.patch
>
>
> Trying to understand the logic of converting min and max resources from FS to 
> CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to either specify a maximum percentage for queues 
> against cluster resources, or to specify an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and 
> applies the percentages to that. This is not correct, since the 
> percentage-based value gets lost, and in the future when cluster resources go 
> up and down, the maximum resource cannot be changed.
> 2)
> The logic to deal with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and 
> could set both on a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify both percentage-based and 
> absolute-resource-based configs (similar to how FS deals with maximum 
> Resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed in YARN-5881; see "Should we support specifying a 
> mix of percentage ..."
> The existing fs2cs doesn't handle this issue and could set mixed 
> absolute-resource and percentage-based resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055089#comment-17055089
 ] 

Peter Bacsko edited comment on YARN-10185 at 3/9/20, 3:31 PM:
--

I don't think that {{volatile}} is needed at all. The initialization occurs 
from the following code path:

{noformat}
ContainerExecutor.setConf(Configuration)
ReflectionUtils.setConf(Object, Configuration)
ReflectionUtils.newInstance(Class, Configuration)
NodeManager.createContainerExecutor(Configuration)
NodeManager.serviceInit(Configuration)
AbstractService.init(Configuration)
...
{noformat}

The initialization in {{AbstractService.init()}} occurs inside a 
{{synchronized}} block, which happens to act as a memory fence as soon as you 
exit the block. That is, all pending writes and reads are flushed: 
https://www.infoq.com/articles/memory_barriers_jvm_concurrency/

Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}; 
essentially, all the variables there should be handled too. But it's not 
necessary, given the JVM memory model.


was (Author: pbacsko):
I don't think that {{volatile}} is needed at all. The initialization occurs 
from the following code path:

{noformat}
ContainerExecutor.setConf(Configuration)
ReflectionUtils.setConf(Object, Configuration)
ReflectionUtils.newInstance(Class, Configuration)
NodeManager.createContainerExecutor(Configuration)
NodeManager.serviceInit(Configuration)
AbstractService.init(Configuration)
...
{noformat}

The initialization in {{AbstractService.init()}} occurs inside a 
{{synchronized}} block, which happens to act as a memory fence as soon as you 
exit the block. That is, all pending writes and reads are flushed: 
https://www.infoq.com/articles/memory_barriers_jvm_concurrency/

Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}; 
essentially, all the variables there should be handled too. But it's not 
necessary, given the JVM memory model.

 

 

> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173, two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated field values are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can hardly be unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage of runs.
> Thanks to [~denes.gerencser] for finding this.
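
For reference, the change the description asks for would look roughly like the 
following sketch (illustrative only; as argued in the comment above, the JVM 
memory model may already make it unnecessary):

{code:java}
  // Sketch of the proposed change in ContainerExecutor (not the actual patch):
  private volatile String[] whitelistVars;
  private volatile int exitCodeFileTimeout =
      YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
{code}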



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10185) Container executor fields should be volatile

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055089#comment-17055089
 ] 

Peter Bacsko commented on YARN-10185:
-

I don't think that {{volatile}} is needed at all. The initialization occurs 
from the following code path:

{noformat}
ContainerExecutor.setConf(Configuration)
ReflectionUtils.setConf(Object, Configuration)
ReflectionUtils.newInstance(Class, Configuration)
NodeManager.createContainerExecutor(Configuration)
NodeManager.serviceInit(Configuration)
AbstractService.init(Configuration)
...
{noformat}

The initialization in {{AbstractService.init()}} occurs inside a 
{{synchronized}} block, which happens to act as a memory fence as soon as you 
exit the block. That is, all pending writes and reads are flushed: 
https://www.infoq.com/articles/memory_barriers_jvm_concurrency/

Also, there's {{LinuxContainerExecutor}} which calls {{super.setConf()}}; 
essentially, all the variables there should be handled too. But it's not 
necessary, given the JVM memory model.
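
To illustrate the guarantee being relied on here, a minimal, self-contained 
sketch (not YARN code): a write performed inside a {{synchronized}} block 
happens-before a later read that synchronizes on the same monitor (or is 
otherwise ordered, e.g. by {{Thread.join()}}), so the field itself does not need 
to be {{volatile}}:

{code:java}
public class SyncVisibilitySketch {
  private final Object lock = new Object();
  private String[] whitelistVars;   // deliberately NOT volatile

  // Analogous to the write in serviceInit(): happens inside a synchronized block.
  public void init() {
    synchronized (lock) {
      whitelistVars = new String[] {"JAVA_HOME", "HADOOP_COMMON_HOME"};
    }
  }

  // Readers that synchronize on the same monitor are guaranteed to see the write.
  public String[] getWhitelistVars() {
    synchronized (lock) {
      return whitelistVars;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SyncVisibilitySketch sketch = new SyncVisibilitySketch();
    Thread writer = new Thread(sketch::init);
    writer.start();
    writer.join();  // join() also establishes a happens-before edge
    System.out.println(String.join(",", sketch.getWhitelistVars()));
  }
}
{code}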

 

 

> Container executor fields should be volatile
> 
>
> Key: YARN-10185
> URL: https://issues.apache.org/jira/browse/YARN-10185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Denes Gerencser
>Priority: Major
>
> In YARN-7226 and YARN-10173, two fields have been added to the 
> {{ContainerExecutor}} class. These fields are set through {{#setConf()}} only 
> once, but in a multithreaded environment the volatile keyword should be added 
> to ensure that the updated field values are used.
>  
> Related piece of code:
> {code:java}
> private String[] whitelistVars;
>   private int exitCodeFileTimeout =
>   YarnConfiguration.DEFAULT_NM_CONTAINER_EXECUTOR_EXIT_FILE_TIMEOUT;
> {code}
> This can hardly be unit tested, but the bug could cause the UT added by 
> YARN-10173 to fail in a very small percentage of runs.
> Thanks to [~denes.gerencser] for finding this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9354) TestUtils#createResource calls should be replaced with ResourceTypesTestHelper#newResource

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055017#comment-17055017
 ] 

Peter Bacsko edited comment on YARN-9354 at 3/9/20, 2:32 PM:
-

[~gandras] could you fix the checkstyle issues?

Otherwise +1 (non-binding).


was (Author: pbacsko):
[~gandras] could you fix the checkstyle issues?

> TestUtils#createResource calls should be replaced with 
> ResourceTypesTestHelper#newResource
> --
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a not identical, but very similar, implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these 2 methods essentially do the same thing and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9354) TestUtils#createResource calls should be replaced with ResourceTypesTestHelper#newResource

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055017#comment-17055017
 ] 

Peter Bacsko commented on YARN-9354:


[~gandras] could you fix the checkstyle issues?

> TestUtils#createResource calls should be replaced with 
> ResourceTypesTestHelper#newResource
> --
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a not identical, but very similar, implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these 2 methods essentially do the same thing and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-09 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054979#comment-17054979
 ] 

Peter Bacsko commented on YARN-10168:
-

The test failure is unrelated.

[~snemeth] could you please review this patch?

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch, YARN-10168-004.patch
>
>
> Trying to understand the logic of converting min and max resources from FS to 
> CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to either specify a maximum percentage for queues 
> against cluster resources, or to specify an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and 
> applies the percentages to that. This is not correct, since the 
> percentage-based value gets lost, and in the future when cluster resources go 
> up and down, the maximum resource cannot be changed.
> 2)
> The logic to deal with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and 
> could set both on a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify both percentage-based and 
> absolute-resource-based configs (similar to how FS deals with maximum 
> Resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed in YARN-5881; see "Should we support specifying a 
> mix of percentage ..."
> The existing fs2cs doesn't handle this issue and could set mixed 
> absolute-resource and percentage-based resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-09 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10168:

Attachment: YARN-10168-004.patch

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch, YARN-10168-004.patch
>
>
> Trying to understand the logic of converting min and max resources from FS to 
> CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to either specify a maximum percentage for queues 
> against cluster resources, or to specify an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and 
> applies the percentages to that. This is not correct, since the 
> percentage-based value gets lost, and in the future when cluster resources go 
> up and down, the maximum resource cannot be changed.
> 2)
> The logic to deal with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and 
> could set both on a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify both percentage-based and 
> absolute-resource-based configs (similar to how FS deals with maximum 
> Resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed in YARN-5881; see "Should we support specifying a 
> mix of percentage ..."
> The existing fs2cs doesn't handle this issue and could set mixed 
> absolute-resource and percentage-based resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-06 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10168:

Attachment: YARN-10168-003.patch

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch
>
>
> Trying to understand the logic of converting min and max resources from FS to 
> CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to either specify a maximum percentage for queues 
> against cluster resources, or to specify an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and 
> applies the percentages to that. This is not correct, since the 
> percentage-based value gets lost, and in the future when cluster resources go 
> up and down, the maximum resource cannot be changed.
> 2)
> The logic to deal with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over absoluteResource, and 
> could set both on a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify both percentage-based and 
> absolute-resource-based configs (similar to how FS deals with maximum 
> Resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed in YARN-5881; see "Should we support specifying a 
> mix of percentage ..."
> The existing fs2cs doesn't handle this issue and could set mixed 
> absolute-resource and percentage-based resources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-06 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053345#comment-17053345
 ] 

Peter Bacsko commented on YARN-9879:


[~prabhujoseph] thanks, I think it's likely that the piece of code I mentioned 
here is missing: 
https://issues.apache.org/jira/browse/YARN-10108?focusedCommentId=17025143=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17025143

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being made; I'll attach them as soon as 
> they're done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


