[jira] [Commented] (YARN-10617) Fifo and Fair intra-queue preemption goes on indefinitely when apps are in pending state due to max AM limit reached

2021-02-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281535#comment-17281535
 ] 

Sunil G commented on YARN-10617:


Hi [~ananyo_rao]
Yes. In the preemption module, we get all apps from the scheduler. Hence some 
of the apps may be in a pending state and can't be scheduled (due to the AM 
limit, etc.). So I think this is a quick fix. 

> Fifo and Fair intra-queue preemption goes on indefinitely when apps are in 
> pending state due to max AM limit reached
> 
>
> Key: YARN-10617
> URL: https://issues.apache.org/jira/browse/YARN-10617
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: YARN-10617.patch
>
>
> This case occurs when:
> 1. an application gets submitted in a cluster running at max-AM limit.
> 2. The new job requests AM resource. So it has 1 pending request.
> 3. To fulfil this request, the preemption logic preempts 1 resource from a 
> running app.
> 4. Because the cluster is at max-AM limit, the scheduler re-assigns the 
> preempted container back to the running app.
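
A minimal sketch of the idea discussed in the comment, using hypothetical names (PendingApp, amActivated, schedulableDemand) that are not taken from the attached patch: when the preemption module sums up intra-queue demand, apps whose AM cannot be activated because of the max AM limit are skipped, so they no longer trigger preemption that the scheduler would immediately undo.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of an app as seen by the preemption module. */
final class PendingApp {
    final String id;
    final boolean amActivated;   // false while the app waits on the max AM limit
    final int pendingContainers; // outstanding container requests

    PendingApp(String id, boolean amActivated, int pendingContainers) {
        this.id = id;
        this.amActivated = amActivated;
        this.pendingContainers = pendingContainers;
    }
}

final class AmLimitPreemptionSketch {
    /** Sums demand only from apps the scheduler could actually serve. */
    static int schedulableDemand(List<PendingApp> apps) {
        int demand = 0;
        for (PendingApp app : apps) {
            if (!app.amActivated) {
                continue; // blocked by the AM limit: preempting for it cannot help
            }
            demand += app.pendingContainers;
        }
        return demand;
    }

    public static void main(String[] args) {
        List<PendingApp> apps = Arrays.asList(
            new PendingApp("running-app", true, 0),
            new PendingApp("new-app-waiting-for-am", false, 1));
        System.out.println(schedulableDemand(apps));
    }
}
```

With the sample input the computed demand is 0, so nothing would be selected for preemption and the preempt/re-assign cycle from steps 3 and 4 does not start.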



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10598) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to extend the creation type with additional information

2021-01-26 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272546#comment-17272546
 ] 

Sunil G commented on YARN-10598:


[~bteke] [~snemeth]
queueType cannot be changed, as that would be an incompatible change. It's 
better to only add a new variable. 

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to extend the 
> creation type with additional information
> --
>
> Key: YARN-10598
> URL: https://issues.apache.org/jira/browse/YARN-10598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10598.001.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> Auto-queue creation has been also implemented with YARN-10506.
> Connected to this effort, we would like to expose the type of the queue with 
> the RM's /scheduler REST endpoint.
> To extend/modify the values added in YARN-10581 these 3 fields will describe 
> a queue:
>  * queueType : *parent/leaf*
>  * creationMethod : *static/dynamicLegacy/dynamicFlexible*
>  * autoCreationEligibility : *off/legacy/flexible*
> After this change here are some example cases:
>  * Static parent queue which has the auto-creation-enabled-v2 false:
>  ** queueType : *parent*
>  ** creationMethod : *static*
>  ** autoCreationEligibility : *off*
>  * Static managed parent (can have dynamic children):
>  ** queueType : *parent*
>  ** creationMethod : *static*
>  ** autoCreationEligibility : *legacy*
>  * Legacy auto-created leaf queue (cannot have children):
>  ** queueType : *leaf*
>  ** creationMethod : *dynamicLegacy*
>  ** autoCreationEligibility : *off*
>  * Auto-created (v2) parent queue, (implicitly) auto-creation-enabled-v2 
> true: 
>  ** queueType : *parent*
>  ** creationMethod : *dynamicFlexible*
>  ** autoCreationEligibility : *flexible*
>  * Auto-created (v2) leaf queue (cannot have children):
>  ** queueType : *leaf*
>  ** creationMethod : *dynamicFlexible*
>  ** autoCreationEligibility : *off*
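
The example cases in the description can be restated as a small sketch. The class, method, and parameter names below (QueueDescriptorSketch, describe, legacyManagedParent, flexibleAutoCreationEnabled) are hypothetical and are not the RM's actual DAO classes; only the three field names and their values come from the description.

```java
/** Hypothetical holder for the three fields proposed in the description. */
final class QueueDescriptorSketch {
    final String queueType;               // parent / leaf
    final String creationMethod;          // static / dynamicLegacy / dynamicFlexible
    final String autoCreationEligibility; // off / legacy / flexible

    QueueDescriptorSketch(String queueType, String creationMethod,
                          String autoCreationEligibility) {
        this.queueType = queueType;
        this.creationMethod = creationMethod;
        this.autoCreationEligibility = autoCreationEligibility;
    }

    /** Derives the three values; leaf queues can never have children. */
    static QueueDescriptorSketch describe(boolean leaf, String creationMethod,
                                          boolean legacyManagedParent,
                                          boolean flexibleAutoCreationEnabled) {
        String queueType = leaf ? "leaf" : "parent";
        String eligibility;
        if (leaf) {
            eligibility = "off";
        } else if (legacyManagedParent) {
            eligibility = "legacy";
        } else if (flexibleAutoCreationEnabled) {
            eligibility = "flexible";
        } else {
            eligibility = "off";
        }
        return new QueueDescriptorSketch(queueType, creationMethod, eligibility);
    }

    @Override
    public String toString() {
        return queueType + "/" + creationMethod + "/" + autoCreationEligibility;
    }

    public static void main(String[] args) {
        // Static managed parent (can have dynamic children).
        System.out.println(describe(false, "static", true, false));
        // Auto-created (v2) leaf queue.
        System.out.println(describe(true, "dynamicFlexible", false, false));
    }
}
```

Keeping queueType limited to parent/leaf and carrying the extra information in separate fields matches the compatibility concern raised in the comment.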






[jira] [Commented] (YARN-10579) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues

2021-01-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268712#comment-17268712
 ] 

Sunil G commented on YARN-10579:


[~snemeth] test failures are related.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> weight values for queues
> --
>
> Key: YARN-10579
> URL: https://issues.apache.org/jira/browse/YARN-10579
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10579.001.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
>  We would like to expose the weight values for all queues with the RM's 
> /scheduler REST endpoint.






[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268402#comment-17268402
 ] 

Sunil G commented on YARN-10512:


[~snemeth], I pushed this to trunk, but branch-3.3 and 3.2 both have conflicts. 
Please help to rebase.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268385#comment-17268385
 ] 

Sunil G commented on YARN-10512:


[~snemeth], my suggestion is to create a followup jira to handle the changes 
suggested by [~pbacsko].
If there are no issues, I will push this now.

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Commented] (YARN-10572) Merge YARN-8557 and YARN-10352, and rebase based YARN-10380.

2021-01-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268373#comment-17268373
 ] 

Sunil G commented on YARN-10572:


[~zhuqi], the description and the task items of this jira are vague. All work is 
happening in the apache "trunk" and not in a branch, hence I am not very sure 
about the work described here.

So please help to update the description in line with the patch, which could be 
a bug fix or improvement, to clearly state the task. Thanks.
cc [~leftnoteasy]

> Merge YARN-8557 and YARN-10352, and rebase based YARN-10380.
> 
>
> Key: YARN-10572
> URL: https://issues.apache.org/jira/browse/YARN-10572
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10572.001.patch
>
>
> The work is :
> 1. Because of  YARN-10380, We should rebase YARN-10352
> 2. Also merge YARN-8557 for not running case skip.
> 3. Refactor some method in YARN-10380






[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267832#comment-17267832
 ] 

Sunil G commented on YARN-10512:


[~snemeth], I could see some more checkstyle issues. Are those fixable?

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch, YARN-10512.004.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Commented] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267366#comment-17267366
 ] 

Sunil G commented on YARN-10512:


[~snemeth] Thanks for working on this. And thanks [~gandras] for reviews.
I am also +1 for the v3 patch.

If there are no major comments, I can merge this. 

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Assigned] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10512:
--

Assignee: Szilard Nemeth  (was: Sunil G)

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Assigned] (YARN-10512) CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS

2021-01-18 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10512:
--

Assignee: Sunil G  (was: Szilard Nemeth)

> CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include 
> mode of operation for CS
> --
>
> Key: YARN-10512
> URL: https://issues.apache.org/jira/browse/YARN-10512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-10512.001.patch, YARN-10512.002.patch, 
> YARN-10512.003.patch
>
>
> Under this umbrella (YARN-10496), weight-mode has been implemented for CS 
> with YARN-10504.
> We would like to expose the mode of operation with the RM's /scheduler REST 
> endpoint.
> The field name will be 'mode'.
> All queue representations in the response will uniformly hold one of the 
> mode values: "percentage", "absolute", "weight".






[jira] [Comment Edited] (YARN-10575) Hadoop

2021-01-18 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267108#comment-17267108
 ] 

Sunil G edited comment on YARN-10575 at 1/18/21, 9:15 AM:
--

[~Pushpalatha_13] Was this jira created by mistake?
Please reopen and add more context if needed.


was (Author: sunilg):
[~Pushpalatha_13] Was this jira created by mistake?
Please reopen and add more context. I will close this for now.

> Hadoop
> --
>
> Key: YARN-10575
> URL: https://issues.apache.org/jira/browse/YARN-10575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Pushpalatha S K
>Priority: Major
>







[jira] [Commented] (YARN-10575) Hadoop

2021-01-18 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267108#comment-17267108
 ] 

Sunil G commented on YARN-10575:


[~Pushpalatha_13] Was this jira created by mistake?
Please reopen and add more context. I will close this for now.

> Hadoop
> --
>
> Key: YARN-10575
> URL: https://issues.apache.org/jira/browse/YARN-10575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Pushpalatha S K
>Priority: Major
>







[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263095#comment-17263095
 ] 

Sunil G commented on YARN-10559:


Also, you may need to change the *break* statement to *continue* in the 
skipContainerBasedOnIntraQueuePolicy method.
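
To illustrate why the distinction matters, here is a minimal sketch with hypothetical names and a made-up skip condition (it is not the body of skipContainerBasedOnIntraQueuePolicy): with *break*, the first skipped candidate ends the whole scan, while *continue* skips only that candidate and still evaluates the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BreakVsContinueSketch {
    /** Stops scanning at the first candidate that must be skipped. */
    static List<Integer> selectWithBreak(List<Integer> sizes, int limit) {
        List<Integer> selected = new ArrayList<>();
        for (int size : sizes) {
            if (size > limit) {
                break; // abandons every remaining candidate
            }
            selected.add(size);
        }
        return selected;
    }

    /** Skips only the offending candidate and keeps scanning. */
    static List<Integer> selectWithContinue(List<Integer> sizes, int limit) {
        List<Integer> selected = new ArrayList<>();
        for (int size : sizes) {
            if (size > limit) {
                continue; // later candidates are still considered
            }
            selected.add(size);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Integer> sizes = Arrays.asList(1, 5, 2);
        System.out.println(selectWithBreak(sizes, 3));    // [1]
        System.out.println(selectWithContinue(sizes, 3)); // [1, 2]
    }
}
```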

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Usecase:
> Due to the way Capacity Scheduler preemption works, If a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long running application. This prevents multiple long 
> running, large, applications from running concurrently.
> Support fair sharing among apps while preempting applications from same queue.






[jira] [Comment Edited] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263089#comment-17263089
 ] 

Sunil G edited comment on YARN-10559 at 1/12/21, 5:42 AM:
--

[~ananyo_rao] Thanks for the efforts. A few minor comments.
 # Please take a look at the newly introduced 7 checkstyle warnings and help to 
check how many of them can be fixed. 
[https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/457/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 # In setFairShareForApps, there is a chance of a divide by 0. Let me quote the 
relevant code here.
{code:java}
int numOfAppsInQueue = tq.leafQueue.getAllApplications().size();
Resource fairShareAcrossApps = Resources.divideAndCeil(
  this.rc, queueReassignableResource, numOfAppsInQueue);
{code}
We are handling this internally, but it's still better to add a check.
 # It's better to ensure two points here: we should get a non-negative resource, 
and fairShareForApp should be checked so that it cannot be 0.
{code:java}
Resource fairShareForApp = Resources.min(
rc, clusterResource, fairShareAcrossApps, fairShareWithinUL);{code}
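
The two checks suggested above could look roughly like the sketch below. It uses plain long values in place of Hadoop's Resource type, and the class and method names are hypothetical; it does not reproduce the Resources API or the patch itself.

```java
final class FairShareGuardSketch {
    /** Ceiling division with an explicit guard against an empty queue. */
    static long fairShareAcrossApps(long queueReassignable, int numOfAppsInQueue) {
        if (numOfAppsInQueue <= 0) {
            return 0L; // nothing to divide among; avoids division by zero
        }
        long nonNegative = Math.max(0L, queueReassignable);
        return (nonNegative + numOfAppsInQueue - 1) / numOfAppsInQueue;
    }

    /** Takes the smaller of the two shares and never returns a negative value. */
    static long fairShareForApp(long fairShareAcrossApps, long fairShareWithinUL) {
        long share = Math.min(fairShareAcrossApps, fairShareWithinUL);
        return Math.max(0L, share);
    }

    public static void main(String[] args) {
        long acrossApps = fairShareAcrossApps(10L, 3);
        long forApp = fairShareForApp(acrossApps, 3L);
        if (forApp == 0L) {
            System.out.println("fair share is 0, skipping this app");
        } else {
            System.out.println("fair share for app: " + forApp);
        }
    }
}
```

A caller can then treat a share of 0 as a signal to skip the app instead of using it in later calculations.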


was (Author: sunilg):
[~ananyo_rao] Thanks for the efforts. A few minor comments.
 # Please take a look at the newly introduced 7 checkstyle warnings and help to 
check how many of them can be fixed. 
[https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/457/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 # In setFairShareForApps, there is a chance of a divide by 0. Let me quote the 
relevant code here.
{code:java}
int numOfAppsInQueue = tq.leafQueue.getAllApplications().size();
Resource fairShareAcrossApps = Resources.divideAndCeil(
  this.rc, queueReassignableResource, numOfAppsInQueue);
{code}
We are handling this internally, but it's still better to add a check.

 # It's better to ensure two points here: we should get a non-negative resource, 
and fairShareForApp should be checked so that it cannot be 0.
{code:java}
Resource fairShareForApp = Resources.min(
rc, clusterResource, fairShareAcrossApps, fairShareWithinUL);{code}

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Usecase:
> Due to the way Capacity Scheduler preemption works, If a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long running application. This prevents multiple long 
> running, large, applications from running concurrently.
> Support fair sharing among apps while preempting applications from same queue.






[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263089#comment-17263089
 ] 

Sunil G commented on YARN-10559:


[~ananyo_rao] Thanks for the efforts. A few minor comments.
 # Please take a look at the newly introduced 7 checkstyle warnings and help to 
check how many of them can be fixed. 
[https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/457/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 # In setFairShareForApps, there is a chance of a divide by 0. Let me quote the 
relevant code here.
{code:java}
int numOfAppsInQueue = tq.leafQueue.getAllApplications().size();
Resource fairShareAcrossApps = Resources.divideAndCeil(
  this.rc, queueReassignableResource, numOfAppsInQueue);
{code}
We are handling this internally, but it's still better to add a check.

 # It's better to ensure two points here: we should get a non-negative resource, 
and fairShareForApp should be checked so that it cannot be 0.
{code:java}
Resource fairShareForApp = Resources.min(
rc, clusterResource, fairShareAcrossApps, fairShareWithinUL);{code}

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Usecase:
> Due to the way Capacity Scheduler preemption works, If a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long running application. This prevents multiple long 
> running, large, applications from running concurrently.
> Support fair sharing among apps while preempting applications from same queue.






[jira] [Commented] (YARN-10504) Implement weight mode in Capacity Scheduler

2021-01-11 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262580#comment-17262580
 ] 

Sunil G commented on YARN-10504:


I checked the latest patch over the weekend. There are no major comments from 
my side on this.

However, I think a more detailed refactoring is needed to introduce a common 
way of handling the weight, percentage, and absolute values. This could also 
help to avoid some hard-coded values. So let's create another Jira to track 
this.

If the test cases are fine, then I am generally positive about getting this in 
soon. Thanks.

Thanks, [~bteke] [~wangda] [~zhuqi] for the efforts on this.

> Implement weight mode in Capacity Scheduler
> ---
>
> Key: YARN-10504
> URL: https://issues.apache.org/jira/browse/YARN-10504
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10504.001.patch, YARN-10504.002.patch, 
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch, 
> YARN-10504.006.patch, YARN-10504.007.patch, YARN-10504.008.patch, 
> YARN-10504.009.patch, YARN-10504.010.patch, YARN-10504.ver-1.patch, 
> YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow queues to be created flexibly in Capacity Scheduler, a 
> weight mode should be introduced. The existing {{capacity}} property should 
> be used with a different syntax, e.g.:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>  
> The new functionality should: 
>  * accept and validate the new weight values
>  * enforce a singular mode on the whole queue tree
>  * (re)calculate the relative (percentage-based) capacities based on the 
> weights during launch and every time the queue structure changes
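
The last point, recalculating percentage-based capacities from weights, can be sketched as follows. This assumes the "1.0w" suffix form (one of the candidate syntaxes listed above) and uses hypothetical class and method names; it is not the Capacity Scheduler implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WeightModeSketch {
    /** Parses a weight written with a trailing 'w', e.g. "1.0w". */
    static double parseWeight(String capacity) {
        String trimmed = capacity.trim();
        if (!trimmed.endsWith("w")) {
            throw new IllegalArgumentException("Not a weight: " + capacity);
        }
        double weight = Double.parseDouble(
            trimmed.substring(0, trimmed.length() - 1));
        if (weight < 0) {
            throw new IllegalArgumentException("Negative weight: " + capacity);
        }
        return weight;
    }

    /** Converts sibling weights into percentages of their parent. */
    static Map<String, Double> toPercentages(Map<String, String> siblings) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, String> e : siblings.entrySet()) {
            double w = parseWeight(e.getValue());
            weights.put(e.getKey(), w);
            total += w;
        }
        Map<String, Double> percentages = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            // A total of 0 means no sibling has any share.
            percentages.put(e.getKey(),
                total == 0.0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return percentages;
    }

    public static void main(String[] args) {
        Map<String, String> siblings = new LinkedHashMap<>();
        siblings.put("root.users", "1.0w");
        siblings.put("root.batch", "3.0w");
        System.out.println(toPercentages(siblings));
    }
}
```

Because the percentages depend on the sum of sibling weights, they have to be recomputed whenever a queue is added or removed, which is the reason the description calls for recalculation on every structure change.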






[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-05 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258863#comment-17258863
 ] 

Sunil G commented on YARN-10559:


[~epayne], could you please take a look at this improvement?

We hit this issue when a single user was using the entire queue and 
fairness-based preemption was not happening due to some checks. This is a 
simple fix as an initial approach. 

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Fix For: 3.1.4
>
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> YARN-10559.0001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Usecase:
> Due to the way Capacity Scheduler preemption works, If a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long running application. This prevents multiple long 
> running, large, applications from running concurrently.
> Support fair sharing among apps while preempting applications from same queue.






[jira] [Assigned] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-01-05 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10559:
--

Assignee: VADAGA ANANYO RAO

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Fix For: 3.1.4
>
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> YARN-10559.0001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Usecase:
> Due to the way Capacity Scheduler preemption works, If a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long running application. This prevents multiple long 
> running, large, applications from running concurrently.
> Support fair sharing among apps while preempting applications from same queue.






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254114#comment-17254114
 ] 

Sunil G commented on YARN-10540:


[~hexiaoqiao], This is working for me in the latest build.

You are missing the CORS configuration in YARN. If you add it, the page will 
load.
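{code:xml}
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
</property>
<property>
  <description>Enable/disable the cross-origin (CORS) filter.</description>
  <name>hadoop.http.cross-origin.enabled</name>
  <value>true</value>
</property>
<property>
  <description>Comma separated list of origins that are allowed for web
services needing cross-origin (CORS) support. Wildcards (*) and patterns
allowed</description>
  <name>hadoop.http.cross-origin.allowed-origins</name>
  <value>*</value>
</property>
<property>
  <description>Comma separated list of methods that are allowed for web
services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.allowed-methods</name>
  <value>GET,POST,HEAD</value>
</property>
<property>
  <description>Comma separated list of headers that are allowed for web
services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.allowed-headers</name>
  <value>X-Requested-With,Content-Type,Accept,Origin</value>
</property>
<property>
  <description>The number of seconds a pre-flighted request can be cached
for web services needing cross-origin (CORS) support.</description>
  <name>hadoop.http.cross-origin.max-age</name>
  <value>1800</value>
</property>
{code}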

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>
> YARN-10450 added changes in NodeInfo class.
> Various exceptions are showing while accessing UI2 and UI1 NODE pages. 
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.NodesPage$NodesBlock.render(NodesPage.java:164)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.nodes(RmController.java:70)
>  {code}
> {code:java}
> 2020-12-19 22:55:54,846 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:450)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> {code}






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254112#comment-17254112
 ] 

Sunil G commented on YARN-10540:


!Screenshot 2020-12-23 at 8.24.42 PM.png!

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>






[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-23 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-23 at 8.24.42 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Assignee: Jim Brennan
>Priority: Critical
> Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Screenshot 2020-12-23 at 
> 8.24.42 PM.png, YARN-10540.001.patch, Yarn-UI-Ubuntu.png, osx-yarn-ui2.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-21 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252953#comment-17252953
 ] 

Sunil G commented on YARN-10540:


[~Jim_Brennan] [~ebadger] please share your thoughts, as we see this NPE only 
on Mac and in branch-3.2.2. Are we missing some patches?

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Mac-Yarn-UI.png, Screenshot 2020-12-19 at 11.01.43 
> PM.png, Screenshot 2020-12-19 at 11.02.14 PM.png, Yarn-UI-Ubuntu.png, 
> yarnodes.png, yarnui2onubuntu.png
>
>






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252303#comment-17252303
 ] 

Sunil G commented on YARN-10540:


Hi [~hexiaoqiao],

I tried the latest trunk build and these pages load for me. I am checking a 
bit more in a VM.

Could you please click on the NODES tab in this UI and see whether it works 
in your environment?

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png, yarnui2onubuntu.png
>
>






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252245#comment-17252245
 ] 

Sunil G commented on YARN-10540:


I brought up a local cluster on my Mac, and I can see the log below in the NM.

Since ResourceCalculatorPlugin can't be initialized, the NodeUtilization 
object will be null, and that causes the issue described in this Jira.

Until now, nothing in NodeInfo used NodeUtilization, so this issue never 
surfaced.
{code:java}
2020-12-19 22:37:05,096 WARN org.apache.hadoop.yarn.util.ResourceCalculatorPlugin: Failed to instantiate default resource calculator. Could not determine OS
2020-12-19 22:37:05,096 INFO org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl:  Using ResourceCalculatorPlugin : null
{code}
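
The failure mode described above, a web DAO dereferencing a utilization 
object that can legitimately be null, can be illustrated with a small 
stand-alone sketch. The classes below are hypothetical stand-ins, not the 
real NodeInfo or NodeUtilization, and the guard shown is only one possible 
way to handle the null case; the actual patch on this issue may differ.

```java
/** Hypothetical DAO in the spirit of NodeInfo; not the real Hadoop class. */
final class NodeInfoSketch {

    /** Hypothetical stand-in for the utilization a NodeManager reports. */
    static final class Utilization {
        final int physicalMemoryMb;
        final float cpu;

        Utilization(int physicalMemoryMb, float cpu) {
            this.physicalMemoryMb = physicalMemoryMb;
            this.cpu = cpu;
        }
    }

    final int usedMemoryMb;
    final float usedCpu;
    final boolean utilizationKnown;

    NodeInfoSketch(Utilization utilization) {
        // The node may report no utilization at all (for example when no
        // resource calculator could be created), so check before dereferencing.
        if (utilization == null) {
            this.usedMemoryMb = 0;
            this.usedCpu = 0f;
            this.utilizationKnown = false;
        } else {
            this.usedMemoryMb = utilization.physicalMemoryMb;
            this.usedCpu = utilization.cpu;
            this.utilizationKnown = true;
        }
    }

    public static void main(String[] args) {
        // A node without utilization data no longer throws during construction.
        NodeInfoSketch missing = new NodeInfoSketch(null);
        System.out.println("utilization known: " + missing.utilizationKnown);

        NodeInfoSketch present = new NodeInfoSketch(new Utilization(2048, 1.5f));
        System.out.println("used memory (MB): " + present.usedMemoryMb);
    }
}
```

Recording whether utilization is known, rather than only defaulting to 
zero, lets a page distinguish an idle node from a node with no data.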

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Priority: Critical  (was: Blocker)

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Critical
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-19 at 11.02.14 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Commented] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252243#comment-17252243
 ] 

Sunil G commented on YARN-10540:


!Screenshot 2020-12-19 at 11.02.14 PM.png!!Screenshot 2020-12-19 at 11.01.43 
PM.png!

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Target Version/s: 3.2.2

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Updated] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10540:
---
Attachment: Screenshot 2020-12-19 at 11.01.43 PM.png

> Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes
> 
>
> Key: YARN-10540
> URL: https://issues.apache.org/jira/browse/YARN-10540
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 3.2.2
>Reporter: Sunil G
>Priority: Blocker
> Attachments: Screenshot 2020-12-19 at 11.01.43 PM.png, Screenshot 
> 2020-12-19 at 11.02.14 PM.png
>
>






[jira] [Created] (YARN-10540) Node page is broken in YARN UI1 and UI2 including RMWebService api for nodes

2020-12-19 Thread Sunil G (Jira)
Sunil G created YARN-10540:
--

 Summary: Node page is broken in YARN UI1 and UI2 including 
RMWebService api for nodes
 Key: YARN-10540
 URL: https://issues.apache.org/jira/browse/YARN-10540
 Project: Hadoop YARN
  Issue Type: Task
  Components: webapp
Affects Versions: 3.2.2
Reporter: Sunil G


YARN-10450 added changes in NodeInfo class.

Various exceptions are thrown while accessing the UI1 and UI2 Nodes pages.
{code:java}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.NodesPage$NodesBlock.render(NodesPage.java:164)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
    at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
    at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
    at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
    at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
    at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
    at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.nodes(RmController.java:70)
{code}
{code:java}
2020-12-19 22:55:54,846 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.NodeInfo.<init>(NodeInfo.java:103)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getNodes(RMWebServices.java:450)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
{code}






[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses

2020-10-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217493#comment-17217493
 ] 

Sunil G commented on YARN-10453:


+1.

I am committing this shortly if there are no issues.

> Add partition resource info to get-node-labels and label-mappings api 
> responses
> ---
>
> Key: YARN-10453
> URL: https://issues.apache.org/jira/browse/YARN-10453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-10453.001.patch, YARN-10453.002.patch, 
> YARN-10453.003.patch
>
>
> This jira will add partition resource info to the responses of the 
> get-node-labels and label-mappings APIs.






[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses

2020-10-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217221#comment-17217221
 ] 

Sunil G commented on YARN-10453:


Kicked jenkins again.

> Add partition resource info to get-node-labels and label-mappings api 
> responses
> ---
>
> Key: YARN-10453
> URL: https://issues.apache.org/jira/browse/YARN-10453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-10453.001.patch, YARN-10453.002.patch
>
>
> This jira will add partition resource info to the responses of the 
> get-node-labels and label-mappings APIs.






[jira] [Commented] (YARN-9848) revert YARN-4946

2020-10-15 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214812#comment-17214812
 ] 

Sunil G commented on YARN-9848:
---

Thanks [~aajisaka], this approach is much better now.

Thank you.

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Commented] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-02 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206603#comment-17206603
 ] 

Sunil G commented on YARN-10451:


+1 pending jenkins.

> RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.
> ---
>
> Key: YARN-10451
> URL: https://issues.apache.org/jira/browse/YARN-10451
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-10451.001.patch, YARN-10451.002.patch
>
>
> The NodesPage in the RM (v1) UI will NPE when the {{yarn.resource-types}} 
> property defines {{yarn.io}}.






[jira] [Commented] (YARN-10244) backport YARN-9848 to branch-3.2

2020-09-29 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204463#comment-17204463
 ] 

Sunil G commented on YARN-10244:


Thanks [~Steven Rand]

Let's get this in.

> backport YARN-9848 to branch-3.2
> 
>
> Key: YARN-10244
> URL: https://issues.apache.org/jira/browse/YARN-10244
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: YARN-10244-branch-3.2.001.patch, 
> YARN-10244-branch-3.2.002.patch, YARN-10244-branch-3.2.003.patch
>
>
> Backporting YARN-9848 to branch-3.2.






[jira] [Commented] (YARN-10244) backport YARN-9848 to branch-3.2

2020-09-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194690#comment-17194690
 ] 

Sunil G commented on YARN-10244:


Yes, this is a blocker. Waiting for CI results.

> backport YARN-9848 to branch-3.2
> 
>
> Key: YARN-10244
> URL: https://issues.apache.org/jira/browse/YARN-10244
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: YARN-10244-branch-3.2.001.patch, 
> YARN-10244-branch-3.2.002.patch, YARN-10244-branch-3.2.003.patch
>
>
> Backporting YARN-9848 to branch-3.2.






[jira] [Commented] (YARN-10429) [Umbrella] YARN UI2 Improvements

2020-09-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192036#comment-17192036
 ] 

Sunil G commented on YARN-10429:


Thanks [~akhilpb]

Will we cover ember-3 upgrade here?

> [Umbrella] YARN UI2 Improvements 
> -
>
> Key: YARN-10429
> URL: https://issues.apache.org/jira/browse/YARN-10429
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
>
> cc: [~sunilg]






[jira] [Commented] (YARN-10360) Support Multi Node Placement in SingleConstraintAppPlacementAllocator

2020-08-24 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183117#comment-17183117
 ] 

Sunil G commented on YARN-10360:


+1 for the patch.

Thanks [~prabhujoseph]

> Support Multi Node Placement in SingleConstraintAppPlacementAllocator
> -
>
> Key: YARN-10360
> URL: https://issues.apache.org/jira/browse/YARN-10360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, multi-node-placement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10360-001.patch, YARN-10360-002.patch
>
>
> Currently, placement constraints are not supported when Multi Node Placement 
> is enabled. This Jira is to add Support for Multi Node Placement in 
> SingleConstraintAppPlacementAllocator.






[jira] [Commented] (YARN-10360) Support Multi Node Placement in SingleConstraintAppPlacementAllocator

2020-08-21 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181725#comment-17181725
 ] 

Sunil G commented on YARN-10360:


Yes. When we implemented this, our intention at that point was to enable it 
only for the Locality allocator. Later, once the Placement allocator got added, 
it should have been considered as well.

Thanks [~prabhujoseph] for moving this to the base class as a default 
implementation. I am +1 on the approach and the patch, but there are some test 
failures. Please help to check them. Thanks.

> Support Multi Node Placement in SingleConstraintAppPlacementAllocator
> -
>
> Key: YARN-10360
> URL: https://issues.apache.org/jira/browse/YARN-10360
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, multi-node-placement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10360-001.patch, YARN-10360-002.patch
>
>
> Currently, placement constraints are not supported when Multi Node Placement 
> is enabled. This Jira is to add Support for Multi Node Placement in 
> SingleConstraintAppPlacementAllocator.






[jira] [Updated] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10396:
---
Fix Version/s: (was: 3.1.5)
   (was: 3.2.2)

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode setting the number of maximum applications on 
> queue level gets overridden with the system level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.
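
As a rough illustration of the behaviour requested in the description above, 
the following Java sketch uses the scaled system-wide limit only when no 
queue-level value is configured. It is not the attached patch: the class 
MaxApplicationsSketch, the method resolveMaxApplications, and the UNDEFINED 
sentinel of -1 are assumptions made for this example.

```java
// Hypothetical sketch, not the YARN-10396 patch: all names here are illustrative.
final class MaxApplicationsSketch {

  // Assumed sentinel meaning "no queue-level maximum-applications configured".
  static final int UNDEFINED = -1;

  /**
   * Prefer the queue-level limit when one is configured; otherwise scale the
   * system-wide limit by the queue's absolute capacity (a fraction in 0..1),
   * mirroring the calculation quoted from ParentQueue.
   */
  static int resolveMaxApplications(int queueLevelMaxApps,
                                    int systemMaxApps,
                                    float absoluteCapacity) {
    if (queueLevelMaxApps != UNDEFINED) {
      return queueLevelMaxApps;
    }
    return (int) (systemMaxApps * absoluteCapacity);
  }

  public static void main(String[] args) {
    // No queue-level setting: fall back to the scaled system limit.
    System.out.println(resolveMaxApplications(UNDEFINED, 10000, 0.25f));
    // Queue-level setting present: it wins over the scaled value.
    System.out.println(resolveMaxApplications(500, 10000, 0.25f));
  }
}
```

The point of the sketch is the order of precedence: the queue-level value is 
consulted first, so changing the queue's resource pool is no longer the only 
way to influence the limit.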






[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181196#comment-17181196
 ] 

Sunil G commented on YARN-10396:


[~bteke], you have used a new test class from branch-3.3, which caused a 
compilation error. Please help to fix this for branch-3.2 and branch-3.1.
{code:java}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hadoop-yarn-server-resourcemanager: 
Compilation failure: Compilation failure:
[ERROR] 
/Users/sgovindan/Work/repos/apache/hadoop-commit/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java:[1024,5]
 cannot find symbol
[ERROR] symbol:   class CSQueueStore
[ERROR] location: class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue
[ERROR] 
/Users/sgovindan/Work/repos/apache/hadoop-commit/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java:[1024,31]
 cannot find symbol
[ERROR] symbol:   class CSQueueStore
[ERROR] location: class 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue
[ERROR] -> [Help 1] {code}

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode setting the number of maximum applications on 
> queue level gets overridden with the system level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.






[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181192#comment-17181192
 ] 

Sunil G commented on YARN-10396:


[~ste...@apache.org] Apologies from my end, it's my bad.

The backport applied cleanly and I pushed it quickly.

I'll revert the same now.

 

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode setting the number of maximum applications on 
> queue level gets overridden with the system level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.






[jira] [Updated] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-19 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10396:
---
Fix Version/s: 3.1.5
   3.3.1
   3.2.2

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode setting the number of maximum applications on 
> queue level gets overridden with the system level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.






[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176401#comment-17176401
 ] 

Sunil G commented on YARN-10396:


[~bteke] Thanks for the patch.

Could you add some test cases to cover this scenario as well? Thanks.

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10396.001.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode setting the number of maximum applications on 
> queue level gets overridden with the system level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.






[jira] [Commented] (YARN-10389) Option to override RMWebServices with custom WebService class

2020-08-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173732#comment-17173732
 ] 

Sunil G commented on YARN-10389:


Thanks [~tanu.ajmera] for working on this. 

A few comments:
 # Rename yarn.htpp.rmwebapp.custom.webservice.class to 
yarn.webapp.custom.webservice.class
 # In RMWebApp, is it possible to have a NULL scenario for the conf object? cc 
[~prabhujoseph]

 

> Option to override RMWebServices with custom WebService class
> -
>
> Key: YARN-10389
> URL: https://issues.apache.org/jira/browse/YARN-10389
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10389-001.patch, YARN-10389-002.patch, 
> YARN-10389-003.patch, YARN-10389-004.patch, YARN-10389-005.patch
>
>
> YARN-8047 provides support to add custom WebServices as part of RMWebApp.  
> Since each WebService has to have a separate WebService Path, /ws/v1/cluster 
> root path cannot be used globally.
> Another alternative is to provide an option to override the RMWebServices 
> with custom WebServices implementation which can extend the RMWebService, 
> this way /ws/v1/cluster path can be used globally.
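
The override option described above can be sketched roughly as follows, 
assuming the implementation class is looked up by name from a configuration 
key. This is only an illustration and not the attached patch: BaseWebServices 
stands in for RMWebServices, a plain Map stands in for the Hadoop 
Configuration object, and the key name is the one suggested in the review 
comment rather than a confirmed property.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: every type and name here is a stand-in for illustration.
final class CustomWebServiceSketch {

  /** Stand-in for the default web service class (RMWebServices in YARN). */
  static class BaseWebServices { }

  /** Stand-in for a user-supplied class that extends the default one. */
  static class MyWebServices extends BaseWebServices { }

  // Key name as suggested in the review comment; treated as an assumption.
  static final String KEY = "yarn.webapp.custom.webservice.class";

  /**
   * Returns the configured subclass, or the default class when the
   * configuration is null or the key is unset or empty.
   */
  static Class<? extends BaseWebServices> resolve(Map<String, String> conf) {
    String name = (conf == null) ? null : conf.get(KEY);
    if (name == null || name.isEmpty()) {
      return BaseWebServices.class;
    }
    try {
      // asSubclass throws ClassCastException if the class does not extend the base.
      return Class.forName(name).asSubclass(BaseWebServices.class);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Cannot load web service class " + name, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolve(null).getSimpleName());
    Map<String, String> conf =
        Collections.singletonMap(KEY, MyWebServices.class.getName());
    System.out.println(resolve(conf).getSimpleName());
  }
}
```

Because the custom class must extend the default one, it can keep serving the 
/ws/v1/cluster root path instead of needing a separate path of its own. The 
null check on the configuration mirrors the NULL scenario raised in the review.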






[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-08-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173599#comment-17173599
 ] 

Sunil G commented on YARN-10364:


I think the test case failures are related. [~BilwaST], could you please help 
to check?

> Absolute Resource [memory=0] is considered as Percentage config type
> 
>
> Key: YARN-10364
> URL: https://issues.apache.org/jira/browse/YARN-10364
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10364.001.patch, YARN-10364.002.patch, 
> YARN-10364.003.patch
>
>
> Absolute Resource [memory=0] is considered as Percentage config type. This 
> causes failure while converting queues from Percentage to Absolute Resources 
> automatically. 
> *Repro:*
> 1. Queue A = 100% and child queues Queue A.B = 0%, A.C=100%
> 2. While converting above to absolute resource automatically, capacity of 
> queue A = [memory=], A.B = [memory=0]
> This fails with below as A is considered as Absolute Resource whereas B is 
> considered as Percentage config type.
> {code}
> 2020-07-23 09:36:40,499 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Failed 
> to re-init queues : Parent queue 'root.A' and child queue 'root.A.B' should 
> use either percentage based capacityconfiguration or absolute resource 
> together for label:
> {code}






[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-08-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173598#comment-17173598
 ] 

Sunil G commented on YARN-10364:


+1. [~prabhujoseph] let's commit this.

Thanks [~BilwaST]

> Absolute Resource [memory=0] is considered as Percentage config type
> 
>
> Key: YARN-10364
> URL: https://issues.apache.org/jira/browse/YARN-10364
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10364.001.patch, YARN-10364.002.patch, 
> YARN-10364.003.patch
>
>
> Absolute Resource [memory=0] is considered as Percentage config type. This 
> causes failure while converting queues from Percentage to Absolute Resources 
> automatically. 
> *Repro:*
> 1. Queue A = 100% and child queues Queue A.B = 0%, A.C=100%
> 2. While converting above to absolute resource automatically, capacity of 
> queue A = [memory=], A.B = [memory=0]
> This fails with below as A is considered as Absolute Resource whereas B is 
> considered as Percentage config type.
> {code}
> 2020-07-23 09:36:40,499 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Failed 
> to re-init queues : Parent queue 'root.A' and child queue 'root.A.B' should 
> use either percentage based capacityconfiguration or absolute resource 
> together for label:
> {code}






[jira] [Commented] (YARN-10364) Absolute Resource [memory=0] is considered as Percentage config type

2020-07-28 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166176#comment-17166176
 ] 

Sunil G commented on YARN-10364:


Yes. This is an issue as we are considering 0 to be abs mode.

It's better to have a flag that is set and derived when the value is read from 
config, and then use that flag from there on. Seems like a miss earlier.
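
The flag idea in the comment above could look roughly like the following Java 
sketch, assuming the misclassification comes from inferring the config type 
from the parsed resource value rather than from the raw string. It is not the 
eventual patch: the class CapacityConfigTypeSketch, the enum ConfigType, and 
the method fromRawValue are hypothetical names used only for illustration.

```java
// Hypothetical sketch: derive the config type once, from the raw config string.
final class CapacityConfigTypeSketch {

  enum ConfigType { PERCENTAGE, ABSOLUTE_RESOURCE }

  /**
   * Classifies a raw capacity value by its syntax only. A bracketed value such
   * as "[memory=0]" is absolute even though the amount is zero, while a bare
   * number such as "0" is a percentage.
   */
  static ConfigType fromRawValue(String rawCapacity) {
    String trimmed = rawCapacity.trim();
    boolean bracketed = trimmed.startsWith("[") && trimmed.endsWith("]");
    return bracketed ? ConfigType.ABSOLUTE_RESOURCE : ConfigType.PERCENTAGE;
  }

  public static void main(String[] args) {
    System.out.println(fromRawValue("[memory=0]"));
    System.out.println(fromRawValue("0"));
  }
}
```

Storing the result of such a check next to the queue would let later 
validation compare parent and child config types without looking at the 
resource amounts at all.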

> Absolute Resource [memory=0] is considered as Percentage config type
> 
>
> Key: YARN-10364
> URL: https://issues.apache.org/jira/browse/YARN-10364
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Absolute Resource [memory=0] is considered as Percentage config type. This 
> causes failure while converting queues from Percentage to Absolute Resources 
> automatically. 
> *Repro:*
> 1. Queue A = 100% and child queues Queue A.B = 0%, A.C=100%
> 2. While converting above to absolute resource automatically, capacity of 
> queue A = [memory=], A.B = [memory=0]
> This fails with below as A is considered as Absolute Resource whereas B is 
> considered as Percentage config type.
> {code}
> 2020-07-23 09:36:40,499 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: 
> CapacityScheduler configuration validation failed:java.io.IOException: Failed 
> to re-init queues : Parent queue 'root.A' and child queue 'root.A.B' should 
> use either percentage based capacityconfiguration or absolute resource 
> together for label:
> {code}






[jira] [Commented] (YARN-10333) YarnClient obtain Delegation Token for Log Aggregation Path

2020-07-08 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153326#comment-17153326
 ] 

Sunil G commented on YARN-10333:


This change looks fine to me.

cc [~ztang] [~bibinchundatt] [~rohithsharmaks] thoughts?

> YarnClient obtain Delegation Token for Log Aggregation Path
> ---
>
> Key: YARN-10333
> URL: https://issues.apache.org/jira/browse/YARN-10333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10333-001.patch, YARN-10333-002.patch, 
> YARN-10333-003.patch
>
>
> There are use cases where Yarn Log Aggregation Path is configured to a 
> FileSystem like S3 or ABFS different from what is configured in fs.defaultFS 
> (HDFS). Log Aggregation fails as the client has token only for fs.defaultFS 
> and not for log aggregation path.
> This Jira is to improve YarnClient by obtaining delegation token for log 
> aggregation path and add it to the Credential of Container Launch Context 
> similar to how it does for Timeline Delegation Token.






[jira] [Commented] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134335#comment-17134335
 ] 

Sunil G commented on YARN-10166:


Looks good to me!

I can check this in tomorrow if there are no objections!

> Add detail log for ApplicationAttemptNotFoundException
> --
>
> Key: YARN-10166
> URL: https://issues.apache.org/jira/browse/YARN-10166
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Youquan Lin
>Assignee: Youquan Lin
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10166-001.patch, YARN-10166-002.patch, 
> YARN-10166-003.patch, YARN-10166-004.patch
>
>
>      Suppose user A killed the app, then ApplicationMasterService will  call 
> unregisterAttempt() for this app. Sometimes, app's AM continues to call the 
> allocate() method and reports an error as follows.
> {code:java}
> Application attempt appattempt_1582520281010_15271_01 doesn't exist in 
> ApplicationMasterService cache.
> {code}
>     If user B has been watching the AM log, he will be confused why the 
> attempt is no longer in the ApplicationMasterService cache. So I think we can 
> add detail log for ApplicationAttemptNotFoundException as follows.
> {code:java}
> Application attempt appattempt_1582630210671_14658_01 doesn't exist in 
> ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED 
> ,diagnostics: App application_1582630210671_14658 killed by userA from 
> 127.0.0.1
> {code}
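
A minimal sketch of how such a message could be assembled. The class and method names below (AttemptNotFoundMessage, build) are hypothetical and are not taken from the attached patches; the separators are normalised for readability and may differ from what the patch actually logs.

```java
// Hypothetical helper for illustration only; not the code from the YARN-10166 patches.
final class AttemptNotFoundMessage {

    // Builds the base "doesn't exist" message and, when the application state is
    // known, appends state, final status and diagnostics to explain the failure.
    static String build(String attemptId, String appState, String finalStatus,
                        String diagnostics) {
        StringBuilder sb = new StringBuilder();
        sb.append("Application attempt ").append(attemptId)
          .append(" doesn't exist in ApplicationMasterService cache.");
        if (appState != null) {
            sb.append(" App state: ").append(appState)
              .append(", finalStatus: ").append(finalStatus)
              .append(", diagnostics: ").append(diagnostics);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("appattempt_1582630210671_14658_01", "KILLED", "KILLED",
            "App application_1582630210671_14658 killed by userA from 127.0.0.1"));
    }
}
```

When the state is unknown (null), the sketch falls back to the original short message, so existing log consumers would still see the same prefix.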






[jira] [Comment Edited] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134335#comment-17134335
 ] 

Sunil G edited comment on YARN-10166 at 6/12/20, 4:04 PM:
--

Looks good to me!

I can check this in tomorrow, if there are no objections!


was (Author: sunilg):
Looks good to me!

I can check this in tomorrow if there are no objections!

> Add detail log for ApplicationAttemptNotFoundException
> --
>
> Key: YARN-10166
> URL: https://issues.apache.org/jira/browse/YARN-10166
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Youquan Lin
>Assignee: Youquan Lin
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10166-001.patch, YARN-10166-002.patch, 
> YARN-10166-003.patch, YARN-10166-004.patch
>
>
> Suppose user A killed the app; ApplicationMasterService will then call 
> unregisterAttempt() for this app. Sometimes the app's AM continues to call the 
> allocate() method and reports an error as follows.
> {code:java}
> Application attempt appattempt_1582520281010_15271_01 doesn't exist in 
> ApplicationMasterService cache.
> {code}
> If user B has been watching the AM log, he will be confused about why the 
> attempt is no longer in the ApplicationMasterService cache. So I think we can 
> add a detailed log for ApplicationAttemptNotFoundException as follows.
> {code:java}
> Application attempt appattempt_1582630210671_14658_01 doesn't exist in 
> ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED 
> ,diagnostics: App application_1582630210671_14658 killed by userA from 
> 127.0.0.1
> {code}






[jira] [Assigned] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

2020-06-12 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-10166:
--

Assignee: Youquan Lin

> Add detail log for ApplicationAttemptNotFoundException
> --
>
> Key: YARN-10166
> URL: https://issues.apache.org/jira/browse/YARN-10166
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Youquan Lin
>Assignee: Youquan Lin
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10166-001.patch, YARN-10166-002.patch, 
> YARN-10166-003.patch, YARN-10166-004.patch
>
>
> Suppose user A killed the app; ApplicationMasterService will then call 
> unregisterAttempt() for this app. Sometimes the app's AM continues to call the 
> allocate() method and reports an error as follows.
> {code:java}
> Application attempt appattempt_1582520281010_15271_01 doesn't exist in 
> ApplicationMasterService cache.
> {code}
> If user B has been watching the AM log, he will be confused about why the 
> attempt is no longer in the ApplicationMasterService cache. So I think we can 
> add a detailed log for ApplicationAttemptNotFoundException as follows.
> {code:java}
> Application attempt appattempt_1582630210671_14658_01 doesn't exist in 
> ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED 
> ,diagnostics: App application_1582630210671_14658 killed by userA from 
> 127.0.0.1
> {code}






[jira] [Commented] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-06-05 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126634#comment-17126634
 ] 

Sunil G commented on YARN-10274:


I think we should get this into 3.3.

> Merge QueueMapping and QueueMappingEntity
> -
>
> Key: YARN-10274
> URL: https://issues.apache.org/jira/browse/YARN-10274
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10274.001.patch, YARN-10274.002.patch, 
> YARN-10274.003.patch
>
>
> The role, usage and internal behaviour of these classes are almost identical, 
> so it makes no sense to keep both of them. One is used by UserGroup 
> placement rule definitions; the other is used by Application placement rules.






[jira] [Updated] (YARN-10292) FS-CS converter: add an option to enable asynchronous scheduling in CapacityScheduler

2020-05-27 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10292:
---
Component/s: fairscheduler

> FS-CS converter: add an option to enable asynchronous scheduling in 
> CapacityScheduler
> -
>
> Key: YARN-10292
> URL: https://issues.apache.org/jira/browse/YARN-10292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>
> FS doesn't have a setting equivalent to 
> yarn.scheduler.capacity.schedule-asynchronously.enable, so the FS to CS 
> converter won't add this option to yarn-site.xml. An optional command 
> line switch should be added to support this option during migration.






[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-21 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113379#comment-17113379
 ] 

Sunil G commented on YARN-8047:
---

Cool. Thanks [~BilwaST], this is super helpful.

cc [~tangzhankun]

> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch
>
>
> This Jira should make sure we are able to plug in web services and web pages 
> of the scheduler in the ResourceManager
> * RMWebApp should allow binding external classes
> * RMController should allow plugging in scheduler classes






[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112329#comment-17112329
 ] 

Sunil G commented on YARN-8047:
---

[~BilwaST] could you please help explain how to use an external class? 
We can document the same as well.

> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch
>
>
> This Jira should make sure we are able to plug in web services and web pages 
> of the scheduler in the ResourceManager
> * RMWebApp should allow binding external classes
> * RMController should allow plugging in scheduler classes






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-05-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105136#comment-17105136
 ] 

Sunil G commented on YARN-10154:


+1 on the latest addendum. Please get this in.

[~maniraj...@gmail.com] any other comments?

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch, 
> YARN-10154.addendum-002.patch, YARN-10154.addendum-003.patch, 
> YARN-10154.addendum-004.patch
>
>
> In CS, a ManagedParent Queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
> This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  
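
A minimal sketch of parsing an absolute resource value of the form shown above. The class AbsoluteResourceSketch and its parse method are hypothetical names used for illustration; this is not the CapacityScheduler configuration reader, and the validation rules are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for illustration only; not the CapacityScheduler implementation.
final class AbsoluteResourceSketch {

    // Parses a bracketed list such as "[memory=8192,vcores=8]" into an ordered
    // map of resource name to value. Rejects input without brackets or "=".
    static Map<String, Long> parse(String value) {
        String trimmed = value.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected [name=value,...] but got: " + value);
        }
        Map<String, Long> resources = new LinkedHashMap<>();
        String body = trimmed.substring(1, trimmed.length() - 1);
        for (String pair : body.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed entry: " + pair);
            }
            resources.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
        }
        return resources;
    }

    public static void main(String[] args) {
        System.out.println(parse("[memory=8192,vcores=8]"));
    }
}
```

A percentage-style capacity value would fail the bracket check here, which is the kind of distinction a configuration reader needs to make before choosing how to interpret the setting.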






[jira] [Updated] (YARN-10247) Application priority queue ACLs are not respected

2020-04-28 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10247:
---
Attachment: YARN-10247.0001.patch

> Application priority queue ACLs are not respected
> -
>
> Key: YARN-10247
> URL: https://issues.apache.org/jira/browse/YARN-10247
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-10247.0001.patch
>
>
> This is a regression from the queue path Jira.
> App priority ACLs are not working correctly. 
> {code:java}
> yarn.scheduler.capacity.root.B.acl_application_max_priority=[user=john 
> group=users max_priority=4]
> {code}
> max_priority enforcement is not working. For user john, the maximum supported 
> priority is 4. However, I can still submit with a higher priority, such as 6, 
> for this user.






[jira] [Commented] (YARN-10247) Application priority queue ACLs are not respected

2020-04-28 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094374#comment-17094374
 ] 

Sunil G commented on YARN-10247:


[~shuzirra] [~prabhujoseph] please help review this change.

> Application priority queue ACLs are not respected
> -
>
> Key: YARN-10247
> URL: https://issues.apache.org/jira/browse/YARN-10247
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-10247.0001.patch
>
>
> This is a regression from the queue path Jira.
> App priority ACLs are not working correctly. 
> {code:java}
> yarn.scheduler.capacity.root.B.acl_application_max_priority=[user=john 
> group=users max_priority=4]
> {code}
> max_priority enforcement is not working. For user john, the maximum supported 
> priority is 4. However, I can still submit with a higher priority, such as 6, 
> for this user.






[jira] [Created] (YARN-10247) Application priority queue ACLs are not respected

2020-04-28 Thread Sunil G (Jira)
Sunil G created YARN-10247:
--

 Summary: Application priority queue ACLs are not respected
 Key: YARN-10247
 URL: https://issues.apache.org/jira/browse/YARN-10247
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacity scheduler
Reporter: Sunil G
Assignee: Sunil G


This is a regression from the queue path Jira.

App priority ACLs are not working correctly. 
{code:java}
yarn.scheduler.capacity.root.B.acl_application_max_priority=[user=john 
group=users max_priority=4]
{code}
max_priority enforcement is not working. For user john, the maximum supported 
priority is 4. However, I can still submit with a higher priority, such as 6, 
for this user.
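
A minimal sketch of the check that the report expects to hold, assuming a simple per-user limit. PriorityAclSketch and its methods are hypothetical names; this is not the CapacityScheduler ACL code, and what the scheduler does with an over-limit request is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a per-queue max-priority ACL; for illustration only.
final class PriorityAclSketch {
    private final Map<String, Integer> maxPriorityByUser = new HashMap<>();

    // Records the highest priority a user may request on this queue.
    void allow(String user, int maxPriority) {
        maxPriorityByUser.put(user, maxPriority);
    }

    // True only if the user has an entry and the requested priority does not exceed it.
    boolean isAllowed(String user, int requestedPriority) {
        Integer max = maxPriorityByUser.get(user);
        return max != null && requestedPriority <= max;
    }

    public static void main(String[] args) {
        PriorityAclSketch acl = new PriorityAclSketch();
        acl.allow("john", 4);
        System.out.println("priority 4 allowed: " + acl.isAllowed("john", 4));
        System.out.println("priority 6 allowed: " + acl.isAllowed("john", 6));
    }
}
```

With max_priority=4 configured for john, a request at priority 6 should fail this check; the bug described above is that such a submission currently goes through.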






[jira] [Assigned] (YARN-9617) RM UI enables viewing pages using Timeline Reader for a user who can not access the YARN config endpoint

2020-04-20 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-9617:
-

Assignee: Akhil PB

> RM UI enables viewing pages using Timeline Reader for a user who can not 
> access the YARN config endpoint
> 
>
> Key: YARN-9617
> URL: https://issues.apache.org/jira/browse/YARN-9617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Balázs Szabó
>Assignee: Akhil PB
>Priority: Major
> Attachments: 1.png, 2.png
>
>
> If a user cannot access the /conf endpoint, she/he will be unable to 
> query the address of the Timeline Service Reader 
> (yarn.timeline-service.reader.webapp.address). In this case, the user 
> receives a "403 Unauthenticated users are not authorized to access this page" 
> response when trying to view pages requesting data from the Timeline Reader 
> (e.g. the Flow Activity tab). The UI then falls back to the default 
> address (localhost:8188), which eventually yields the 401 response (see 
> attached screenshots).
>  
> !1.png!






[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087947#comment-17087947
 ] 

Sunil G commented on YARN-9848:
---

Looks good to me. I'll commit this tomorrow if there are no objections.

cc [~snemeth] ICYMI

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-17 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085568#comment-17085568
 ] 

Sunil G commented on YARN-9848:
---

Yes, I was waiting for Jenkins earlier. Checking now.

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch, YARN-9848.002.patch, 
> YARN-9848.003.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-16 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17085090#comment-17085090
 ] 

Sunil G commented on YARN-10154:


Ah, missed it. I will take care of this.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch
>
>
> In CS, a ManagedParent Queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
> This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-16 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084936#comment-17084936
 ] 

Sunil G commented on YARN-10154:


I just synced up with [~maniraj...@gmail.com].

Apparently the calculateEffective* methods were not invoked earlier. Hence an 
updateClusterResource() call on the parent queue is needed every time a new 
dynamic queue is added.

+1 to the latest patch; committing shortly.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch
>
>
> In CS, a ManagedParent Queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
> This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-16 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084871#comment-17084871
 ] 

Sunil G commented on YARN-10154:


Overall this seems fine.

[~maniraj...@gmail.com] in the latest patch, I can see that in 
AutoCreatedLeafQueue you are explicitly calling 
{{this.getParent().updateClusterResource}} 

Is this needed? Earlier patches don't have that.

 

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch
>
>
> In CS, a ManagedParent Queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
> This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-9848) revert YARN-4946

2020-04-14 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083434#comment-17083434
 ] 

Sunil G commented on YARN-9848:
---

[~Steven Rand], I can help in reviewing this today. Thanks.

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Updated] (YARN-9848) revert YARN-4946

2020-04-14 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9848:
--
Priority: Blocker  (was: Major)

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Updated] (YARN-9848) revert YARN-4946

2020-04-14 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9848:
--
Target Version/s: 3.3.0

> revert YARN-4946
> 
>
> Key: YARN-9848
> URL: https://issues.apache.org/jira/browse/YARN-9848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, resourcemanager
>Reporter: Steven Rand
>Priority: Blocker
> Attachments: YARN-9848-01.patch
>
>
> In YARN-4946, we've been discussing a revert due to the potential for keeping 
> more applications in the state store than desired, and the potential to 
> greatly increase RM recovery times.
>  
> I'm in favor of reverting the patch, but other ideas along the lines of 
> YARN-9571 would work as well.






[jira] [Commented] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2020-04-14 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082934#comment-17082934
 ] 

Sunil G commented on YARN-10233:


This is a regression, and hence YARN UI2 can't show daemon logs. Marking as a 
blocker. Thanks [~akhilpb] and [~prabhujoseph] for finding this. Let's get this 
in quickly.

 

[~prabhujoseph] could you please work with the RM of 3.3.0 to get this in?

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.3.0
>
>







[jira] [Updated] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2020-04-14 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10233:
---
Target Version/s: 3.3.0

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.3.0
>
>







[jira] [Updated] (YARN-10233) [YARN UI2] No Logs were found in "YARN Daemon Logs" page

2020-04-14 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10233:
---
Priority: Blocker  (was: Major)

> [YARN UI2] No Logs were found in "YARN Daemon Logs" page
> 
>
> Key: YARN-10233
> URL: https://issues.apache.org/jira/browse/YARN-10233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Blocker
> Fix For: 3.3.0
>
>







[jira] [Updated] (YARN-10226) NPE in Capacity Scheduler while using %primary_group queue mapping

2020-04-09 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10226:
---
Summary: NPE in Capacity Scheduler while using %primary_group queue mapping 
 (was: NPE when using %primary_group queue mapping)

> NPE in Capacity Scheduler while using %primary_group queue mapping
> --
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get an NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if the parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.
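
A minimal sketch of the kind of null guard suggested above. java.util.concurrent.ConcurrentHashMap.get(null) throws a NullPointerException, which matches the top frame of the stack trace. QueueLookupSketch and its methods are hypothetical stand-ins, not CapacitySchedulerQueueManager or the code in the attached patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a queue registry; for illustration only.
final class QueueLookupSketch {
    private final Map<String, String> queuesByName = new ConcurrentHashMap<>();

    void addQueue(String name, String path) {
        queuesByName.put(name, path);
    }

    // ConcurrentHashMap.get(null) throws NullPointerException, so a missing
    // queue name is treated as "no such queue" instead of being passed through.
    String getQueue(String name) {
        if (name == null) {
            return null;
        }
        return queuesByName.get(name);
    }

    public static void main(String[] args) {
        QueueLookupSketch manager = new QueueLookupSketch();
        manager.addQueue("users", "root.users");
        System.out.println(manager.getQueue("users"));
        System.out.println(manager.getQueue(null));
    }
}
```

Whether the guard belongs in the lookup method or in the caller is a design choice; the description above proposes checking in the caller, getContextForPrimaryGroup().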






[jira] [Commented] (YARN-10226) NPE when using %primary_group queue mapping

2020-04-09 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079078#comment-17079078
 ] 

Sunil G commented on YARN-10226:


+1. Thanks [~pbacsko]. 
I can commit this if there are no objections.

> NPE when using %primary_group queue mapping
> ---
>
> Key: YARN-10226
> URL: https://issues.apache.org/jira/browse/YARN-10226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-10226-001.patch
>
>
> If we use the following queue mapping:
> {{u:%user:%primary_group}}
> then we get an NPE inside ResourceManager:
> {noformat}
> 2020-04-06 11:59:13,883 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(881)) - Failed to load/recover state
> java.lang.NullPointerException
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.getQueue(CapacitySchedulerQueueManager.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getContextForPrimaryGroup(UserGroupMappingPlacementRule.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForUser(UserGroupMappingPlacementRule.java:118)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:227)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:67)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:827)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:378)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:594)
> ...
> {noformat}
> We need to check if the parent queue is null in 
> {{UserGroupMappingPlacementRule.getContextForPrimaryGroup()}}.






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-28 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069344#comment-17069344
 ] 

Sunil G commented on YARN-10154:


Yes, due to Jenkins issues, I was waiting for a clean run.

Apart from the comment from [~wangda], I do not have any other major concerns.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
>
> In CS, a ManagedParent queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections

2020-03-25 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067333#comment-17067333
 ] 

Sunil G commented on YARN-10194:


[~prabhujoseph] please rebase to trunk.

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
> 
>
> Key: YARN-10194
> URL: https://issues.apache.org/jira/browse/YARN-10194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-10194-001.patch, YARN-10194-002.patch, 
> YARN-10194-003.patch
>
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The validation 
> API creates a new CapacityScheduler and fails to close it after the validation. 
> Every CapacityScheduler#init opens a MutableCSConfigurationProvider, which opens 
> a ZKConfigurationStore and creates a ZK connection. 
> *ZK LOGS*
> {code}
> 2020-03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 
> times] Error accepting new connection: Too many connections from 
> /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> {code}
> There is also another bug in ZKConfigurationStore: it does not handle 
> close() of ZKCuratorManager.
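
The leak described above follows a common pattern: a temporary object that holds a connection is created for a single validation call and never closed. A minimal sketch of the usual remedy, try-with-resources, is shown below. TempScheduler and the OPEN counter are hypothetical stand-ins for the temporary CapacityScheduler and its ZK connections; the real patch may structure the cleanup differently.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ValidationCloseSketch {
  // Counts open "connections"; stands in for ZK connections held by a scheduler.
  static final AtomicInteger OPEN = new AtomicInteger();

  // Hypothetical stand-in for a scheduler instance created only for validation.
  static final class TempScheduler implements AutoCloseable {
    TempScheduler() {
      OPEN.incrementAndGet();
    }

    boolean validate(String conf) {
      if (conf == null || conf.isEmpty()) {
        throw new IllegalArgumentException("empty configuration");
      }
      return true;
    }

    @Override
    public void close() {
      OPEN.decrementAndGet();
    }
  }

  // try-with-resources closes the temporary scheduler on success and on failure.
  static boolean validate(String conf) {
    try (TempScheduler scheduler = new TempScheduler()) {
      return scheduler.validate(conf);
    }
  }

  public static void main(String[] args) {
    System.out.println(validate("root.default.capacity=100"));
    System.out.println("open connections: " + OPEN.get());
  }
}
```

Because close() runs even when validation throws, repeated calls to the validation endpoint cannot accumulate connections in this sketch.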






[jira] [Updated] (YARN-9879) Allow multiple leaf queues with the same name in CapacityScheduler

2020-03-25 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9879:
--
Summary: Allow multiple leaf queues with the same name in CapacityScheduler 
 (was: Allow multiple leaf queues with the same name in CS)

> Allow multiple leaf queues with the same name in CapacityScheduler
> --
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, 
> YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-25 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066607#comment-17066607
 ] 

Sunil G commented on YARN-9879:
---

Thanks [~shuzirra] 

Let's get this in now. +1 to the latest patch.

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, 
> YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-24 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066341#comment-17066341
 ] 

Sunil G commented on YARN-9879:
---

[~shuzirra] the test case failures seem unrelated; could you please confirm?

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.014.patch, YARN-9879.015.patch, YARN-9879.015.patch, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, 
> YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-21 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063918#comment-17063918
 ] 

Sunil G commented on YARN-9879:
---

Somehow the Jenkins results are not showing here.

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.014.patch, YARN-9879.POC001.patch, YARN-9879.POC002.patch, 
> YARN-9879.POC003.patch, YARN-9879.POC004.patch, YARN-9879.POC005.patch, 
> YARN-9879.POC006.patch, YARN-9879.POC007.patch, YARN-9879.POC008.patch, 
> YARN-9879.POC009.patch, YARN-9879.POC010.patch, YARN-9879.POC011.patch, 
> YARN-9879.POC012.patch, YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections

2020-03-20 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063144#comment-17063144
 ] 

Sunil G commented on YARN-10194:


[~prabhujoseph] please rebase. 

This patch no longer applies.

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
> 
>
> Key: YARN-10194
> URL: https://issues.apache.org/jira/browse/YARN-10194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-10194-001.patch
>
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The validation 
> API creates a new CapacityScheduler and fails to close it after the validation. 
> Every CapacityScheduler#init opens a MutableCSConfigurationProvider, which opens 
> a ZKConfigurationStore and creates a ZK connection. 
> *ZK LOGS*
> {code}
> 2020-03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 
> times] Error accepting new connection: Too many connections from 
> /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> {code}
> There is also another bug in ZKConfigurationStore: it does not handle 
> close() of ZKCuratorManager.






[jira] [Commented] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo

2020-03-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062322#comment-17062322
 ] 

Sunil G commented on YARN-10160:


[~prabhujoseph] please add test cases for the new change. 

> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo
> --
>
> Key: YARN-10160
> URL: https://issues.apache.org/jira/browse/YARN-10160
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, 
> YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch
>
>
> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.<leaf-queue-property>
> {code}






[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062320#comment-17062320
 ] 

Sunil G commented on YARN-10198:


[~prabhujoseph] [~maniraj...@gmail.com] [~pbacsko] Do we have consensus on 
the attached patch and the approach?

I think the patch looks fine. Thoughts?

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
>     if (this.queueManager
>         .getQueue(groups.getGroups(user).get(0)) != null) {
>       return getPlacementContext(mapping,
>           groups.getGroups(user).get(0));
>     } else {
>       return null;
>     }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
>     return getPlacementContext(mapping,
>         groups.getGroups(user).get(0));
>   }
> {noformat}






[jira] [Commented] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections

2020-03-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062318#comment-17062318
 ] 

Sunil G commented on YARN-10194:


+1 for this change.

I'll commit shortly.

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
> 
>
> Key: YARN-10194
> URL: https://issues.apache.org/jira/browse/YARN-10194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-10194-001.patch
>
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The validation 
> API creates a new CapacityScheduler and fails to close it after the validation. 
> Every CapacityScheduler#init opens a MutableCSConfigurationProvider, which opens 
> a ZKConfigurationStore and creates a ZK connection. 
> *ZK LOGS*
> {code}
> 2020-03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 
> times] Error accepting new connection: Too many connections from 
> /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> {code}
> There is also another bug in ZKConfigurationStore: it does not handle 
> close() of ZKCuratorManager.






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-19 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062315#comment-17062315
 ] 

Sunil G commented on YARN-9879:
---

[~shuzirra] could you please check the latest errors?

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.014.patch, YARN-9879.POC001.patch, YARN-9879.POC002.patch, 
> YARN-9879.POC003.patch, YARN-9879.POC004.patch, YARN-9879.POC005.patch, 
> YARN-9879.POC006.patch, YARN-9879.POC007.patch, YARN-9879.POC008.patch, 
> YARN-9879.POC009.patch, YARN-9879.POC010.patch, YARN-9879.POC011.patch, 
> YARN-9879.POC012.patch, YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Updated] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-16 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-9879:
--
Target Version/s: 3.3.0

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, 
> YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-13 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058451#comment-17058451
 ] 

Sunil G commented on YARN-10154:


Yes [~clayb], thank you. You are correct. 

We need to update both min and max capacities with absolute resources. 

[~maniraj...@gmail.com], could you please confirm whether this is also captured here? 
Thanks.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, a ManagedParent queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058406#comment-17058406
 ] 

Sunil G commented on YARN-9879:
---

[~shuzirra] when uploading the next patch, please remove the POC string from the 
patch name.

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch, 
> YARN-9879.POC013.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> A design doc and first proposal are being prepared; I'll attach them as soon as 
> they're done.






[jira] [Updated] (YARN-10194) YARN RMWebServices /scheduler-conf/validate leaks ZK Connections

2020-03-12 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10194:
---
Priority: Critical  (was: Major)

> YARN RMWebServices /scheduler-conf/validate leaks ZK Connections
> 
>
> Key: YARN-10194
> URL: https://issues.apache.org/jira/browse/YARN-10194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Critical
>
> YARN RMWebServices /scheduler-conf/validate leaks ZK connections. The validation 
> API creates a new CapacityScheduler and fails to close it after the validation. 
> Every CapacityScheduler#init opens a MutableCSConfigurationProvider, which opens 
> a ZKConfigurationStore and creates a ZK connection. 
> *ZK LOGS*
> {code}
> 2020-03-12 16:45:51,881 WARN org.apache.zookeeper.server.NIOServerCnxnFactory: [2 
> times] Error accepting new connection: Too many connections from 
> /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,449 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,710 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:52,876 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,068 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:53,391 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [2 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,008 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,287 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Error accepting new 
> connection: Too many connections from /172.27.99.64 - max is 60
> 2020-03-12 16:45:54,483 WARN 
> org.apache.zookeeper.server.NIOServerCnxnFactory: [4 times] Error accepting 
> new connection: Too many connections from /172.27.99.64 - max is 60
> {code}






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057771#comment-17057771
 ] 

Sunil G commented on YARN-10154:


Thanks [~clayb]. 
Regarding the configuration properties 
_+yarn.scheduler.capacity.<queue-path>.leaf-queue-template.maximum-allocation-mb+_
 and 
+_yarn.scheduler.capacity.<queue-path>.leaf-queue-template.maximum-allocation-vcores_+,
 there may be a problem with respect to the multiple-resources design model. As we 
may have more resource types such as GPU or FPGA, this configuration model 
may not scale.

Could we use a model like the one below? 
+_yarn.scheduler.capacity.<queue-path>.leaf-queue-template.capacity_+ = 
"[memory=10240, vcores=10]" 

This is the same as the normal absolute resource design we did in 
YARN-5881. In that case as well, we were configuring 
+_yarn.scheduler.capacity.<queue-path>.capacity_+ = "[memory=10240, vcores=10, 
gpu=6]".

 [~clayb] does this make sense? Kindly share your thoughts.
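
To illustrate why the bracketed form scales to additional resource types, here is a minimal sketch of a parser for values such as "[memory=10240, vcores=10, gpu=6]". AbsoluteResourceSketch is a hypothetical helper, not the parser used by the CapacityScheduler, and it assumes plain integer values without unit suffixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "[name=value, ...]" into an ordered map.
final class AbsoluteResourceSketch {

  static Map<String, Long> parse(String value) {
    String trimmed = value.trim();
    if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
      throw new IllegalArgumentException("expected [name=value,...]: " + value);
    }
    Map<String, Long> resources = new LinkedHashMap<>();
    String body = trimmed.substring(1, trimmed.length() - 1).trim();
    if (body.isEmpty()) {
      return resources;
    }
    // Each comma-separated entry is one resource type, so a new type
    // (gpu, fpga, ...) needs no new configuration key.
    for (String pair : body.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("bad entry: " + pair);
      }
      resources.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
    }
    return resources;
  }

  public static void main(String[] args) {
    System.out.println(parse("[memory=10240, vcores=10, gpu=6]"));
  }
}
```

With per-resource keys such as maximum-allocation-mb, every new resource type would need its own property; with the bracketed value, only the map contents change.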

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, a ManagedParent queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Comment Edited] (YARN-10191) FS-CS converter: call System.exit function call for every code path in main method

2020-03-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057753#comment-17057753
 ] 

Sunil G edited comment on YARN-10191 at 3/12/20, 9:35 AM:
--

Thanks [~pbacsko]. Committed to trunk.


was (Author: sunilg):
Thanks [~pbacsko]

> FS-CS converter: call System.exit function call for every code path in main 
> method
> --
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Fix For: 3.3.0
>
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
>     try {
>       FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>           new FSConfigToCSConfigArgumentHandler();
>       int exitCode =
>           fsConfigConversionArgumentHandler.parseAndConvert(args);
>       if (exitCode != 0) {
>         LOG.error(FATAL,
>             "Error while starting FS configuration conversion, " +
>             "see previous error messages for details!");
>         System.exit(exitCode);
>       }
>     } catch (Throwable t) {
>       LOG.error(FATAL,
>           "Error while starting FS configuration conversion!", t);
>       System.exit(-1);
>     }
>   }
>  {code}
> This is a mistake. If any non-daemon thread started by either FS or CS is still 
> running, the tool will never terminate. We must call 
> {{System.exit()}} on every code path to make sure that it never blocks at the 
> end.
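
One way to guarantee an explicit exit on every path is to have the conversion logic return an exit code and call System.exit() exactly once in main(). The sketch below illustrates that shape; ExitOnEveryPathSketch, run() and the "--fail" flag are hypothetical and only stand in for the converter's parseAndConvert() call. The committed patch may be structured differently.

```java
final class ExitOnEveryPathSketch {

  // Hypothetical stand-in for parseAndConvert(args): 0 means success.
  static int run(String[] args) {
    try {
      for (String arg : args) {
        if ("--fail".equals(arg)) {
          System.err.println("conversion failed, see previous messages");
          return 1;
        }
      }
      return 0;
    } catch (Throwable t) {
      t.printStackTrace();
      return -1;
    }
  }

  public static void main(String[] args) {
    // A non-daemon thread like this one keeps the JVM alive after main() returns.
    Thread lingering = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException ignored) {
        // Nothing to clean up in this sketch.
      }
    });
    lingering.start();

    // Exiting explicitly on success and on failure terminates the JVM
    // even though the thread above is still sleeping.
    System.exit(run(args));
  }
}
```

Keeping the logic in run() also makes it testable without terminating the test JVM.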






[jira] [Updated] (YARN-10191) FS-CS converter: call System.exit function call for every code path in main method

2020-03-12 Thread Sunil G (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-10191:
---
Summary: FS-CS converter: call System.exit function call for every code 
path in main method  (was: FS-CS converter: call System.exit() for every code 
path in main())

> FS-CS converter: call System.exit function call for every code path in main 
> method
> --
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
>     try {
>       FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>           new FSConfigToCSConfigArgumentHandler();
>       int exitCode =
>           fsConfigConversionArgumentHandler.parseAndConvert(args);
>       if (exitCode != 0) {
>         LOG.error(FATAL,
>             "Error while starting FS configuration conversion, " +
>             "see previous error messages for details!");
>         System.exit(exitCode);
>       }
>     } catch (Throwable t) {
>       LOG.error(FATAL,
>           "Error while starting FS configuration conversion!", t);
>       System.exit(-1);
>     }
>   }
>  {code}
> This is a mistake. If any non-daemon thread started by either FS or CS is 
> still running, the tool will never terminate. We must call 
> {{System.exit()}} on every code path to make sure that it never blocks at the 
> end.






[jira] [Commented] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057745#comment-17057745
 ] 

Sunil G commented on YARN-10191:


+1. Thanks [~pbacsko] 

Committing shortly

> FS-CS converter: call System.exit() for every code path in main()
> -
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
>     try {
>       FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>           new FSConfigToCSConfigArgumentHandler();
>       int exitCode =
>           fsConfigConversionArgumentHandler.parseAndConvert(args);
>       if (exitCode != 0) {
>         LOG.error(FATAL,
>             "Error while starting FS configuration conversion, " +
>             "see previous error messages for details!");
>         System.exit(exitCode);
>       }
>     } catch (Throwable t) {
>       LOG.error(FATAL,
>           "Error while starting FS configuration conversion!", t);
>       System.exit(-1);
>     }
>   }
>  {code}
> This is a mistake. If any non-daemon thread started by either FS or CS is 
> still running, the tool will never terminate. We must call 
> {{System.exit()}} on every code path to make sure that it never blocks at the 
> end.






[jira] [Commented] (YARN-10191) FS-CS converter: call System.exit() for every code path in main()

2020-03-11 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056912#comment-17056912
 ] 

Sunil G commented on YARN-10191:


+1. Pending jenkins

> FS-CS converter: call System.exit() for every code path in main()
> -
>
> Key: YARN-10191
> URL: https://issues.apache.org/jira/browse/YARN-10191
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Blocker
> Attachments: YARN-10191-001.patch
>
>
> Note that we don't call {{System.exit()}} on the happy path scenario in the 
> converter:
> {code:java}
>   public static void main(String[] args) {
>     try {
>       FSConfigToCSConfigArgumentHandler fsConfigConversionArgumentHandler =
>           new FSConfigToCSConfigArgumentHandler();
>       int exitCode =
>           fsConfigConversionArgumentHandler.parseAndConvert(args);
>       if (exitCode != 0) {
>         LOG.error(FATAL,
>             "Error while starting FS configuration conversion, " +
>             "see previous error messages for details!");
>         System.exit(exitCode);
>       }
>     } catch (Throwable t) {
>       LOG.error(FATAL,
>           "Error while starting FS configuration conversion!", t);
>       System.exit(-1);
>     }
>   }
>  {code}
> This is a mistake. If any non-daemon thread started by either FS or CS is 
> still running, the tool will never terminate. We must call 
> {{System.exit()}} on every code path to make sure that it never blocks at the 
> end.






[jira] [Commented] (YARN-9879) Allow multiple leaf queues with the same name in CS

2020-03-10 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056590#comment-17056590
 ] 

Sunil G commented on YARN-9879:
---

Thanks [~shuzirra]. Appreciate the same. 

> Allow multiple leaf queues with the same name in CS
> ---
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch, YARN-9879.POC012.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> Design doc and first proposal is being made, I'll attach it as soon as it's 
> done.






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-10 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056079#comment-17056079
 ] 

Sunil G commented on YARN-10154:


Sure [~maniraj...@gmail.com] 

I am looking into this.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, a ManagedParent Queue and its template cannot take an absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  






[jira] [Commented] (YARN-10168) FS-CS Converter: tool doesn't handle min/max resource conversion correctly

2020-03-10 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056043#comment-17056043
 ] 

Sunil G commented on YARN-10168:


Thanks [~pbacsko] and [~snemeth]

> FS-CS Converter: tool doesn't handle min/max resource conversion correctly
> --
>
> Key: YARN-10168
> URL: https://issues.apache.org/jira/browse/YARN-10168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Blocker
>  Labels: fs2cs
> Fix For: 3.3.0
>
> Attachments: YARN-10168-001.patch, YARN-10168-002.patch, 
> YARN-10168-003.patch, YARN-10168-004.patch, YARN-10168-005.patch
>
>
> While trying to understand the logic that converts min and max resources from 
> FS to CS, I found some issues:
> 1)
> In FSQueueConverter#emitMaximumCapacity
> The existing logic in FS is to specify either a maximum percentage for queues 
> against cluster resources, or an absolute-valued maximum resource.
> In the existing FS2CS converter, when a percentage-based maximum resource is 
> specified, the converter takes a global resource from the fs2cs CLI and applies 
> the percentages to it. This is not correct, since the percentage-based value is 
> lost, and later, when cluster resources go up or down, the maximum resource 
> cannot change with them.
> 2)
> The logic that deals with min/weight resources is also questionable:
> The existing fs2cs tool gives percentage precedence over 
> absoluteResource, and could set both in a queue config. See 
> FSQueueConverter.Capacity#toString
> However, in CS, compared to FS, the weights/min resource is quite different:
> CS uses the same queue.capacity to specify either percentage-based or 
> absolute-resource-based configs (similar to how FS deals with the maximum 
> resource).
>  The capacity defines the guaranteed resource, which also impacts the fair 
> share of the queue. (The more guaranteed resource a queue has, the larger the 
> "pie" the queue can get if there's any additional available resource.)
>  In FS, minResource defines the guaranteed resource, and weight defines how 
> much the pie can grow to.
> So to me, in FS, we should pick either weight or minResource to generate the 
> CS config.
> 3)
> In FS, mixed use of absolute-resource configs (like min/maxResource) and 
> percentage-based ones (like weight) is allowed. But in CS, it is not allowed. 
> The reason is discussed on YARN-5881, and find [a]Should we support specifying a 
> mix of percentage ...
> The existing fs2cs doesn't handle this issue, so it could set mixed absolute 
> and percentage-based resources.
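
Point 1) above can be illustrated with some arithmetic. The following is a minimal sketch with hypothetical method names and made-up cluster sizes, not code from fs2cs: it compares a maximum that was frozen to an absolute value at conversion time with one that stays percentage-based.

```java
public class MaxCapacitySketch {

  // Absolute maximum computed once from the cluster size given at conversion time
  // (hypothetical helper, mirrors "apply the percentage to a global resource").
  static long freezeToAbsoluteMb(double percent, long clusterMbAtConversion) {
    return Math.round(clusterMbAtConversion * percent / 100.0);
  }

  // Percentage-based maximum, re-evaluated against the current cluster size.
  static long percentOfClusterMb(double percent, long currentClusterMb) {
    return Math.round(currentClusterMb * percent / 100.0);
  }

  public static void main(String[] args) {
    long frozen = freezeToAbsoluteMb(50.0, 102_400L);   // 50% of a 100 GB cluster
    long tracked = percentOfClusterMb(50.0, 204_800L);  // cluster later doubles
    System.out.println("frozen absolute max (MB): " + frozen);
    System.out.println("percentage-based max (MB): " + tracked);
  }
}
```

Once the cluster doubles, the frozen value still reflects the old size while the percentage-based value follows the cluster, which is the loss of information described in 1).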






[jira] [Commented] (YARN-10179) Queue mapping based on group id passed through application tag

2020-03-04 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051239#comment-17051239
 ] 

Sunil G commented on YARN-10179:


Thanks [~snemeth] for raising this.

> Queue mapping based on group id passed through application tag
> --
>
> Key: YARN-10179
> URL: https://issues.apache.org/jira/browse/YARN-10179
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0
>
>
> There are situations when the real submitting user differs from the user that 
> arrives at YARN. For example, in the case of a Hive application where Hive 
> impersonation is turned off, the Hive queries will run as the Hive user and the 
> mapping is done based on that user's group. 
> Unfortunately in this case YARN doesn't have any information about the real 
> user and there are cases when the customer may want to map these applications 
> to the real submitting user's queue (based on the group id) instead of the 
> Hive queue.
> For these cases, if they passed the group id (or name) in the application 
> tag, we could read it and use it during the queue mapping, provided that user 
> has rights to run on the real user's queue.  
>  






[jira] [Commented] (YARN-10167) FS-CS Converter: Need validate c-s.xml after converting

2020-02-27 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046553#comment-17046553
 ] 

Sunil G commented on YARN-10167:


[~pbacsko] thanks for pointing out the Cluster Down scenario. I missed that.

If the fs2cs tool can bring up a CS instance, then it's much better. FYI, 
[~kmarton] has done a similar effort for the YARN validate mutation API call, so 
similar code will help here.

Thoughts?

> FS-CS Converter: Need validate c-s.xml after converting
> ---
>
> Key: YARN-10167
> URL: https://issues.apache.org/jira/browse/YARN-10167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Major
>  Labels: fs2cs, newbie
>
> Currently we just generate c-s.xml, but we don't validate it. To make 
> sure the c-s.xml is correct after conversion, it's better to initialize the 
> CS scheduler using the generated configs.
> Also, in the test, we should try to leverage MockRM to validate the generated 
> configs as much as we can.






[jira] [Commented] (YARN-10167) FS-CS Converter: Need validate c-s.xml after converting

2020-02-26 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046238#comment-17046238
 ] 

Sunil G commented on YARN-10167:


Trying to understand this at a deeper level:
 * Submit a new c-s.xml from the CLI for validation (maybe the CLI could look like: 
*yarn queues validate -f c-s.xml*)
 * The CLI uploads the c-s.xml as a config object to the RM
 * In the RM, we will have a method in ClientRMService called 
*validateCSConfig*
 * In this method, we will be able to get a new CS object itself, and inject 
the incoming conf object. 
 ** 1. If CS can be inited, validation seems fine (similar to what [~kmarton] 
did earlier)
 ** 2. Can we start CS as well? I don't think so. In this case, we may need 
MockRM instead of a CS object. But the question here is which additional configs 
get validated when we call start. If this delta is minimal, I would like 
to pull that out and call it explicitly. cc [~kmarton] and [~prabhujoseph] can add 
more color here.
 ** If both 1 & 2 can be done, then we can say whether the submitted config is 
good or not.
 * Propagate success or failure back to the CLI
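
The flow above could be sketched roughly as follows. This is not RM code: the Scheduler interface, ToyScheduler, the "root.capacity" key and the ValidationResult type are all hypothetical stand-ins used only to show the "init a fresh scheduler with the incoming config and report the outcome" idea from step 1.

```java
import java.util.Collections;
import java.util.Map;

public class ValidateConfigSketch {

  /** Hypothetical stand-in for a scheduler that can be initialized from a config. */
  interface Scheduler {
    void init(Map<String, String> conf) throws Exception;
  }

  /** Outcome that would be propagated back to the CLI. */
  static final class ValidationResult {
    final boolean valid;
    final String message;

    ValidationResult(boolean valid, String message) {
      this.valid = valid;
      this.message = message;
    }
  }

  /** Initializes a fresh scheduler with the incoming config; the live one is untouched. */
  static ValidationResult validate(Scheduler fresh, Map<String, String> conf) {
    try {
      fresh.init(conf);
      return new ValidationResult(true, "OK");
    } catch (Exception e) {
      return new ValidationResult(false, String.valueOf(e.getMessage()));
    }
  }

  /** Toy scheduler with a single illustrative rule; the key is not a real YARN property. */
  static final class ToyScheduler implements Scheduler {
    @Override
    public void init(Map<String, String> conf) {
      String capacity = conf.get("root.capacity");
      if (!"100".equals(capacity)) {
        throw new IllegalArgumentException("root.capacity must be 100, got " + capacity);
      }
    }
  }

  public static void main(String[] args) {
    ValidationResult result =
        validate(new ToyScheduler(), Collections.singletonMap("root.capacity", "90"));
    System.out.println(result.valid + ": " + result.message);
  }
}
```

Whether start-time checks (step 2) can be folded into the same call depends on how much extra validation start performs, which is the open question raised above.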

[~leftnoteasy] [~kmarton] could you please share your thoughts on whether this 
is what was intended?

cc. [~pbacsko] [~snemeth] [~prabhujoseph]

> FS-CS Converter: Need validate c-s.xml after converting
> ---
>
> Key: YARN-10167
> URL: https://issues.apache.org/jira/browse/YARN-10167
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Priority: Major
>  Labels: fs2cs, newbie
>
> Currently we just generate c-s.xml, but we don't validate it. To make 
> sure the c-s.xml is correct after conversion, it's better to initialize the 
> CS scheduler using the generated configs.
> Also, in the test, we should try to leverage MockRM to validate the generated 
> configs as much as we can.





