[jira] [Updated] (YARN-11057) NodeManager may generate too many empty log dirs when we configure many log dirs

2022-01-05 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated YARN-11057:

Target Version/s: 3.3.3  (was: 3.3.2)

> NodeManager may generate too many empty log dirs when we configure many log 
> dirs
> 
>
> Key: YARN-11057
> URL: https://issues.apache.org/jira/browse/YARN-11057
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation, nodemanager
> Affects Versions: 2.7.7, 3.3.1
> Reporter: Yao Guangdong
> Assignee: Yao Guangdong
> Priority: Major
> Attachments: YARN-11507.0001.patch
>
>
> NodeManager may generate too many empty log dirs when many log dirs are
> configured in NonAggregationLogHandler mode. For example: we have 24 disks
> and 512G of memory. Suppose the average container runs for 1 minute and the
> average container size is 4G. Then the number of containers running in
> parallel on one server is 512G / 4G = 128. Under the current policy, every
> container generates more than 24 directories, so the total number of
> directories created in one week is 128 * 24 * (60 * 24 * 7) = 30,965,760.
> This does not count the tmp directories. That many directories consume too
> many inodes on the server and hurt the disks' I/O utilization, because
> Linux uses a lot of memory to cache the inodes. When memory runs short, the
> cached inodes are evicted, which makes disk scans more frequent and drives
> disk I/O utilization up. In fact, only one of these directories is used for
> container logs for each container; the others are empty. So we can delete
> the empty directories when the job is finished, which will greatly reduce
> the number of inodes.
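
The weekly estimate in the description can be reproduced with a short calculation. This is only a sketch: the function and parameter names are illustrative, and the inputs are the figures quoted in the description (24 log dirs, 512G of memory, 4G containers, 1-minute containers, one week).

```python
# Reproduce the weekly directory estimate from the issue description.
# All names here are illustrative; only the numbers come from the description.

def weekly_log_dirs(memory_gb, container_gb, log_dirs, container_minutes, days=7):
    """Estimate how many per-container log directories one node creates."""
    # Containers that fit in memory at the same time: 512 / 4 = 128.
    parallel_containers = memory_gb // container_gb
    # How many times the whole set of containers turns over in the period.
    rounds = (60 * 24 * days) // container_minutes
    # Lower bound: one directory per configured log dir per container.
    return parallel_containers * log_dirs * rounds


total = weekly_log_dirs(memory_gb=512, container_gb=4, log_dirs=24, container_minutes=1)
print(f"{total:,}")  # 30,965,760
```

Since only one directory per container actually holds logs, roughly 23 out of every 24 of these directories are empty under the assumptions above.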



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11058) Yarn ACL check is not done for /containers/{containerid}/logs in HsWebServices

2022-01-05 Thread Tanu Ajmera (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469333#comment-17469333
 ] 

Tanu Ajmera commented on YARN-11058:


cc [~tarunparimi] [~snemeth] [~gandras] 

> Yarn ACL check is not done for /containers/{containerid}/logs in HsWebServices
> --
>
> Key: YARN-11058
> URL: https://issues.apache.org/jira/browse/YARN-11058
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Tanu Ajmera
> Assignee: Tanu Ajmera
> Priority: Major
>
> In the /jobhistory/logsuser API, an ACL check is done and other users
> cannot view logs. In the HsWebServices API, the ACL check is missing,
> allowing users to view logs of applications created by different users.
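
To make the missing check concrete, here is a minimal sketch of the kind of guard the description asks for. It is not the Hadoop API: `can_view_logs`, `get_container_logs`, and their parameters are hypothetical names used only for illustration.

```python
# Hypothetical illustration of an ACL guard in front of a log endpoint.
# None of these names are real Hadoop classes or methods.

def can_view_logs(caller, app_owner, view_acl, admins=frozenset()):
    """Allow the application owner, users on the view ACL, and admins."""
    return caller == app_owner or caller in view_acl or caller in admins


def get_container_logs(caller, app_owner, view_acl, read_logs):
    """Run the ACL check before reading any log content."""
    if not can_view_logs(caller, app_owner, view_acl):
        raise PermissionError(f"user {caller} may not view logs owned by {app_owner}")
    return read_logs()


# Usage: "bob" is rejected unless he is on the application's view ACL.
print(get_container_logs("alice", "alice", set(), lambda: "log text"))
```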






[jira] [Created] (YARN-11058) Yarn ACL check is not done for /containers/{containerid}/logs in HsWebServices

2022-01-05 Thread Tanu Ajmera (Jira)
Tanu Ajmera created YARN-11058:
--

 Summary: Yarn ACL check is not done for 
/containers/{containerid}/logs in HsWebServices
 Key: YARN-11058
 URL: https://issues.apache.org/jira/browse/YARN-11058
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Tanu Ajmera
Assignee: Tanu Ajmera


In the /jobhistory/logsuser API, an ACL check is done and other users cannot 
view logs. In the HsWebServices API, the ACL check is missing, allowing users 
to view logs of applications created by different users.






[jira] [Updated] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10918:
--
Labels: pull-request-available  (was: )

> Simplify method: CapacitySchedulerQueueManager#parseQueue
> -
>
> Key: YARN-10918
> URL: https://issues.apache.org/jira/browse/YARN-10918
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Andras Gyori
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Ideas for simplifying this method:
> - Define a queue factory
> - Separate validation logic






[jira] [Updated] (YARN-10632) Make auto queue creation maximum allowed depth configurable

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10632:
--
Labels: pull-request-available  (was: )

> Make auto queue creation maximum allowed depth configurable
> ---
>
> Key: YARN-10632
> URL: https://issues.apache.org/jira/browse/YARN-10632
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Andras Gyori
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-10632.001.patch, YARN-10632.002.patch, 
> YARN-10632.003.patch, YARN-10632.004.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently the maximum allowed depth is fixed at 2, but I think this should
> be configurable.
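
As a rough illustration of the request, the sketch below turns the fixed limit into a parameter. It assumes that "depth" means the number of queue levels that would have to be auto-created below the deepest queue that already exists; the function names and the `max_depth` parameter are hypothetical, not the scheduler's actual code or configuration key.

```python
# Hypothetical sketch: a depth check whose limit is a parameter instead of
# a hard-coded 2. Names and the meaning of "depth" are assumptions.

DEFAULT_MAX_DEPTH = 2  # the currently fixed value mentioned in the issue


def levels_to_create(existing_queues, queue_path):
    """Count the levels of queue_path missing below its deepest existing ancestor."""
    parts = queue_path.split(".")
    for i in range(len(parts), 0, -1):
        if ".".join(parts[:i]) in existing_queues:
            return len(parts) - i
    return len(parts)


def validate_auto_create(existing_queues, queue_path, max_depth=DEFAULT_MAX_DEPTH):
    """Raise ValueError if auto-creating queue_path would exceed max_depth."""
    depth = levels_to_create(existing_queues, queue_path)
    if depth > max_depth:
        raise ValueError(f"{queue_path} needs {depth} new levels, limit is {max_depth}")
    return depth


existing = {"root", "root.a"}
print(validate_auto_create(existing, "root.a.b.c"))                  # 2
print(validate_auto_create(existing, "root.a.b.c.d", max_depth=3))   # 3
```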






[jira] [Commented] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue

2022-01-05 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469139#comment-17469139
 ] 

Andras Gyori commented on YARN-10918:
-

I think the queue creation itself does not justify a separate queue factory (as 
it's literally only construction of the queues). There are only 2 validations 
here, so I thought we should keep this as simple as possible.

> Simplify method: CapacitySchedulerQueueManager#parseQueue
> -
>
> Key: YARN-10918
> URL: https://issues.apache.org/jira/browse/YARN-10918
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Andras Gyori
> Priority: Minor
>
> Ideas for simplifying this method:
> - Define a queue factory
> - Separate validation logic






[jira] [Updated] (YARN-10922) Investigation: Verify if legacy AQC works as documented

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10922:
--
Labels: pull-request-available  (was: )

> Investigation: Verify if legacy AQC works as documented
> ---
>
> Key: YARN-10922
> URL: https://issues.apache.org/jira/browse/YARN-10922
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Attachments: capacity-scheduler.xml
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Quoting from the Capacity Scheduler documentation: 
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> Section: "Dynamic Auto-Creation and Management of Leaf Queues"
> The task is to verify if legacy AQC works like this: 
> {quote}
> The parent queue which has been enabled for auto leaf queue creation, 
> supports the configuration of template parameters for automatic configuration 
> of the auto-created leaf queues. The auto-created queues support all of the 
> leaf queue configuration parameters except for Queue ACL, Absolute Resource 
> configurations. Queue ACLs are currently inherited from the parent queue i.e 
> they are not configurable on the leaf queue template
> {quote}






[jira] [Comment Edited] (YARN-10922) Investigation: Verify if legacy AQC works as documented

2022-01-05 Thread Tamas Domok (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468669#comment-17468669
 ] 

Tamas Domok edited comment on YARN-10922 at 1/5/22, 8:50 AM:
-

A pull request is on the way with a unit test for the QueueACL behaviour.

 

I was wrong on both assumptions in the first comment.

 - Absolute resource can be configured for dynamically created queues with 
Legacy AQC, see the new test case in (YARN-11033). It is possible to configure 
the parent with Absolute resource and the child with Percentage mode using the 
leaf-queue-template, which leads to this bug (YARN-11010).

 - QueueACLs are partially supported: they can be configured for dynamically 
created queues with the Legacy AQC leaf-queue-template. If a dynamically 
created queue already exists, then the ACLs configured with the 
leaf-queue-template apply to that queue. However, at queue creation those ACLs 
are not in effect, which is why I call it partial support. Also, the scheduler 
response doesn't show the ACLs in the auto-created queue's configuration, which 
is misleading.

Absolute Resource:

  Question#1: Should I report a bug for the missing config validation for the 
Absolute+Percentage mix?

QueueACL:

  Question#2-a: Should the documentation be updated with the QueueACLs' current 
behaviour? IMHO it's hard to explain and I don't think this behaviour was 
intended.

  Question#2-b: Should I report a bug to eliminate this partial support for 
QueueACL? The new Flexible Auto Queue Creation doesn't support it either, and 
that works as documented.

  Question#3b: Should I report bugs to improve the QueueACL support on the 
Legacy AQC? One for the queue creation and one for the scheduler response? IMHO 
if this is a Legacy feature then it would make sense to introduce this feature 
in the Flexible AQC instead.
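
For Question#1, a validation along the following lines might be what is missing. This is a sketch under assumptions, not the scheduler's code: it assumes Absolute resource values use the bracketed form (for example `[memory=10240,vcores=12]`) and Percentage values are plain numbers, and `capacity_mode` and `validate_leaf_template` are hypothetical helper names.

```python
# Hypothetical validation sketch for the Absolute+Percentage mix.
# Helper names are illustrative; they are not CapacityScheduler methods.

def capacity_mode(value):
    """Classify a capacity value as 'absolute' or 'percentage'."""
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        return "absolute"   # e.g. "[memory=10240,vcores=12]"
    float(text)             # raises ValueError for anything non-numeric
    return "percentage"     # e.g. "50"


def validate_leaf_template(parent_capacity, template_capacity):
    """Reject an Absolute parent combined with a Percentage leaf template."""
    parent_mode = capacity_mode(parent_capacity)
    child_mode = capacity_mode(template_capacity)
    if parent_mode == "absolute" and child_mode == "percentage":
        raise ValueError("Absolute parent cannot use a Percentage leaf-queue-template")
    return parent_mode, child_mode


print(validate_leaf_template("[memory=10240,vcores=12]", "[memory=1024,vcores=1]"))
```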


was (Author: tdomok):
A pull request is on the way with doc update and a unit test for the QueueACL 
behaviour.

 

I was wrong on both assumptions in the first comment.

 - Absolute resource can be configured for dynamically created queues with 
Legacy AQC, see the new test case in (YARN-11033 isAbsoluteResource is not 
correct for dynamically created queues). It is possible to configure the parent 
with Absolute resource and the child with Percentage mode using the 
leaf-queue-template, which leads to this bug (YARN-11010).

 - QueueACL can be configured for dynamically created queues with the Legacy 
AQC leaf-queue-template. If a dynamically created queue already exists, then 
the ACLs configured with the leaf-queue-template apply to that queue. However, 
at queue creation those ACLs are not in effect, and the scheduler response 
doesn't show the ACLs in the dynamic queue.

 

Probably I should open 2 more issues:

 1. ACLs are not correctly filled for the dynamically created queue.

 2. It shouldn't be possible to configure Absolute parent with Percentage child 
using the leaf-queue-template.

> Investigation: Verify if legacy AQC works as documented
> ---
>
> Key: YARN-10922
> URL: https://issues.apache.org/jira/browse/YARN-10922
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Attachments: capacity-scheduler.xml
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Quoting from the Capacity Scheduler documentation: 
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
> Section: "Dynamic Auto-Creation and Management of Leaf Queues"
> The task is to verify if legacy AQC works like this: 
> {quote}
> The parent queue which has been enabled for auto leaf queue creation, 
> supports the configuration of template parameters for automatic configuration 
> of the auto-created leaf queues. The auto-created queues support all of the 
> leaf queue configuration parameters except for Queue ACL, Absolute Resource 
> configurations. Queue ACLs are currently inherited from the parent queue i.e 
> they are not configurable on the leaf queue template
> {quote}


