[jira] [Commented] (YARN-10425) Replace the legacy placement engine in CS with the new one

2020-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223833#comment-17223833
 ] 

Hadoop QA commented on YARN-10425:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
22s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 9 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
54s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 58s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
55s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 34s{color} | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/276/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 7 new + 301 unchanged - 8 fixed = 308 total (was 309) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 23s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} |  | {color:green} the patch passed with JDK Private 

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223831#comment-17223831
 ] 

Hadoop QA commented on YARN-10475:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
0s{color} |  | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
18s{color} |  | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
34s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
56s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
21s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
42s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 22s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
34s{color} |  | {color:green} branch-3.2 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
57s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
30s{color} |  | {color:blue} 
branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site no findbugs output file 
(findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} |  | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 7s{color} |  | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 15m  
3s{color} | 
[/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt]
 | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 15m  3s{color} 
| 
[/patch-compile-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-compile-root.txt]
 | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} blanks {color} | {color:red}  0m  
0s{color} | 
[/blanks-eol.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/blanks-eol.txt]
 | {color:red} The patch has 1 line(s) that end in blanks. Use git apply 
--whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | 
[/buildtool-patch-checkstyle-root.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/buildtool-patch-checkstyle-root.txt]
 | {color:orange} The patch fails to run checkstyle in root {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
35s{color} | 
[/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt]
 | {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
19s{color} | 
[/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/277/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt]
 | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
34s{color} | 

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223818#comment-17223818
 ] 

Eric Payne commented on YARN-10475:
---

Thanks [~Jim_Brennan] for providing resolutions for this issue, and thanks 
[~bibinchundatt] for your reviews.
The changes LGTM.

+1

I am in favor of committing this patch as-is and creating a separate JIRA for 
adding a pluggable architecture for adjusting the heartbeat based on other 
factors.

[~bibinchundatt], I await your opinion.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization. If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down. If it is under-utilized compared to the rest of the cluster, its 
> heartbeat interval speeds up.
> This is a feature we have been running internally in production for several 
> years. It was developed by [~nroberts], based on the observation that the 
> larger, faster nodes on our cluster were under-utilized compared to the 
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.
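
The scaling rule itself is not spelled out in this thread; the sketch below is 
one plausible reading of the description, with the class name, parameters, and 
clamping all assumed rather than taken from the patch:

{code:java}
// Rough sketch of the idea above (assumed formula, not the patch's code):
// nodes busier than the cluster average heartbeat less often, idler nodes
// heartbeat more often, clamped to configured bounds.
final class HeartbeatScaler {
  static long scaledIntervalMs(long baseMs, long minMs, long maxMs,
      double nodeCpuUtil, double clusterAvgCpuUtil) {
    if (clusterAvgCpuUtil <= 0) {
      return baseMs; // no cluster-wide signal yet (see YARN-10450)
    }
    // Ratio > 1 means the node is over-utilized relative to the cluster,
    // so its interval grows and it heartbeats less often.
    double ratio = nodeCpuUtil / clusterAvgCpuUtil;
    return Math.max(minMs, Math.min(maxMs, (long) (baseMs * ratio)));
  }
}
{code}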



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10442) RM should make sure node label file highly available

2020-10-30 Thread Hemanth Boyina (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223811#comment-17223811
 ] 

Hemanth Boyina commented on YARN-10442:
---

Thanks [~surendrasingh] for the contribution.

Committed to trunk.

> RM should make sure node label file highly available
> 
>
> Key: YARN-10442
> URL: https://issues.apache.org/jira/browse/YARN-10442
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: 3.4.0
>
>
> In one of my clusters, the RM failed to transition to Active because the 
> node label file's blocks were missing. I think the RM should make sure 
> important files are highly available.
> {noformat}
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Could not 
> obtain block: BP-2121803626-10.0.0.22-1597301807397:blk_1073832522_91774 
> file=/yarn/node-labels/nodelabel.mirror
>   at 
> com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:238)
>   at 
> com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
>   at 
> com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
>   at 
> com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerServiceProtos$AddToClusterNodeLabelsRequestProto.parseDelimitedFrom(YarnServerResourceManagerServiceProtos.java:7493)
>   at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:168)
>   at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:205)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:254)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:268)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194){noformat}
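
The fix itself is not described in this thread; the sketch below shows one 
plausible mitigation, assuming the mirror path from the stack trace and that 
raising the HDFS replication factor is the chosen remedy:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NodeLabelMirrorReplication {
  // Write the node-label mirror with a higher replication factor so that
  // losing a few DataNodes cannot leave the RM unable to become Active.
  public static void raiseReplication(Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Path mirror = new Path("/yarn/node-labels/nodelabel.mirror");
    fs.setReplication(mirror, (short) 10);
  }
}
{code}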



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-10458:
--
Fix Version/s: 3.4.0

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch, YARN-10458-004.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks
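
A sketch of the fallback the reporter suggests, with an assumed shape and a 
hypothetical resolveParentFromMapping() helper (the committed patch may 
differ):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.QueueACL;

public boolean checkAccess(UserGroupInformation callerUGI,
    QueueACL acl, String queueName) {
  CSQueue queue = getQueue(queueName);
  if (queue == null) {
    // Hypothetical helper: derive the parent queue from the queue mapping.
    CSQueue parent = getQueue(resolveParentFromMapping(queueName));
    if (parent instanceof ManagedParentQueue) {
      // The leaf would be auto-created under this parent, so the parent's
      // leaf-queue-template ACLs are the relevant ones to consult.
      return parent.hasAccess(acl, callerUGI);
    }
    return false;
  }
  return queue.hasAccess(acl, callerUGI);
}
{code}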



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223771#comment-17223771
 ] 

Wangda Tan commented on YARN-10458:
---

I just committed the patch to trunk, thanks [~anand.srinivasan] for reporting 
this issue and thanks [~pbacsko] for working on the patch.

[~pbacsko], can you help backport this to the corresponding branches?

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch, YARN-10458-004.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223764#comment-17223764
 ] 

Hadoop QA commented on YARN-10458:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
14s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
25s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 48s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
10s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m  4s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} |  | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m  5s{color} 
| 

[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223735#comment-17223735
 ] 

Wangda Tan commented on YARN-10458:
---

+1, thanks [~pbacsko], will get it committed later today. 

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch, YARN-10458-004.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10475:
---
Attachment: YARN-10475-branch-3.2.003.patch

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization. If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down. If it is under-utilized compared to the rest of the cluster, its 
> heartbeat interval speeds up.
> This is a feature we have been running internally in production for several 
> years. It was developed by [~nroberts], based on the observation that the 
> larger, faster nodes on our cluster were under-utilized compared to the 
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223733#comment-17223733
 ] 

Jim Brennan commented on YARN-10475:


[~epayne], I have put up patches for branch-3.3 and branch-3.2 as well.

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization. If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down. If it is under-utilized compared to the rest of the cluster, its 
> heartbeat interval speeds up.
> This is a feature we have been running internally in production for several 
> years. It was developed by [~nroberts], based on the observation that the 
> larger, faster nodes on our cluster were under-utilized compared to the 
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10475:
---
Attachment: YARN-10475-branch-3.3.003.patch

> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization. If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down. If it is under-utilized compared to the rest of the cluster, its 
> heartbeat interval speeds up.
> This is a feature we have been running internally in production for several 
> years. It was developed by [~nroberts], based on the observation that the 
> larger, faster nodes on our cluster were under-utilized compared to the 
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10425) Replace the legacy placement engine in CS with the new one

2020-10-30 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223708#comment-17223708
 ] 

Peter Bacsko commented on YARN-10425:
-

Thanks for the explanation, [~shuzirra]. I'll do another review round on 
Monday; I expect it will be OK and we can commit to trunk.

> Replace the legacy placement engine in CS with the new one
> --
>
> Key: YARN-10425
> URL: https://issues.apache.org/jira/browse/YARN-10425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10425.001.patch, YARN-10425.002.patch, 
> YARN-10425.003.patch, YARN-10425.004.patch, YARN-10425.005.patch
>
>
> Remove the UserGroupMapping and ApplicationName mapping classes, and use the 
> new CSMappingPlacementRule instead. Also cleanup the orphan classes which are 
> used by these classes only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10425) Replace the legacy placement engine in CS with the new one

2020-10-30 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223701#comment-17223701
 ] 

Gergely Pollak commented on YARN-10425:
---

Thank you for the review, [~pbacsko], [~BilwaST] and [~snemeth]. I've uploaded 
the newest patch with the requested changes; please find my answers below.

[~pbacsko]
1. Nits:
instead of directly using the constructor new Groups(conf), you might want to 
use Groups.getUserToGroupsMappingServiceWithLoadedConfiguration()

Actually I cannot, because some tests fail if I don't create a new instance 
each time: the Groups class caches the value of 
HADOOP_SECURITY_GROUP_MAPPING, which is changed by multiple tests.
I tried to explain this in the comment above that line.
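
A minimal sketch of the trade-off, assuming typical usage of the two APIs 
(this is illustrative, not the patch code):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

class GroupsProviderChoice {
  static Groups cached(Configuration conf) {
    // Returns a shared instance bound to the first configuration it saw;
    // tests that later change HADOOP_SECURITY_GROUP_MAPPING still get the
    // old mapping implementation.
    return Groups.getUserToGroupsMappingServiceWithLoadedConfiguration(conf);
  }

  static Groups fresh(Configuration conf) {
    // Re-reads HADOOP_SECURITY_GROUP_MAPPING every time, which is why the
    // patch constructs a new instance per placement-rule initialization.
    return new Groups(conf);
  }
}
{code}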

2. I think the condition below should not be allowed. If, for whatever reason, 
we couldn't retrieve the groups service provider, that is a serious error and 
we shouldn't proceed any further.
What is the rationale behind this change?
3. Same here, don't catch this if we know that the error is group-related:

Group errors are not fatal; we can proceed if a group cannot be determined for 
a user. The rationale behind adding this less strict version is that we 
evaluate groups early, even if there is no group-related rule, or the actual 
app submission would not hit any group-related rule. If no group is found, 
placement will fail for group-related placements anyway, which falls through 
to REJECT for legacy rules, so for legacy rules we keep the same behavior, 
while users of the new format can decide whether they want to fail on group 
errors. In fact, if I threw an exception at that point, we'd lose backwards 
compatibility, as multiple tests pointed out.
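
Illustratively, the lenient behavior amounts to something like the following 
(assumed shape, not the actual CSMappingPlacementRule code):

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.security.Groups;

final class LenientGroupLookup {
  static Set<String> groupsOf(Groups groups, String user) {
    try {
      return new HashSet<>(groups.getGroups(user));
    } catch (IOException e) {
      // Not fatal: the submission may never hit a group-based rule, and
      // group-based rules simply fail to place when no group is known.
      return Collections.emptySet();
    }
  }
}
{code}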

[~BilwaST]
I'm relying on the interface specification: it does not guarantee that I 
cannot receive null at this point, so I check for it; better safe than sorry. 
Receiving null is no reason to fail here, since we can handle it properly, so 
I feel safer checking for it. (Also please note that some tests mock the hell 
out of the RM engines, and you might get nulls in surprising places.)

 

> Replace the legacy placement engine in CS with the new one
> --
>
> Key: YARN-10425
> URL: https://issues.apache.org/jira/browse/YARN-10425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10425.001.patch, YARN-10425.002.patch, 
> YARN-10425.003.patch, YARN-10425.004.patch, YARN-10425.005.patch
>
>
> Remove the UserGroupMapping and ApplicationName mapping classes, and use the 
> new CSMappingPlacementRule instead. Also cleanup the orphan classes which are 
> used by these classes only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10425) Replace the legacy placement engine in CS with the new one

2020-10-30 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10425:
--
Attachment: YARN-10425.005.patch

> Replace the legacy placement engine in CS with the new one
> --
>
> Key: YARN-10425
> URL: https://issues.apache.org/jira/browse/YARN-10425
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10425.001.patch, YARN-10425.002.patch, 
> YARN-10425.003.patch, YARN-10425.004.patch, YARN-10425.005.patch
>
>
> Remove the UserGroupMapping and ApplicationName mapping classes, and use the 
> new CSMappingPlacementRule instead. Also cleanup the orphan classes which are 
> used by these classes only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223683#comment-17223683
 ] 

Peter Bacsko commented on YARN-10458:
-

[~leftnoteasy], please review patch v4; it should be good as the final version.

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch, YARN-10458-004.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223668#comment-17223668
 ] 

Jim Brennan commented on YARN-10475:


Thanks for the suggestion [~bibinchundatt]! I think a plugin for calculating 
the heartbeat interval is definitely possible. I think the configs as 
specified could remain for enabling scaling and setting up the parameters - 
there is nothing specific to CPU utilization in those properties. Would you be 
OK with a follow-up Jira to move the calculation into a plugin? Do you have 
any suggestions for alternate calculations?
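
As a sketch of what such a plugin seam could look like (a hypothetical 
interface, not an existing YARN API):

{code:java}
public interface HeartbeatIntervalCalculator {
  // Given the configured default/min/max and the node's utilization
  // relative to the cluster, return the next heartbeat interval in ms.
  long nextIntervalMs(long defaultMs, long minMs, long maxMs,
      double nodeUtilization, double clusterAvgUtilization);
}
{code}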


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node CPU 
> utilization compared to overall cluster CPU utilization. If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down. If it is under-utilized compared to the rest of the cluster, its 
> heartbeat interval speeds up.
> This is a feature we have been running internally in production for several 
> years. It was developed by [~nroberts], based on the observation that the 
> larger, faster nodes on our cluster were under-utilized compared to the 
> smaller, slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10458:

Attachment: YARN-10458-004.patch

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch, YARN-10458-004.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223657#comment-17223657
 ] 

Hadoop QA commented on YARN-10458:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
17s{color} |  | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} |  | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch does not contain any @author tags. 
{color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} |  | {color:green} The patch appears to include 3 new or modified 
test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
58s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} |  | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 36s{color} |  | {color:green} branch has no errors when building and 
testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} |  | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} |  | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
46s{color} |  | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} |  | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} |  | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} blanks {color} | {color:green}  0m  
0s{color} |  | {color:green} The patch has no blanks issues. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | 
[/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/274/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt]
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 203 unchanged - 0 fixed = 204 total (was 203) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} |  | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 18s{color} |  | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} |  | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} |  | {color:green} the patch passed with JDK Private 

[jira] [Commented] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223656#comment-17223656
 ] 

Jim Brennan commented on YARN-10471:


[~epayne] I agree we don't need to go to branch-3.1 or branch-2.10.
Thanks for the contribution!


> Prevent logs for any container from becoming larger than a configurable size.
> -
>
> Key: YARN-10471
> URL: https://issues.apache.org/jira/browse/YARN-10471
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1, 3.1.4
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 3.2.2, 3.3.1, 3.4.1, 3.2.3
>
> Attachments: YARN.10471.001.patch, YARN.10471.002.patch, 
> YARN.10471.003.patch, YARN.10471.004.patch, YARN.10471.005.patch, 
> YARN.10471.branch-3.2.003.patch, YARN.10471.branch-3.2.005.patch
>
>
> Configure a cluster such that a task attempt will be killed if any container 
> log exceeds a configured size. This would help prevent logs from filling 
> disks and also prevent the need to aggregate enormous logs.
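
The enforcement mechanism is not detailed here; a minimal sketch of the check 
it implies, with all names assumed, might look like:

{code:java}
import java.io.File;

final class ContainerLogSizeCheck {
  // Sum the sizes of a container's log files; the caller would kill the
  // task attempt (and skip aggregation of an enormous log) when the total
  // exceeds the configured cap.
  static boolean exceedsCap(File containerLogDir, long capBytes) {
    File[] files = containerLogDir.listFiles();
    if (files == null) {
      return false;
    }
    long total = 0;
    for (File f : files) {
      total += f.length();
    }
    return total > capBytes;
  }
}
{code}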



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10458:

Attachment: YARN-10458-003.patch

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we could see 
> that the queue creation fails because ACL submit application check couldn't 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> For static queues, this worked but failed for dynamic queues. Also tried 
> setting the below property but it didn't help either.
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*.
> RM error log shows the following :
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where 
> CSQueue is null but queue mapping is involved: check whether the parent 
> queue exists and is a managed parent, and if so, check whether the parent 
> queue has valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10458) Hive On Tez queries fails upon submission to dynamically created pools

2020-10-30 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223564#comment-17223564
 ] 

Peter Bacsko commented on YARN-10458:
-

Many thanks [~wangda]. The test works with the proposed changes. Uploaded patch 
v3.

> Hive On Tez queries fails upon submission to dynamically created pools
> --
>
> Key: YARN-10458
> URL: https://issues.apache.org/jira/browse/YARN-10458
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anand Srinivasan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10458-001.patch, YARN-10458-002.patch, 
> YARN-10458-003.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we observed 
> that queue creation fails because the ACL submit-application check does not 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> This worked for static queues but failed for dynamic queues. We also tried 
> setting the property below, but it didn't help either:
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*
> The RM error log shows the following:
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false; // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, the CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should have logic to differentiate the case where the 
> CSQueue is null: if queue mapping is involved, check whether the parent queue 
> exists and is a managed parent, and if so, check whether the parent queue has 
> valid ACLs instead of returning false?
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10472) Backport YARN-10314 to branch-3.2

2020-10-30 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated YARN-10472:
---
Fix Version/s: 3.2.2

Cherry-picked to branch-3.2.2 and verified locally. Thanks [~smeng].

> Backport YARN-10314 to branch-3.2
> -
>
> Key: YARN-10472
> URL: https://issues.apache.org/jira/browse/YARN-10472
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Blocker
> Fix For: 3.2.2, 3.2.3
>
>
> Filing this jira to raise the following concern:
> YARN-10314 fixes a problem with the shaded jars in 3.3.0, but it has not been 
> backported to branch-3.2 yet. [~weichiu] and I ([~smeng]) are looking into 
> this.
> I have submitted a PR on branch-3.2: 
> https://github.com/apache/hadoop/pull/2412
> CC [~hexiaoqiao]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-30 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated YARN-10471:
---
Fix Version/s: 3.2.2

Cherry-picked to branch-3.2.2 and verified locally. Thanks [~Jim_Brennan] and 
[~epayne].

> Prevent logs for any container from becoming larger than a configurable size.
> -
>
> Key: YARN-10471
> URL: https://issues.apache.org/jira/browse/YARN-10471
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.1, 3.1.4
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 3.2.2, 3.3.1, 3.4.1, 3.2.3
>
> Attachments: YARN.10471.001.patch, YARN.10471.002.patch, 
> YARN.10471.003.patch, YARN.10471.004.patch, YARN.10471.005.patch, 
> YARN.10471.branch-3.2.003.patch, YARN.10471.branch-3.2.005.patch
>
>
> Configure a cluster such that a task attempt will be killed if any container 
> log exceeds a configured size. This would help prevent logs from filling 
> disks and also prevent the need to aggregate enormous logs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10467) ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers

2020-10-30 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated YARN-10467:
---
Fix Version/s: 3.2.2

Cherry-picked to branch-3.2.2 and verified locally. Thanks all.

> ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers
> -
>
> Key: YARN-10467
> URL: https://issues.apache.org/jira/browse/YARN-10467
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.10.0, 3.0.3, 3.2.1, 3.1.4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: YARN-10467.00.patch, YARN-10467.01.patch, 
> YARN-10467.02.patch, YARN-10467.branch-2.10.00.patch, 
> YARN-10467.branch-2.10.01.patch, YARN-10467.branch-2.10.02.patch, 
> YARN-10467.branch-2.10.03.patch
>
>
> In one of our recent heap analyses, we found that the majority of the heap is 
> occupied by {{RMNodeImpl.completedContainers}}, which 
> accounts for 19 GB out of 24.3 GB.  There are over 86 million 
> ContainerIdPBImpl objects; in contrast, there are only 161,601 RMContainerImpl 
> objects, which represent the number of active containers that the RM is still 
> tracking.  Inspecting some ContainerIdPBImpl objects, they belong to 
> applications that have long since finished. This indicates some sort of memory 
> leak of ContainerIdPBImpl objects in RMNodeImpl.
>  
> Right now, when a container is reported by a NM as completed, it is 
> immediately added to RMNodeImpl.completedContainers and later cleaned up 
> after the AM has been notified of its completion in the AM-RM heartbeat. The 
> cleanup can be broken into a few steps.
>  * Step 1:  the completed container is first added to 
> RMAppAttemptImpl.justFinishedContainers (this is asynchronous to being added 
> to {{RMNodeImpl.completedContainers}}).
>  * Step 2: During the AM-RM heartbeat, the container is removed 
> from RMAppAttemptImpl.justFinishedContainers and added to 
> RMAppAttemptImpl.finishedContainersSentToAM
> Once a completed container gets added to 
> RMAppAttemptImpl.finishedContainersSentToAM, it is guaranteed to be cleaned 
> up from {{RMNodeImpl.completedContainers}}
>  
> However, if the AM exits (regardless of failure or success) before some 
> recently completed containers can be added to 
> RMAppAttemptImpl.finishedContainersSentToAM in previous heartbeats, there 
> won't be any future AM-RM heartbeat to perform the aforementioned step 2. 
> Hence, these objects stay in RMNodeImpl.completedContainers forever.
> We have observed in MR that AMs can decide to exit upon success of all their 
> tasks without waiting for notification of the completion of every container, 
> or the AM may just die suddenly (e.g. OOM).  Spark and other frameworks may 
> well behave similarly.
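
To make the lifecycle above concrete, here is a small, self-contained model of 
the two-step cleanup and of how an early AM exit strands entries (the names are 
simplified stand-ins for the RMNodeImpl/RMAppAttemptImpl internals, not the 
real classes):

{code:java}
import java.util.HashSet;
import java.util.Set;

class CompletedContainerLeakModel {
  // Stand-in for RMNodeImpl.completedContainers
  static Set<String> completedOnNode = new HashSet<>();
  // Stand-in for RMAppAttemptImpl.justFinishedContainers
  static Set<String> justFinished = new HashSet<>();
  // Stand-in for RMAppAttemptImpl.finishedContainersSentToAM
  static Set<String> sentToAM = new HashSet<>();

  static void nmReportsCompletion(String containerId) {
    completedOnNode.add(containerId); // added immediately
    justFinished.add(containerId);    // step 1 (asynchronous)
  }

  static void amRmHeartbeat() {
    // Step 2: only runs while the AM is still heartbeating.
    sentToAM.addAll(justFinished);
    justFinished.clear();
    // Once in sentToAM, cleanup from the node side is guaranteed.
    completedOnNode.removeAll(sentToAM);
  }

  public static void main(String[] args) {
    nmReportsCompletion("container_1");
    amRmHeartbeat();                  // container_1 is cleaned up
    nmReportsCompletion("container_2");
    // The AM exits here: no further heartbeats, so step 2 never runs and
    // container_2 stays in completedOnNode forever -- the reported leak.
    System.out.println("leaked: " + completedOnNode); // [container_2]
  }
}
{code}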



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-30 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223510#comment-17223510
 ] 

Xiaoqiao He commented on YARN-10450:


Committed this to branch-3.2.2. Thanks [~ebadger] and [~Jim_Brennan].

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, 
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, 
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster.  This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.
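
As a rough sketch of the aggregation idea (simplified types; not the actual 
ClusterMetrics/ResourceUtilization API from the patches):

{code:java}
import java.util.HashMap;
import java.util.Map;

class NodeUtil {
  final long usedPhysMemMB;   // physical memory in use on the node
  final double cpuFraction;   // node CPU utilization in [0, 1]

  NodeUtil(long usedPhysMemMB, double cpuFraction) {
    this.usedPhysMemMB = usedPhysMemMB;
    this.cpuFraction = cpuFraction;
  }
}

class ClusterUtilAggregator {
  private final Map<String, NodeUtil> latestByNode = new HashMap<>();

  // Invoked on every NM node status update; the utilization data is
  // already carried there, so no new NM-RM traffic is needed.
  void onNodeStatusUpdate(String nodeId, NodeUtil util) {
    latestByNode.put(nodeId, util);
  }

  long clusterUsedPhysMemMB() {
    return latestByNode.values().stream()
        .mapToLong(u -> u.usedPhysMemMB).sum();
  }

  double clusterCpuFraction() {
    // Unweighted average across nodes; a real implementation might weight
    // by node capacity instead.
    return latestByNode.values().stream()
        .mapToDouble(u -> u.cpuFraction).average().orElse(0.0);
  }
}
{code}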



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2020-10-30 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated YARN-10450:
---
Fix Version/s: 3.2.2

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, 
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, 
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster.  This information is already passed 
> from the NM to the RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7651) branch-2 application master (MR) cannot run in 3.1 cluster

2020-10-30 Thread chan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223502#comment-17223502
 ] 

chan commented on YARN-7651:


Hey [~sunilg], I think your cluster has jars from multiple versions; you can 
make all nodes run the same version.

> branch-2 application master (MR) cannot run in 3.1 cluster
> --
>
> Key: YARN-7651
> URL: https://issues.apache.org/jira/browse/YARN-7651
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Priority: Blocker
>
> {noformat}
> 2017-12-13 19:21:20,452 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-12-13 19:21:20,481 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:253)
> at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:219)
> at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:211)
> at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:876)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1571)
> Caused by: java.io.IOException: Exception reading 
> /Users/sunilgovindan/install/hadoop/tmp/nm-local-dir/usercache/sunilgovindan/appcache/application_1513172966925_0001/container_1513172966925_0001_01_01/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:870)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
> at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:251)
> ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> ... 8 more
> 2017-12-13 19:21:20,484 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10440) resource manager hangs,and i cannot submit any new jobs,but rm and nm processes are normal

2020-10-30 Thread chan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223495#comment-17223495
 ] 

chan commented on YARN-10440:
-

[~Jufeng], I ran into this problem before and fixed it by setting the following 
config. Hope it helps!
{code:xml}
<!-- Cap the CapacityScheduler at one container assignment per node
     heartbeat. -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments</name>
  <value>1</value>
</property>
<!-- Disable assigning multiple containers in a single node heartbeat. -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
  <value>false</value>
</property>
{code}
 

> resource manager hangs,and i cannot submit any new jobs,but rm and nm 
> processes are normal
> --
>
> Key: YARN-10440
> URL: https://issues.apache.org/jira/browse/YARN-10440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.1
>Reporter: jufeng li
>Priority: Blocker
> Attachments: rm_2020-09-26-2.dump
>
>
> RM hangs and I cannot submit any new jobs, but the RM and NM processes are 
> normal. I can open x:8088/cluster/apps/RUNNING but not 
> x:8088/cluster/scheduler. The submitted apps cannot finish by themselves and 
> new apps cannot be submitted; everything hangs except the RM and NM servers. 
> How can I fix this? Help me, please!
>  
> here is the log:
> {code:java}
> ttempt=appattempt_1600074574138_66297_01 container=null queue=tianqiwang 
> clusterResource=<memory:..., vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,679 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation 
> proposal
> 2020-09-17 00:22:25,680 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - 
> assignedContainer application attempt=appattempt_1600074574138_66297_01 
> container=null queue=tianqiwang clusterResource=<memory:..., vCores:4800> 
> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:tryCommit(2906)) - Failed to accept 

[jira] [Comment Edited] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223491#comment-17223491
 ] 

Bibin Chundatt edited comment on YARN-10475 at 10/30/20, 8:36 AM:
--

Thank you [~Jim_Brennan] for working on this.

Could you make the implementation generic so that other policies can be plugged 
in too? CPU utilization could be one of the policies that helps in deciding the 
HB interval. Thoughts?



was (Author: bibinchundatt):
Thank you [~Jim_Brennan] for working on this.

Could you make the implementation generic so that other policies can be plugged 
in too? CPU utilization is only one of the policies that helps in deciding the 
HB interval. Thoughts?


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu 
> utilization compared to overall cluster cpu utilization.  If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down.  If it is under-utilized compared to the rest of the cluster, 
> its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for 
> several years.  It was developed by [~nroberts], based on the observation 
> that larger faster nodes on our cluster were under-utilized compared to 
> smaller slower nodes. 
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.
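
A minimal sketch of the scaling rule described above, assuming a simple linear 
ratio with clamping (all names, bounds and the exact formula here are 
illustrative, not the actual patch):

{code:java}
class ScaledHeartbeatInterval {
  private final long baseMs; // configured default heartbeat interval
  private final long minMs;  // fastest allowed interval
  private final long maxMs;  // slowest allowed interval

  ScaledHeartbeatInterval(long baseMs, long minMs, long maxMs) {
    this.baseMs = baseMs;
    this.minMs = minMs;
    this.maxMs = maxMs;
  }

  // Over-utilized node (ratio > 1) -> longer interval (slower heartbeats);
  // under-utilized node (ratio < 1) -> shorter interval (faster heartbeats).
  long nextIntervalMs(double nodeCpuFraction, double clusterCpuFraction) {
    if (clusterCpuFraction <= 0.0) {
      return baseMs; // no cluster-wide signal yet (see YARN-10450)
    }
    long scaled = Math.round(baseMs * (nodeCpuFraction / clusterCpuFraction));
    return Math.max(minMs, Math.min(maxMs, scaled));
  }
}
{code}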



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2020-10-30 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223491#comment-17223491
 ] 

Bibin Chundatt commented on YARN-10475:
---

Thank you [~Jim_Brennan] for working on this.

Could you make the implementation generic so that other policies can be plugged 
in too? CPU utilization is only one of the policies that helps in deciding the 
HB interval. Thoughts?


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu 
> utilization compared to overall cluster cpu utilization.  If a node is 
> over-utilized compared to the rest of the cluster, its heartbeat interval 
> slows down.  If it is under-utilized compared to the rest of the cluster, 
> its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for 
> several years.  It was developed by [~nroberts], based on the observation 
> that larger faster nodes on our cluster were under-utilized compared to 
> smaller slower nodes. 
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10472) Backport YARN-10314 to branch-3.2

2020-10-30 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17223431#comment-17223431
 ] 

Xiaoqiao He commented on YARN-10472:


Thanks [~smeng], I will cherry-pick them together shortly.

> Backport YARN-10314 to branch-3.2
> -
>
> Key: YARN-10472
> URL: https://issues.apache.org/jira/browse/YARN-10472
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.2.2
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Blocker
> Fix For: 3.2.3
>
>
> Filing this jira to raise the following concern:
> YARN-10314 fixes a problem with the shaded jars in 3.3.0, but it has not been 
> backported to branch-3.2 yet. [~weichiu] and I ([~smeng]) are looking into 
> this.
> I have submitted a PR on branch-3.2: 
> https://github.com/apache/hadoop/pull/2412
> CC [~hexiaoqiao]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org