[jira] [Commented] (YARN-9903) Support reservations continue looking for Node Labels

2020-06-03 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125358#comment-17125358
 ] 

Jim Brennan commented on YARN-9903:
---

Thanks [~epayne]!  I will work on adding a unit test for this.


> Support reservations continue looking for Node Labels
> -
>
> Key: YARN-9903
> URL: https://issues.apache.org/jira/browse/YARN-9903
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tarun Parimi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-9903.001.patch, YARN-9903.002.patch
>
>
> YARN-1769 brought in the reservations-continue-looking feature, which improves 
> several resource reservation scenarios. However, it is currently not handled 
> when nodes have a label assigned to them. This behavior is useful, and in many 
> cases necessary, for Node Labels as well, so we should support it for node 
> labels too.
> For example, in AbstractCSQueue.java, we have the below TODO.
> {code:java}
> // TODO, now only consider reservation cases when the node has no label 
> if (this.reservationsContinueLooking && nodePartition.equals( 
> RMNodeLabelsManager.NO_LABEL) && Resources.greaterThan( resourceCalculator, 
> clusterResource, resourceCouldBeUnreserved, Resources.none())) {
> {code}
> cc [~sunilg]
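
For illustration only (a sketch of the direction the TODO points at, not the attached patch): dropping the NO_LABEL clause would let the continue-looking path run for labeled partitions as well, assuming the unreserve bookkeeping is partition-aware.
{code:java}
// Sketch only, using the variables from the snippet above: allow
// continue-looking for any partition, labeled or not.
if (this.reservationsContinueLooking
    && Resources.greaterThan(resourceCalculator, clusterResource,
        resourceCouldBeUnreserved, Resources.none())) {
  // proceed with the "could we unreserve elsewhere to fit this request"
  // check against the limits of nodePartition
}
{code}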



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9903) Support reservations continue looking for Node Labels

2020-06-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125351#comment-17125351
 ] 

Eric Payne commented on YARN-9903:
--

[~Jim_Brennan], The code changes in the latest patch look good. Can you please 
provide a unit test?

I think the test failure for TestFairSchedulerPreemption is unrelated. It 
sometimes succeeds and sometimes fails for me in my local environment.

> Support reservations continue looking for Node Labels
> -
>
> Key: YARN-9903
> URL: https://issues.apache.org/jira/browse/YARN-9903
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tarun Parimi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-9903.001.patch, YARN-9903.002.patch
>
>
> YARN-1769 brought in the reservations-continue-looking feature, which improves 
> several resource reservation scenarios. However, it is currently not handled 
> when nodes have a label assigned to them. This behavior is useful, and in many 
> cases necessary, for Node Labels as well, so we should support it for node 
> labels too.
> For example, in AbstractCSQueue.java, we have the below TODO.
> {code:java}
> // TODO, now only consider reservation cases when the node has no label 
> if (this.reservationsContinueLooking && nodePartition.equals( 
> RMNodeLabelsManager.NO_LABEL) && Resources.greaterThan( resourceCalculator, 
> clusterResource, resourceCouldBeUnreserved, Resources.none())) {
> {code}
> cc [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

2020-06-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125342#comment-17125342
 ] 

Eric Payne commented on YARN-10283:
---

bq. Please feel free to pull in the additional changes from YARN-9903 into this 
patch
So, IMHO, I think we should make this JIRA (YARN-10283) dependent on YARN-9903 
and complete YARN-9903 first. IIUC, YARN-9903 is addressing the general case of 
reservation starvation whereas this JIRA is specific to the concerns of 
priority queues. Even with the fixes in YARN-9903, there are still 
priority-queue-specific problems that need to be addressed.

bq. If there are no node labels, the same allocation errors occur if 
reservationsContinueLooking == false AND minimum-allocation-mb == 512.

I verified that when YARN-9903 is applied, reproTestWithNodeLabels succeeds but 
reproWithoutNodeLabels still fails.

> Capacity Scheduler: starvation occurs if a higher priority queue is full and 
> node labels are used
> -
>
> Key: YARN-10283
> URL: https://issues.apache.org/jira/browse/YARN-10283
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, 
> YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to 
> a lower priority queue could not get scheduled because a higher priority 
> queue in the same hierarchy could not satisfy the allocation request. Both 
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a 
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were 
> added to the partition
> * Both queues have a limit of 
> * Using DominantResourceCalculator
> Setup:
> Submit a distributed shell application to highprio with the switches 
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per 
> container.
> Chain of events:
> 1. The queue is filled with containers until it reaches usage  vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the 
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because it's smaller 
> than the current limit resource 
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an 
> allocated container for 
> 5. But we can't commit the resource request because we would have 9 vcores in 
> total, violating the limit.
> The problem is that we always try to assign a container for the same 
> application in each heartbeat from "highprio". Applications in "lowprio" 
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case 
> well. We only reject the allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with 
> the allocation if there's room for a container.
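
As an illustration of the direction the description implies (not the attached POC patch), the guard could stop special-casing unlabeled nodes so the reject-and-keep-looking logic also runs when the node carries labels:
{code:java}
// Sketch only: apply the same continue-looking check regardless of node labels.
if (rmContainer == null && reservationsContinueLooking) {
  // the resource-limit / unreserve checks would run here for labeled and
  // unlabeled nodes alike, instead of only when node.getLabels().isEmpty()
}
{code}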



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10292) FS-CS converter: add an option to enable asynchronous scheduling in CapacityScheduler

2020-06-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125192#comment-17125192
 ] 

Hadoop QA commented on YARN-10292:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
39s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 22 new + 1 unchanged - 0 fixed = 23 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m  5s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26109/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004735/YARN-10292.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 82af9a8b8f72 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | 

[jira] [Comment Edited] (YARN-10300) appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat

2020-06-03 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125139#comment-17125139
 ] 

Eric Badger edited comment on YARN-10300 at 6/3/20, 5:09 PM:
-

Thanks for the review, [~Jim_Brennan]! [~epayne], would you mind looking at 
this patch as well to give a binding review?


was (Author: ebadger):
[~epayne], would you mind looking at this patch as well to give a binding 
review?

> appMasterHost not set in RM ApplicationSummary when AM fails before first 
> heartbeat
> ---
>
> Key: YARN-10300
> URL: https://issues.apache.org/jira/browse/YARN-10300
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10300.001.patch, YARN-10300.002.patch
>
>
> {noformat}
> 2020-05-23 14:09:10,086 INFO resourcemanager.RMAppManager$ApplicationSummary: 
> appId=application_1586003420099_12444961,name=job_name,user=username,queue=queuename,state=FAILED,trackingUrl=https
>  
> ://cluster:port/applicationhistory/app/application_1586003420099_12444961,appMasterHost=N/A,startTime=1590241207309,finishTime=1590242950085,finalStatus=FAILED,memorySeconds=13750,vcoreSeconds=67,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=  vCores:0>,applicationType=MAPREDUCE
> {noformat}
> {{appMasterHost=N/A}} should have the AM hostname instead of N/A



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10300) appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat

2020-06-03 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125139#comment-17125139
 ] 

Eric Badger commented on YARN-10300:


[~epayne], would you mind looking at this patch as well to give a binding 
review?

> appMasterHost not set in RM ApplicationSummary when AM fails before first 
> heartbeat
> ---
>
> Key: YARN-10300
> URL: https://issues.apache.org/jira/browse/YARN-10300
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10300.001.patch, YARN-10300.002.patch
>
>
> {noformat}
> 2020-05-23 14:09:10,086 INFO resourcemanager.RMAppManager$ApplicationSummary: 
> appId=application_1586003420099_12444961,name=job_name,user=username,queue=queuename,state=FAILED,trackingUrl=https
>  
> ://cluster:port/applicationhistory/app/application_1586003420099_12444961,appMasterHost=N/A,startTime=1590241207309,finishTime=1590242950085,finalStatus=FAILED,memorySeconds=13750,vcoreSeconds=67,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=  vCores:0>,applicationType=MAPREDUCE
> {noformat}
> {{appMasterHost=N/A}} should have the AM hostname instead of N/A



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9903) Support reservations continue looking for Node Labels

2020-06-03 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125136#comment-17125136
 ] 

Eric Payne commented on YARN-9903:
--

As [~Jim_Brennan] pointed out, our usage of node labels is somewhat limited. On 
the one hand, we have a few clusters with node labels, and jobs are constantly 
being run in those queues. However, we don't use many features that are 
provided by node labels. For example, we only use exclusive node labels.

Having said that, I think that this feature is safe. I don't think the various 
node label features would have much of an effect on un-scheduling and 
re-scheduling reservations. And, as Jim says, we have been using this fix for a 
long time internally.

> Support reservations continue looking for Node Labels
> -
>
> Key: YARN-9903
> URL: https://issues.apache.org/jira/browse/YARN-9903
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tarun Parimi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-9903.001.patch, YARN-9903.002.patch
>
>
> YARN-1769 brought in the reservations-continue-looking feature, which improves 
> several resource reservation scenarios. However, it is currently not handled 
> when nodes have a label assigned to them. This behavior is useful, and in many 
> cases necessary, for Node Labels as well, so we should support it for node 
> labels too.
> For example, in AbstractCSQueue.java, we have the below TODO.
> {code:java}
> // TODO, now only consider reservation cases when the node has no label 
> if (this.reservationsContinueLooking && nodePartition.equals( 
> RMNodeLabelsManager.NO_LABEL) && Resources.greaterThan( resourceCalculator, 
> clusterResource, resourceCouldBeUnreserved, Resources.none())) {
> {code}
> cc [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10296) Make ContainerPBImpl#getId/setId synchronized

2020-06-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125132#comment-17125132
 ] 

Hadoop QA commented on YARN-10296:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
42s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m  
1s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26110/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10296 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004738/YARN-10296.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 15cf1951d602 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 97c98ce531c |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/26110/testReport/ |
| Max. process+thread count | 421 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124932#comment-17124932
 ] 

Peter Bacsko edited comment on YARN-9930 at 6/3/20, 4:02 PM:
-

Attached POC v4 with a new unit test which verifies the new functionality. 
Still a POC because there are failing UTs.

TODO
* fix failing UTs (most likely  mocking should be fine tuned)
* fix checkstyle
* more tests for the new feature (eg. user limit exceeded)
* check the visibility of new methods (package private/public)
* check types/casts (the patch has probably too many of them)
* ensure proper naming
* add tests for CSMaxRunningAppsEnforcer (likely a copy-paste + edits from 
TestMaxRunningAppsEnforcer)


was (Author: pbacsko):
Attached POC v4 with a new unit test which verifies the new functionality. 
Still a POC because there are failing UTs.

TODO
* fix failing UTs (most likely  mocking should be fine tuned)
* fix checkstyle
* more tests for the new feature (eg. user limit exceeded)
* check  visibility of new methods
* add tests for CSMaxRunningAppsEnforcer (likely a copy-paste + edits from 
TestMaxRunningAppsEnforcer)

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10296) Make ContainerPBImpl#getId/setId synchronized

2020-06-03 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10296:
-
Attachment: YARN-10296.002.patch

> Make ContainerPBImpl#getId/setId synchronized
> -
>
> Key: YARN-10296
> URL: https://issues.apache.org/jira/browse/YARN-10296
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Minor
> Attachments: YARN-10296.001.patch, YARN-10296.002.patch
>
>
> The ContainerPBImpl getId and setId methods can be accessed from multiple 
> threads. In order to avoid simultaneous accesses and race conditions, these 
> methods should be synchronized.
> The idea came from the issue described in YARN-10295; however, that patch is 
> only applicable to branch-3.2 and 3.1.
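
A minimal sketch of the proposed change, assuming the id is guarded by the instance monitor (the real ContainerPBImpl also converts to and from the protobuf representation, which is omitted here):
{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;

// Illustration only, not the attached patch.
public class ContainerIdAccessSketch {
  private ContainerId containerId;

  public synchronized ContainerId getId() {
    return containerId;
  }

  public synchronized void setId(ContainerId id) {
    this.containerId = id;
  }
}
{code}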



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125058#comment-17125058
 ] 

Hadoop QA commented on YARN-9930:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
47s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 24 new + 271 unchanged - 0 fixed = 295 total (was 271) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 55s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueStateManager |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueState |
|   | hadoop.yarn.server.resourcemanager.reservation.TestReservationSystem |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26108/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-9930 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004722/YARN-9930-POC04.patch 
|
| Optional Tests | 

[jira] [Updated] (YARN-10292) FS-CS converter: add an option to enable asynchronous scheduling in CapacityScheduler

2020-06-03 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10292:
-
Attachment: YARN-10292.001.patch

> FS-CS converter: add an option to enable asynchronous scheduling in 
> CapacityScheduler
> -
>
> Key: YARN-10292
> URL: https://issues.apache.org/jira/browse/YARN-10292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Attachments: YARN-10292.001.patch
>
>
> FS doesn't have an equivalent setting to the CapacityScheduler's 
> yarn.scheduler.capacity.schedule-asynchronously.enable option, so the FS-to-CS 
> converter won't add this to the yarn-site.xml. An optional command-line switch 
> should be added to support this option during migration.
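
For reference, this is the CapacityScheduler property the converter would need to emit when the proposed switch is given (illustrative only; the switch itself does not exist yet):
{code:xml}
<!-- Illustrative converter output -->
<property>
  <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
  <value>true</value>
</property>
{code}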



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10300) appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat

2020-06-03 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124994#comment-17124994
 ] 

Jim Brennan commented on YARN-10300:


Thanks [~ebadger]!  Patch 002 looks good to me.  I verified it builds and the 
test passes.   +1 (non-binding).


> appMasterHost not set in RM ApplicationSummary when AM fails before first 
> heartbeat
> ---
>
> Key: YARN-10300
> URL: https://issues.apache.org/jira/browse/YARN-10300
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-10300.001.patch, YARN-10300.002.patch
>
>
> {noformat}
> 2020-05-23 14:09:10,086 INFO resourcemanager.RMAppManager$ApplicationSummary: 
> appId=application_1586003420099_12444961,name=job_name,user=username,queue=queuename,state=FAILED,trackingUrl=https
>  
> ://cluster:port/applicationhistory/app/application_1586003420099_12444961,appMasterHost=N/A,startTime=1590241207309,finishTime=1590242950085,finalStatus=FAILED,memorySeconds=13750,vcoreSeconds=67,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=  vCores:0>,applicationType=MAPREDUCE
> {noformat}
> {{appMasterHost=N/A}} should have the AM hostname instead of N/A



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9903) Support reservations continue looking for Node Labels

2020-06-03 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124985#comment-17124985
 ] 

Jim Brennan commented on YARN-9903:
---

We have been running with this patch in branch-2.8 for years on many large 
production clusters, so yes, it is safe.  We currently use node-labels 
sparingly.  I'm not sure if they were used more heavily in the past.  [~epayne] 
could speak to that.

Note that this patch does not currently have a test case.  I was about to start 
working on this and discovered this Jira and YARN-10283 and thought I should 
offer up what we have.


> Support reservations continue looking for Node Labels
> -
>
> Key: YARN-9903
> URL: https://issues.apache.org/jira/browse/YARN-9903
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tarun Parimi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-9903.001.patch, YARN-9903.002.patch
>
>
> YARN-1769 brought in the reservations-continue-looking feature, which improves 
> several resource reservation scenarios. However, it is currently not handled 
> when nodes have a label assigned to them. This behavior is useful, and in many 
> cases necessary, for Node Labels as well, so we should support it for node 
> labels too.
> For example, in AbstractCSQueue.java, we have the below TODO.
> {code:java}
> // TODO, now only consider reservation cases when the node has no label 
> if (this.reservationsContinueLooking && nodePartition.equals( 
> RMNodeLabelsManager.NO_LABEL) && Resources.greaterThan( resourceCalculator, 
> clusterResource, resourceCouldBeUnreserved, Resources.none())) {
> {code}
> cc [~sunilg]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-06-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124964#comment-17124964
 ] 

Hadoop QA commented on YARN-10274:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
41s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 64 unchanged - 0 fixed = 69 total (was 64) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 
37s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26106/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004706/YARN-10274.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 5e80d7b95102 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 97c98ce531c |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| checkstyle | 

[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124947#comment-17124947
 ] 

Peter Bacsko edited comment on YARN-9930 at 6/3/20, 1:20 PM:
-

[~epayne] [~cane] [~snemeth] [~sunilg] could you guys share your opinion about 
the POC?

Note that it actually does NOT interfere with the existing maxApps settings 
because those are checked when the application is submitted. So the rejection 
occurs immediately (see {{LeafQueue.validateSubmitApplication()}}). The 
maxParallelApps check comes later, when we submit the application attempt to 
the leaf queue.

Also, to avoid confusion I decided to call the new setting "maxParallelApps" to 
avoid confusion (it's called "maxRunningApps" in FS).
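
A rough sketch of the two check points described above (illustration only, not the POC; {{getMaxParallelApps()}}, {{getNumRunnableApps()}}, {{markAppAsPending()}} and {{activateApplication()}} are assumed helpers):
{code:java}
// 1) Submission time: the existing maxApps limit rejects excess applications
//    outright (the check referred to in LeafQueue.validateSubmitApplication()).
if (getNumApplications() >= getMaxApplications()) {
  throw new AccessControlException("Queue " + getQueuePath()
      + " already has " + getNumApplications() + " applications");
}

// 2) Attempt-submission time: the proposed maxParallelApps check would only
//    defer activation, leaving the application pending instead of rejecting it.
if (getNumRunnableApps() >= getMaxParallelApps()) {
  markAppAsPending(applicationAttempt);
} else {
  activateApplication(applicationAttempt);
}
{code}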


was (Author: pbacsko):
[~epayne] [~cane] [~snemeth] [~sunilg] could you guys share your opinion about 
the POC?

Note that it actually does NOT interfere with the existing maxApps settings 
because those is checked when the application is submitted. So the rejection 
occurs immediately (see {{LeafQueue.validateSubmitApplication()}}). The 
maxParallelApps check comes later, when we submit the application attempt to 
the leaf queue.

Also, to avoid confusion I decided to call the new setting "maxParallelApps" to 
avoid confusion (it's called "maxRunningApps" in FS).

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124947#comment-17124947
 ] 

Peter Bacsko edited comment on YARN-9930 at 6/3/20, 1:20 PM:
-

[~epayne] [~cane] [~snemeth] [~sunilg] could you guys share your opinion about 
the POC?

Note that it actually does NOT interfere with the existing maxApps settings 
because those are checked when the application is submitted. So the rejection 
occurs immediately (see {{LeafQueue.validateSubmitApplication()}}). The 
maxParallelApps check comes later, when we submit the application attempt to 
the leaf queue.

Also, to avoid confusion I decided to call the new setting "maxParallelApps" 
(it's called "maxRunningApps" in FS).


was (Author: pbacsko):
[~epayne] [~cane] [~snemeth] [~sunilg] could you guys share your opinion about 
the POC?

Note that it actually does NOT interfere with the existing maxApps settings 
because those are checked when the application is submitted. So the rejection 
occurs immediately (see {{LeafQueue.validateSubmitApplication()}}). The 
maxParallelApps check comes later, when we submit the application attempt to 
the leaf queue.

Also, to avoid confusion I decided to call the new setting "maxParallelApps" to 
avoid confusion (it's called "maxRunningApps" in FS).

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124947#comment-17124947
 ] 

Peter Bacsko commented on YARN-9930:


[~epayne] [~cane] [~snemeth] [~sunilg] could you guys share your opinion about 
the POC?

Note that it actually does NOT interfere with the existing maxApps settings 
because those is checked when the application is submitted. So the rejection 
occurs immediately (see {{LeafQueue.validateSubmitApplication()}}). The 
maxParallelApps check comes later, when we submit the application attempt to 
the leaf queue.

Also, to avoid confusion I decided to call the new setting "maxParallelApps" to 
avoid confusion (it's called "maxRunningApps" in FS).

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124932#comment-17124932
 ] 

Peter Bacsko commented on YARN-9930:


Attached POC v4 with a new unit tests which verifies the new functionality. 
Still a POC because there are failing UTs.

TODO
* fix failing UTs (most likely  mocking should be fine tuned)
* fix checkstyle
* more tests for the new feature (eg. user limit exceeded)
* check  visibility of new methods
* add tests for CSMaxRunningAppsEnforcer (likely a copy-paste + edits from 
TestMaxRunningAppsEnforcer)

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124932#comment-17124932
 ] 

Peter Bacsko edited comment on YARN-9930 at 6/3/20, 1:02 PM:
-

Attached POC v4 with a new unit test which verifies the new functionality. 
Still a POC because there are failing UTs.

TODO
* fix failing UTs (most likely  mocking should be fine tuned)
* fix checkstyle
* more tests for the new feature (eg. user limit exceeded)
* check  visibility of new methods
* add tests for CSMaxRunningAppsEnforcer (likely a copy-paste + edits from 
TestMaxRunningAppsEnforcer)


was (Author: pbacsko):
Attached POC v4 with a new unit tests which verifies the new functionality. 
Still a POC because there are failing UTs.

TODO
* fix failing UTs (most likely  mocking should be fine tuned)
* fix checkstyle
* more tests for the new feature (eg. user limit exceeded)
* check  visibility of new methods
* add tests for CSMaxRunningAppsEnforcer (likely a copy-paste + edits from 
TestMaxRunningAppsEnforcer)

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9930) Support max running app logic for CapacityScheduler

2020-06-03 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9930:
---
Attachment: YARN-9930-POC04.patch

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch, YARN-9930-POC02.patch, 
> YARN-9930-POC03.patch, YARN-9930-POC04.patch
>
>
> In FairScheduler there is a limit on the maximum number of running 
> applications, which leaves excess applications pending.
> But CapacityScheduler has no such max-running-app feature; it only has a 
> max-app limit, and excess jobs are rejected directly on the client.
> In this jira I want to implement this semantic for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10281) Redundant QueuePath usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule

2020-06-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124911#comment-17124911
 ] 

Hadoop QA commented on YARN-10281:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-10281 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10281 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004714/YARN-10281.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/26107/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Redundant QueuePath usage in UserGroupMappingPlacementRule and 
> AppNameMappingPlacementRule
> --
>
> Key: YARN-10281
> URL: https://issues.apache.org/jira/browse/YARN-10281
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10281.001.patch
>
>
> We use the QueuePath and QueueMapping (or QueueMappingEntity) objects in the 
> aforementioned classes, but these technically store the same kind of 
> information, yet we keep converting between them. Let's examine whether we can 
> use only QueueMapping(Entity) instead, since it holds more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10279) Avoid unnecessary QueueMappingEntity creations

2020-06-03 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124908#comment-17124908
 ] 

Bilwa S T commented on YARN-10279:
--

[~adam.antal] you can take it over

> Avoid unnecessary QueueMappingEntity creations
> --
>
> Key: YARN-10279
> URL: https://issues.apache.org/jira/browse/YARN-10279
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Bilwa S T
>Priority: Minor
>
> In the CS UserGroupMappingPlacementRule and AppNameMappingPlacementRule 
> classes we create new instances of the QueueMappingEntity class. In some cases 
> we simply copy the already received object, so we just duplicate it, which is 
> unnecessary since the class is immutable.
> This is just a minor improvement and probably doesn't have much impact, but it 
> still puts some unnecessary load on the GC.
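
A before/after illustration of the copy described above (the constructor and getter shapes are assumptions for the sketch, not the actual API):
{code:java}
// Before (illustrative): a fresh instance that merely duplicates an immutable object.
QueueMappingEntity copied =
    new QueueMappingEntity(mapping.getSource(), mapping.getQueue());

// After (illustrative): the received immutable instance is simply reused.
QueueMappingEntity reused = mapping;
{code}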



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10281) Redundant QueuePath usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule

2020-06-03 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124905#comment-17124905
 ] 

Gergely Pollak commented on YARN-10281:
---

This probably will fail, since the dependency is not merged in yet, but it's 
available for review.

> Redundant QueuePath usage in UserGroupMappingPlacementRule and 
> AppNameMappingPlacementRule
> --
>
> Key: YARN-10281
> URL: https://issues.apache.org/jira/browse/YARN-10281
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10281.001.patch
>
>
> We use the QueuePath and QueueMapping (or QueueMappingEntity) objects in the 
> aforementioned classes, but these technically store the same kind of 
> information, yet we keep converting between them. Let's examine whether we can 
> use only QueueMapping(Entity) instead, since it holds more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10281) Redundant QueuePath usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule

2020-06-03 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10281:
--
Attachment: YARN-10281.001.patch

> Redundant QueuePath usage in UserGroupMappingPlacementRule and 
> AppNameMappingPlacementRule
> --
>
> Key: YARN-10281
> URL: https://issues.apache.org/jira/browse/YARN-10281
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10281.001.patch
>
>
> We use the QueuePath and QueueMapping (or QueueMappingEntity) objects in the 
> aforementioned classes, but these technically store the same kind of 
> information, yet we keep converting between them. Let's examine whether we can 
> use only the QueueMapping(Entity) instead, since that holds more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10303) One yarn rest api example of yarn document is error

2020-06-03 Thread bright.zhou (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124878#comment-17124878
 ] 

bright.zhou commented on YARN-10303:


Hi [~adam.antal], I'm glad you did.

> One yarn rest api example of yarn document is error
> ---
>
> Key: YARN-10303
> URL: https://issues.apache.org/jira/browse/YARN-10303
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1, 3.2.1
>Reporter: bright.zhou
>Assignee: Hudáky Márton Gyula
>Priority: Minor
>  Labels: documentation, newbie
> Attachments: image-2020-06-02-10-27-35-020.png
>
>
> deSelects value should be resourceRequests
> !image-2020-06-02-10-27-35-020.png!
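
For reference, a minimal sketch of the corrected usage against the RM Cluster 
Applications REST endpoint; the ResourceManager address below is a placeholder.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Queries the apps endpoint with deSelects=resourceRequests, the value the
// documentation example should show. "rm-host:8088" is a placeholder.
public class DeSelectsExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps?deSelects=resourceRequests");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // app reports without the resourceRequests field
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}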



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10279) Avoid unnecessary QueueMappingEntity creations

2020-06-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124876#comment-17124876
 ] 

Adam Antal commented on YARN-10279:
---

Hi [~BilwaST], do you plan to work on this in the near future?

> Avoid unnecessary QueueMappingEntity creations
> --
>
> Key: YARN-10279
> URL: https://issues.apache.org/jira/browse/YARN-10279
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollak
>Assignee: Bilwa S T
>Priority: Minor
>
> In the CS UserGroupMappingPlacementRule and AppNameMappingPlacementRule 
> classes we create new instances of the QueueMappingEntity class. In some cases 
> we simply copy the instance we already received, which is an unnecessary 
> duplication since the class is immutable.
> This is just a minor improvement and probably doesn't have much impact, but 
> it still puts some unnecessary load on the GC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10303) One yarn rest api example of yarn document is error

2020-06-03 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124870#comment-17124870
 ] 

Adam Antal commented on YARN-10303:
---

Thanks for raising this [~zgw]! I see that it is currently unassigned; I hope 
you don't mind if [~mhudaky] works on this.

> One yarn rest api example of yarn document is error
> ---
>
> Key: YARN-10303
> URL: https://issues.apache.org/jira/browse/YARN-10303
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1, 3.2.1
>Reporter: bright.zhou
>Assignee: Hudáky Márton Gyula
>Priority: Minor
>  Labels: documentation, newbie
> Attachments: image-2020-06-02-10-27-35-020.png
>
>
> deSelects value should be resourceRequests
> !image-2020-06-02-10-27-35-020.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10303) One yarn rest api example of yarn document is error

2020-06-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10303:
--
Labels: documentation newbie  (was: )

> One yarn rest api example of yarn document is error
> ---
>
> Key: YARN-10303
> URL: https://issues.apache.org/jira/browse/YARN-10303
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1, 3.2.1
>Reporter: bright.zhou
>Assignee: Hudáky Márton Gyula
>Priority: Minor
>  Labels: documentation, newbie
> Attachments: image-2020-06-02-10-27-35-020.png
>
>
> deSelects value should be resourceRequests
> !image-2020-06-02-10-27-35-020.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10303) One yarn rest api example of yarn document is error

2020-06-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-10303:
-

Assignee: Hudáky Márton Gyula

> One yarn rest api example of yarn document is error
> ---
>
> Key: YARN-10303
> URL: https://issues.apache.org/jira/browse/YARN-10303
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1, 3.2.1
>Reporter: bright.zhou
>Assignee: Hudáky Márton Gyula
>Priority: Minor
> Attachments: image-2020-06-02-10-27-35-020.png
>
>
> deSelects value should be resourceRequests
> !image-2020-06-02-10-27-35-020.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6857) Support REST for Node Attributes configurations

2020-06-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124861#comment-17124861
 ] 

Prabhu Joseph commented on YARN-6857:
-

Thanks [~Naganarasimha].

> Support REST for Node Attributes configurations
> ---
>
> Key: YARN-6857
> URL: https://issues.apache.org/jira/browse/YARN-6857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-6857-YARN-3409.001.patch
>
>
> This jira focusses on supporting mapping of Nodes to  Attributes through REST



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6857) Support REST for Node Attributes configurations

2020-06-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned YARN-6857:
---

Assignee: Bilwa S T  (was: Naganarasimha G R)

> Support REST for Node Attributes configurations
> ---
>
> Key: YARN-6857
> URL: https://issues.apache.org/jira/browse/YARN-6857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-6857-YARN-3409.001.patch
>
>
> This jira focusses on supporting mapping of Nodes to  Attributes through REST



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-06-03 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10274:
--
Attachment: YARN-10274.003.patch

> Merge QueueMapping and QueueMappingEntity
> -
>
> Key: YARN-10274
> URL: https://issues.apache.org/jira/browse/YARN-10274
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10274.001.patch, YARN-10274.002.patch, 
> YARN-10274.003.patch
>
>
> The role, usage and internal behaviour of these classes are almost identical, 
> so it makes no sense to keep both of them. One is used by UserGroup 
> placement rule definitions, the other by Application placement rules.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6857) Support REST for Node Attributes configurations

2020-06-03 Thread Naganarasimha G R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124842#comment-17124842
 ] 

Naganarasimha G R commented on YARN-6857:
-

[~prabhujoseph], glad that you and [~bilwa_st] can take it forward. Please go 
ahead and assign it to yourself!

> Support REST for Node Attributes configurations
> ---
>
> Key: YARN-6857
> URL: https://issues.apache.org/jira/browse/YARN-6857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-6857-YARN-3409.001.patch
>
>
> This jira focusses on supporting mapping of Nodes to  Attributes through REST



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10306) Create simple copy log aggregation file controller

2020-06-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10306:
--
Summary: Create simple copy log aggregation file controller  (was: Create 
copy log aggregation file controller)

> Create simple copy log aggregation file controller
> --
>
> Key: YARN-10306
> URL: https://issues.apache.org/jira/browse/YARN-10306
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Log aggregation file controllers were created (YARN-6875) to effectively wrap 
> and move the logs of containers to a remote filesystem. While this filesystem 
> was HDFS, it made sense to create files as large as possible by packing the 
> logs into an aggregated (blob) file. Now that S3A is a valid target (since 
> YARN-9525), it is much less painful from the end user's point of view to 
> browse these files as they are, rather than in an aggregated blob format.
> I propose to implement a "dumb"/bare copy file controller that copies the 
> container log files to the remote file system without any 
> aggregation/wrapping. The only thing which makes sense to enable is the 
> compression of those files, so we should support that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10306) Create copy log aggregation file controller

2020-06-03 Thread Adam Antal (Jira)
Adam Antal created YARN-10306:
-

 Summary: Create copy log aggregation file controller
 Key: YARN-10306
 URL: https://issues.apache.org/jira/browse/YARN-10306
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Affects Versions: 3.3.0
Reporter: Adam Antal
Assignee: Adam Antal


Log aggregation file controllers were created (YARN-6875) to effectively wrap 
and move the logs of containers to a remote filesystem. While this filesystem 
was HDFS, it made sense to create files as large as possible by packing the logs 
into an aggregated (blob) file. Now that S3A is a valid target (since YARN-9525), 
it is much less painful from the end user's point of view to browse these files 
as they are, rather than in an aggregated blob format.

I propose to implement a "dumb"/bare copy file controller that copies the 
container log files to the remote file system without any aggregation/wrapping. 
The only thing which makes sense to enable is the compression of those files, 
so we should support that.
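
As a rough illustration of the proposal (class and method names here are 
illustrative, not the eventual controller's API), a sketch that streams each 
local container log file to the remote directory as-is, with optional gzip 
compression:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Illustrative only: copies each local container log file to the remote
// aggregation directory without wrapping it into an aggregated blob.
public class PlainCopyUploader {

  public static void upload(Configuration conf, File[] localLogs,
      Path remoteContainerDir, boolean compress) throws Exception {
    FileSystem remoteFs = remoteContainerDir.getFileSystem(conf);
    remoteFs.mkdirs(remoteContainerDir);

    for (File log : localLogs) {
      String name = compress ? log.getName() + ".gz" : log.getName();
      Path target = new Path(remoteContainerDir, name);
      try (InputStream in = new FileInputStream(log)) {
        OutputStream rawOut = remoteFs.create(target, true);
        OutputStream out = compress ? new GZIPOutputStream(rawOut) : rawOut;
        try {
          IOUtils.copyBytes(in, out, 64 * 1024);
        } finally {
          out.close(); // closes the gzip wrapper and the underlying stream
        }
      }
    }
  }
}
{code}

Compression is applied per file at write time, so the remote layout stays one 
object per container log, which keeps browsing on S3A straightforward.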



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9971) YARN Native Service HttpProbe logs THIS_HOST in error messages

2020-06-03 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124817#comment-17124817
 ] 

Bilwa S T commented on YARN-9971:
-

[~prabhujoseph] can I assign this jira to myself, as there is no update on it?

> YARN Native Service HttpProbe logs THIS_HOST in error messages
> --
>
> Key: YARN-9971
> URL: https://issues.apache.org/jira/browse/YARN-9971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Tarun Parimi
>Priority: Minor
>
> YARN Native Service HttpProbe logs THIS_HOST in error messages. When 
> logging, it misses using the resolved URL string (with the host substituted).
> {code:java}
> 2019-11-12 19:25:47,317 [pool-7-thread-1] INFO  probe.HttpProbe - Probe 
> http://${THIS_HOST}:18010/master-status failed for IP 172.27.75.198: 
> java.net.ConnectException: Connection refused (Connection refused)
> {code}
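
A hedged sketch of the kind of fix the description implies; the helper below is 
illustrative and does not mirror the actual HttpProbe code. The idea is to 
substitute the ${THIS_HOST} placeholder before building the URL and to log that 
resolved URL rather than the template.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative only: resolve the placeholder first, then reuse the resolved
// string both for the connection and for any failure log message.
public class HttpProbeSketch {

  private static final String HOST_TOKEN = "${THIS_HOST}";

  public static boolean probe(String urlTemplate, String containerIp) {
    String resolvedUrl = urlTemplate.replace(HOST_TOKEN, containerIp);
    try {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(resolvedUrl).openConnection();
      conn.setConnectTimeout(5000);
      int code = conn.getResponseCode();
      conn.disconnect();
      return code >= 200 && code < 300;
    } catch (Exception e) {
      // Log the resolved URL, not the template, so the message shows the real
      // target instead of the literal ${THIS_HOST} placeholder.
      System.err.println("Probe " + resolvedUrl + " failed for IP "
          + containerIp + ": " + e);
      return false;
    }
  }
}
{code}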



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10031) Create a general purpose log request with additional query parameters

2020-06-03 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-10031:
-

Assignee: Andras Gyori  (was: Adam Antal)

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
>
> The current endpoints are robust but not very flexible with regard to 
> filtering options. I suggest adding an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParams parameters to the REST endpoints like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45
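
As a rough sketch of what such an endpoint could look like with JAX-RS, 
assuming hypothetical filter parameters (fileName, containerState, nodeId); the 
real parameter set would be defined when the endpoint is designed:

{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Hypothetical resource showing how optional @QueryParam filters could be
// layered onto a single log endpoint instead of adding new endpoints.
@Path("/containers")
public class ContainerLogsResource {

  @GET
  @Path("/{containerid}/logs")
  @Produces(MediaType.APPLICATION_JSON)
  public Response getLogs(
      @PathParam("containerid") String containerId,
      @QueryParam("fileName") String fileName,              // e.g. stderr
      @QueryParam("containerState") String containerState,  // e.g. FAILED
      @QueryParam("nodeId") String nodeId) {                // e.g. nm45
    // Each filter is optional; null means "do not filter on this field".
    LogQuery query = new LogQuery(containerId, fileName, containerState, nodeId);
    return Response.ok(query.describe()).build();
  }

  // Minimal stand-in for whatever backend lookup would run the filtered query.
  static final class LogQuery {
    private final String containerId, fileName, containerState, nodeId;
    LogQuery(String c, String f, String s, String n) {
      containerId = c; fileName = f; containerState = s; nodeId = n;
    }
    String describe() {
      return "container=" + containerId + ", file=" + fileName
          + ", state=" + containerState + ", node=" + nodeId;
    }
  }
}
{code}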



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

2020-06-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124807#comment-17124807
 ] 

Prabhu Joseph commented on YARN-10293:
--

Thanks [~Tao Yang] for the comments. If you are fine with it, I will commit this patch.

bq. And Prabhu Joseph if you have time/bandwidth, can you take a look into 
reservation related logic + preemption + unreserve + global scheduling and see 
what we can optimize here?

Yes, sure. YARN-9598 addresses many other issues. I will check how to 
contribute to it and address any other optimizations required.

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement (YARN-10259)
> 
>
> Key: YARN-10293
> URL: https://issues.apache.org/jira/browse/YARN-10293
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10293-001.patch, YARN-10293-002.patch, 
> YARN-10293-003-WIP.patch
>
>
> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement. YARN-10259 has fixed two issues 
> related to it 
> https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> I have found one more bug in the CapacityScheduler.java code which causes the 
> same issue, with a slight difference in the repro.
> *Repro:*
> *Nodes :   Available : Used*
> Node1 -  8GB, 8vcores -  8GB. 8cores
> Node2 -  8GB, 8vcores - 8GB. 8cores
> Node3 -  8GB, 8vcores - 8GB. 8cores
> Queues -> A and B both 50% capacity, 100% max capacity
> MultiNode enabled + Preemption enabled
> 1. JobA submitted to A queue and which used full cluster 24GB and 24 vcores
> 2. JobB Submitted to B queue with AM size of 1GB
> {code}
> 2020-05-21 12:12:27,313 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  
> IP=172.27.160.139   OPERATION=Submit Application Request
> TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005  
>   CALLERCONTEXT=CLI   QUEUENAME=dummy
> {code}
> 3. Preemption happens and used capacity is lesser than 1.0f
> {code}
> 2020-05-21 12:12:48,222 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
>  Non-AM container preempted, current 
> appAttemptId=appattempt_1590046667304_0004_01, 
> containerId=container_e09_1590046667304_0004_01_24, 
> resource=
> {code}
> 4. JobB gets a Reserved Container as part of 
> CapacityScheduler#allocateOrReserveNewContainer
> {code}
> 2020-05-21 12:12:48,226 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e09_1590046667304_0005_01_01 Container Transitioned from NEW to 
> RESERVED
> 2020-05-21 12:12:48,226 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
> tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
> available= used= with 
> resource=
> {code}
> *Why did RegularContainerAllocator reserve the container when the used 
> capacity is <= 1.0f?*
> {code}
> The reason is that even though the container is preempted, the nodemanager 
> still has to stop the container and then heartbeat and update the available 
> and unallocated resources to the ResourceManager.
> {code}
> 5. Now, no new allocation happens and the reserved container stays reserved.
> After the reservation the used capacity becomes 1.0f, the loop below repeats, 
> and no new allocate or reserve happens. The reserved container cannot be 
> allocated because the reserved node does not have space. node2 has space for 
> 1GB, 1vcore, but CapacityScheduler#allocateOrReserveNewContainers is not 
> getting called, causing the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> 
> CapacityScheduler#allocateFromReservedContainer -> Re-reserve the container 
> on node*
> {code}
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1590046667304_0005 
> on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
> 

[jira] [Commented] (YARN-9903) Support reservations continue looking for Node Labels

2020-06-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124801#comment-17124801
 ] 

Peter Bacsko commented on YARN-9903:


[~Jim_Brennan] thanks for the patch. From your comment, I assume that you've 
been using this change for a while and it's safe. Is that correct?

> Support reservations continue looking for Node Labels
> -
>
> Key: YARN-9903
> URL: https://issues.apache.org/jira/browse/YARN-9903
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tarun Parimi
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-9903.001.patch, YARN-9903.002.patch
>
>
> YARN-1769 brought in reservations continue looking feature which improves the 
> several resource reservation scenarios. However, it is not handled currently 
> when nodes have a label assigned to them. This is useful and in many cases 
> necessary even for Node Labels. So we should look to support this for node 
> labels also.
> For example, in AbstractCSQueue.java, we have the below TODO.
> {code:java}
> // TODO, now only consider reservation cases when the node has no label 
> if (this.reservationsContinueLooking && nodePartition.equals( 
> RMNodeLabelsManager.NO_LABEL) && Resources.greaterThan( resourceCalculator, 
> clusterResource, resourceCouldBeUnreserved, Resources.none())) {
> {code}
> cc [~sunilg]
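
One possible direction, shown as a hedged sketch with simplified stand-ins 
(this is not necessarily what the attached patches do), is to drop the NO_LABEL 
short-circuit so that labelled partitions are also considered, while the 
resource comparison itself stays unchanged:

{code:java}
// Simplified stand-in for the AbstractCSQueue check quoted above; real
// scheduler state (resource calculator, cluster resource, the resource that
// could be unreserved) is collapsed into a single boolean here.
public class ContinueLookingSketch {

  static final String NO_LABEL = "";

  // Current behaviour: reservations-continue-looking only applies when the
  // node is in the default (empty) partition.
  static boolean mayUnreserveToday(boolean continueLooking, String nodePartition,
      boolean hasUnreservableResource) {
    return continueLooking
        && NO_LABEL.equals(nodePartition)
        && hasUnreservableResource;
  }

  // Possible direction: the partition no longer disqualifies the node, so a
  // labelled node can also give up reserved space when the queue is over its
  // limit. (nodePartition is kept only for signature parity.)
  static boolean mayUnreserveWithLabels(boolean continueLooking, String nodePartition,
      boolean hasUnreservableResource) {
    return continueLooking && hasUnreservableResource;
  }
}
{code}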



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6857) Support REST for Node Attributes configurations

2020-06-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124798#comment-17124798
 ] 

Prabhu Joseph commented on YARN-6857:
-

[~Naganarasimha] We would like to have these changes in Hadoop. If you are 
fine with it, please let us know whether [~BilwaST] and I can do the remaining 
work on this. Thanks.

> Support REST for Node Attributes configurations
> ---
>
> Key: YARN-6857
> URL: https://issues.apache.org/jira/browse/YARN-6857
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-6857-YARN-3409.001.patch
>
>
> This jira focusses on supporting mapping of Nodes to  Attributes through REST



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10181) Managing Centralized Node Attribute via RMWebServices.

2020-06-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124776#comment-17124776
 ] 

Prabhu Joseph commented on YARN-10181:
--

Yes, after notifying the JIRA owner.

> Managing Centralized Node Attribute via RMWebServices.
> --
>
> Key: YARN-10181
> URL: https://issues.apache.org/jira/browse/YARN-10181
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodeattibute
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Currently Centralized NodeAttributes can be managed only through the Yarn 
> NodeAttribute CLI. This Jira is to add support for it via RMWebServices.
> {code}
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeAttributes.html#Centralised_Node_Attributes_mapping.
> Centralised : Node to attributes mapping can be done through RM exposed CLI 
> or RPC (REST is yet to be supported).
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10181) Managing Centralized Node Attribute via RMWebServices.

2020-06-03 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124773#comment-17124773
 ] 

Bilwa S T commented on YARN-10181:
--

[~prabhujoseph] OK, I will do it today. Can I reassign that jira?

> Managing Centralized Node Attribute via RMWebServices.
> --
>
> Key: YARN-10181
> URL: https://issues.apache.org/jira/browse/YARN-10181
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodeattibute
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Currently Centralized NodeAttributes can be managed only through the Yarn 
> NodeAttribute CLI. This Jira is to add support for it via RMWebServices.
> {code}
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeAttributes.html#Centralised_Node_Attributes_mapping.
> Centralised : Node to attributes mapping can be done through RM exposed CLI 
> or RPC (REST is yet to be supported).
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-10181) Managing Centralized Node Attribute via RMWebServices.

2020-06-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph resolved YARN-10181.
--
Resolution: Duplicate

> Managing Centralized Node Attribute via RMWebServices.
> --
>
> Key: YARN-10181
> URL: https://issues.apache.org/jira/browse/YARN-10181
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodeattibute
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Currently Centralized NodeAttributes can be managed only through the Yarn 
> NodeAttribute CLI. This Jira is to add support for it via RMWebServices.
> {code}
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeAttributes.html#Centralised_Node_Attributes_mapping.
> Centralised : Node to attributes mapping can be done through RM exposed CLI 
> or RPC (REST is yet to be supported).
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10181) Managing Centralized Node Attribute via RMWebServices.

2020-06-03 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124741#comment-17124741
 ] 

Prabhu Joseph commented on YARN-10181:
--

[~BilwaST] Yes, we will close this Jira as a duplicate. Can you work on 
rebasing the YARN-6857 patch? I will review and test it.

> Managing Centralized Node Attribute via RMWebServices.
> --
>
> Key: YARN-10181
> URL: https://issues.apache.org/jira/browse/YARN-10181
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodeattibute
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Bilwa S T
>Priority: Major
>
> Currently Centralized NodeAttributes can be managed only through the Yarn 
> NodeAttribute CLI. This Jira is to add support for it via RMWebServices.
> {code}
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeAttributes.html#Centralised_Node_Attributes_mapping.
> Centralised : Node to attributes mapping can be done through RM exposed CLI 
> or RPC (REST is yet to be supported).
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org