[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555962#comment-16555962
 ] 

Hudson commented on YARN-4606:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14641 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14641/])
YARN-4606. CapacityScheduler: applications could get starved because (ericp: 
rev 9485c9aee6e9bb935c3e6ae4da81d70b621781de)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/UsersManager.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, 
> YARN-4606.006.patch, YARN-4606.007.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.
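The arithmetic behind this starvation can be sketched with a deliberately simplified model. This is not the `UsersManager` or `ActiveUsersManager` code: it assumes the user limit is simply the queue's capacity divided evenly among the counted users, and it ignores minimum-user-limit-percent, user-limit-factor and AM limits. The class and method names (`ActiveUsersSketch`, `App`, `countActiveUsers`, `userLimitMb`) and the 8192 MB queue size are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the active-user count; not Hadoop code.
final class ActiveUsersSketch {

    // An application as seen by the leaf queue: its owner and whether it is
    // active (schedulable) or still pending, e.g. blocked by max-am-percent.
    static final class App {
        final String user;
        final boolean active;

        App(String user, boolean active) {
            this.user = user;
            this.active = active;
        }
    }

    // Counts distinct users. If countPendingOnlyUsers is true, a user whose
    // apps are all pending still counts (the behaviour reported here);
    // otherwise only users with at least one active app are counted.
    static int countActiveUsers(List<App> apps, boolean countPendingOnlyUsers) {
        Set<String> users = new HashSet<>();
        for (App app : apps) {
            if (app.active || countPendingOnlyUsers) {
                users.add(app.user);
            }
        }
        return users.size();
    }

    // Simplified user limit: queue capacity split evenly among counted users.
    static int userLimitMb(int queueCapacityMb, int activeUsers) {
        return queueCapacityMb / Math.max(1, activeUsers);
    }

    public static void main(String[] args) {
        // The example from the description: app1/app2 active, app3/app4 pending.
        List<App> apps = Arrays.asList(
            new App("user1", true), new App("user2", true),
            new App("user3", false), new App("user4", false));
        int before = countActiveUsers(apps, true);
        int after = countActiveUsers(apps, false);
        // 8192 MB is a hypothetical queue capacity chosen for illustration.
        System.out.println(before + " users -> " + userLimitMb(8192, before) + " MB each");
        System.out.println(after + " users -> " + userLimitMb(8192, after) + " MB each");
    }
}
```

In this model, counting the pending-only users user3 and user4 gives each user 2048 MB, while counting only user1 and user2 gives 4096 MB, which is the shortfall the description refers to.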



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-23 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553038#comment-16553038
 ] 

Eric Payne commented on YARN-4606:
--

+1
Thanks for all of your work on this JIRA, [~maniraj...@gmail.com]. I will be 
committing this later today or tomorrow.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-23 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552698#comment-16552698
 ] 

Manikandan R commented on YARN-4606:


The unit test failure is not related to this patch.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-20 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551179#comment-16551179
 ] 

genericqa commented on YARN-4606:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 25m 29s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 11s | trunk passed |
| +1 | mvnsite | 0m 46s | trunk passed |
| +1 | shadedclient | 11m 0s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 12s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 37s | the patch passed |
| +1 | javac | 0m 37s | the patch passed |
| +1 | checkstyle | 0m 8s | the patch passed |
| +1 | mvnsite | 0m 40s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 11s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 77m 35s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 133m 23s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-4606 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932447/YARN-4606.007.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c122c2c6bdac 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5c19ee3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/21320/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21320/testReport/ |
| Max. process+thread count | 921 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: 

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-20 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551016#comment-16551016
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Attaching the .007 patch, which addresses #2, #3 & #4. Please check.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-19 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549772#comment-16549772
 ] 

Eric Payne commented on YARN-4606:
--

bq. Will take care of #2, #3 & #4.
[~maniraj...@gmail.com], I think that once you make these changes, the unit 
tests will be fine as they are.

I tried to create a unit test that would starve the queue without this patch, 
but the only way I could come up with was to start a MiniYarnCluster. I don't 
think that extra overhead would give us enough additional code coverage to 
justify the complexity.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-18 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548093#comment-16548093
 ] 

Manikandan R commented on YARN-4606:


Thanks [~eepayne] for your comments.

 
{quote} I have a general concern that these tests are not testing the fix to 
the starvation problem outlined in the description of this JIRA. I'm trying to 
determine if there is a clean way to unit test that use case.
{quote}
OK.
Since active-app starvation happens because of lower resource allocation based 
on an incorrect active users count, could we, in addition to checking the active 
users count, also check the resources allocated to each user? Would that be good 
enough? Before this patch, the resource allocation (amount of memory, vcores) 
should be lower (half of the allocation with this patch, based on the example 
given in the JIRA description), whereas with this patch it should be higher.
(or)
With this patch, the app should complete faster than before because resources 
are allocated as expected. Can we simulate this in the test cases and check 
the app completion time?
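The first suggestion, checking each user's allocated resources, amounts to expecting roughly double the per-user share once pending-only users are no longer counted. A minimal sketch of that expectation, under the same even-split simplification as the description's example and with a hypothetical queue of 8192 MB and 8 vcores; `PerUserShareSketch`, `Res` and `perUserShare` are illustrative names, not YARN classes, and a real test would have to read the allocations from the scheduler rather than compute them.

```java
// Hypothetical per-user share check; the numbers are illustrative only.
final class PerUserShareSketch {

    // Minimal stand-in for a resource vector (memory in MB, vcores).
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        Res divideBy(int n) {
            return new Res(memoryMb / n, vcores / n);
        }

        @Override
        public String toString() {
            return "memory=" + memoryMb + "MB, vcores=" + vcores;
        }
    }

    // Simplified: each counted user gets an equal slice of the queue.
    static Res perUserShare(Res queue, int countedUsers) {
        return queue.divideBy(Math.max(1, countedUsers));
    }

    public static void main(String[] args) {
        Res queue = new Res(8192, 8);            // hypothetical queue resources
        Res withoutFix = perUserShare(queue, 4); // user3/user4 counted although pending
        Res withFix = perUserShare(queue, 2);    // only user1/user2 counted
        System.out.println("without fix: " + withoutFix);
        System.out.println("with fix:    " + withFix);
        // The proposed check: each active user's share doubles with the fix.
        boolean doubled = withFix.memoryMb == 2 * withoutFix.memoryMb
            && withFix.vcores == 2 * withoutFix.vcores;
        System.out.println("share doubled: " + doubled);
    }
}
```

The doubling only holds exactly in this even-split model; with minimum-user-limit-percent or user-limit-factor in play, a test would assert "higher than before" rather than an exact ratio.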

Will take care of #2, #3 & #4.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-17 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547161#comment-16547161
 ] 

Eric Payne commented on YARN-4606:
--

Thank you, [~maniraj...@gmail.com], for the latest patch.

The code changes look good. However, I have a couple of points about the tests.

- I have a general concern that these tests are not testing the fix to the 
starvation problem outlined in the description of this JIRA. I'm trying to 
determine if there is a clean way to unit test that use case.
- {{TestCapacityScheduler#testMoveAppWithActiveUsersWithOnlyPendingApps}}: I am 
concerned about new tests that take longer than necessary, because the unit 
tests keep taking longer and longer to run. I think the following changes can 
reduce this test's run time (in my build environment) from 1 min 17 sec to 
24 sec.
-- In the following code, the sleep(5000) outside of the for loop is not 
necessary.
-- In the following code, the sleep(5000) inside of the for loop could be cut 
down to sleep(500).
{code:title=TestCapacityScheduler#testMoveAppWithActiveUsersWithOnlyPendingApps}
Thread.sleep(5000);

// Triggering this event so that user limit computation can
// happen again
for (int i = 0; i < 10; i++) {
  cs.handle(new NodeUpdateSchedulerEvent(rmNode1));
  Thread.sleep(5000);
}
{code}

- {{TestCapacityScheduler#testMoveAppWithActiveUsersWithOnlyPendingApps1}}: I 
don't think this test is necessary. It takes more than 1:20 to run in my build 
environment, and as far as I can tell, it is verifying that the active users 
count is not ever updated after a move if node heartbeats are not received. 
However, in a running YARN installation, node heartbeats are received every 
second (by default). Unless I'm missing something, this isn't a use case that 
one would encounter in a running Hadoop system.
- {{TestCapacityScheduler#setupQueueConfigurationForActiveUsersChecks}}: The 
parameters to {{conf.setUserLimitFactor(...)}} don't need to be 100.0f. User 
limit factor can be thought of as the multiplier for the amount of a queue that 
one user can consume. So, if the user limit factor is 1.0f, one user can use 
the full capacity of the queue. If it is 2.0f, one user can use twice the capacity
of the queue, and so forth. Since these queues have a capacity of 50%, I would 
set this to 2.0f.
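The user-limit-factor reasoning above can be sketched as a one-line calculation. This is a hypothetical helper (`UserLimitFactorSketch.maxUserPercent`), not the CapacityScheduler code, and it assumes the queue's maximum capacity (taken here as 100% of the cluster) still caps what a single user can reach.

```java
// Hypothetical illustration of user-limit-factor arithmetic; not Hadoop code.
final class UserLimitFactorSketch {

    // Percentage of the cluster one user may reach in a queue: the queue's
    // capacity times the user limit factor, assumed here to be capped by the
    // queue's maximum capacity.
    static float maxUserPercent(float queueCapacityPct, float userLimitFactor,
                                float queueMaxCapacityPct) {
        return Math.min(queueCapacityPct * userLimitFactor, queueMaxCapacityPct);
    }

    public static void main(String[] args) {
        float capacity = 50.0f;     // the 50% queues discussed above
        float maxCapacity = 100.0f; // assumption: queue may grow to the whole cluster
        for (float ulf : new float[] {1.0f, 2.0f, 100.0f}) {
            System.out.println("user-limit-factor " + ulf + " -> "
                + maxUserPercent(capacity, ulf, maxCapacity) + "% of cluster");
        }
    }
}
```

Under these assumptions, a factor of 2.0f already lets one user grow from a 50% queue to 100% of the cluster, so 100.0f would add nothing beyond 2.0f unless the maximum-capacity assumption differs.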





[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539156#comment-16539156
 ] 

genericqa commented on YARN-4606:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 1s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 57s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 46s | trunk passed |
| +1 | shadedclient | 11m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | checkstyle | 0m 9s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 71m 42s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 131m 6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4606 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12931034/YARN-4606.006.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0c5bd396 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d503f65 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21204/testReport/ |
| Max. process+thread count | 912 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21204/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-10 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538949#comment-16538949
 ] 

Manikandan R commented on YARN-4606:


Fixed whitespace-related issues.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538156#comment-16538156
 ] 

genericqa commented on YARN-4606:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 40s | trunk passed |
| +1 | compile | 0m 53s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 48s | trunk passed |
| +1 | shadedclient | 12m 35s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | trunk passed |
| +1 | javadoc | 0m 27s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 41s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 0s | The patch has 4 line(s) with tabs. |
| +1 | shadedclient | 12m 16s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 73m 35s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 133m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4606 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12930945/YARN-4606.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c6a11600 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9bd5bef |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/21198/artifact/out/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/21198/artifact/out/whitespace-tabs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21198/testReport/ |
| Max. process+thread count | 926 (vs. ulimit of 1) |
| modules | 

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-09 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538043#comment-16538043
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thanks for the explanation.

I've attached the .005 patch, containing the changes and their related test
cases, for your review. The test cases also cover the use cases discussed
above; those can be removed before committing the changes.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.
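To see why over-counting users lowers the limit, here is a minimal Java sketch of a simplified user-limit calculation. It assumes the limit is an even share of the queue among the counted active users, but never below a minimum-user-limit-percent floor; the real CapacityScheduler computation also accounts for used resources, user-limit-factor and user weights. UserLimitSketch and its methods are hypothetical names for illustration only.

```java
// Hypothetical, simplified model of the CapacityScheduler user limit.
// Only memory (MB) is modelled; user-limit-factor, user weights and the
// used-resource adjustments of the real computation are omitted.
public final class UserLimitSketch {

    /** Ceiling division for non-negative values. */
    static long divideAndCeil(long numerator, long denominator) {
        return (numerator + denominator - 1) / denominator;
    }

    /**
     * Per-user limit: an even share of the queue among the counted active
     * users, but never below minimumUserLimitPercent of the queue.
     */
    static long userLimitMb(long queueCapacityMb, int countedActiveUsers,
                            int minimumUserLimitPercent) {
        int users = Math.max(1, countedActiveUsers); // an idle queue counts as one user
        long evenShare = divideAndCeil(queueCapacityMb, users);
        long floor = queueCapacityMb * minimumUserLimitPercent / 100;
        return Math.max(evenShare, floor);
    }

    public static void main(String[] args) {
        long queueMb = 100L * 1024; // a 100 GB queue
        int minPercent = 25;        // minimum-user-limit-percent

        // Bug: user3/user4 have only pending apps but are still counted.
        long buggy = userLimitMb(queueMb, 4, minPercent);
        // Intended: only user1/user2 can actually allocate.
        long intended = userLimitMb(queueMb, 2, minPercent);

        System.out.println("4 counted users: " + buggy + " MB per user");
        System.out.println("2 counted users: " + intended + " MB per user");
    }
}
```

In this model, user1 and user2 are each capped at 25600 MB when four users are counted, and at 51200 MB when only the two users that can allocate are counted, which is the "user-limit-resource could be lower than expected" effect described above.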



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-09 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537497#comment-16537497
 ] 

Eric Payne commented on YARN-4606:
--

{quote}
So, after move app operation and if there is no events (which can trigger user 
limit computation) for brief amount of time, am seeing incorrect 
numActiveUsersWithOnlyPendingApps count. Is this acceptable? or Should we 
trigger user limit computation after move operation like how we are doing it in 
other places?
{quote}
[~maniraj...@gmail.com], moving the apps to the other queue does trigger a
recalculation of user limits, since it adds a new app to the queue and
potentially a new user. Also, since the count corrects itself within a few
seconds, at the next NodeManager heartbeat (the default interval is 1 second),
I think that is fine.

I noticed that there is another bug related to moving apps to a different queue
that could affect the above use case. See YARN-8421.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-28 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526328#comment-16526328
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thank you for the great explanation. I understand the flow much
better now.

I revisited the "move apps" problem that I raised earlier, based on the new
patch, and I don't think it requires any changes, as the variables required to
calculate numActiveUsersWithOnlyPendingApps are already being set through the
submitApplication, finishApplication, etc. calls. However, I am seeing a minor
update issue, as described below:

Let's say we want to move all apps from queue A1 to queue B1. A1 has 4 apps,
but only 2 were accommodated because of the max AM limit constraint, so the
remaining 2 are not yet activated. The 4 apps were submitted by different
users, u1 to u4 (app1 by u1, and so on). Only app1 and app2 have an allocate
request in the pipeline. At this point, {{numActiveUsers}} is 4 and
{{numActiveUsersWithOnlyPendingApps}} is 2 in queue A1.

Now the move is triggered. Since app1 and app2 had running containers, app3 and
app4 were activated in queue B1 before app1 and app2, because app1 and app2
were busy detaching and attaching containers. After the move operation and a
thread sleep of 5s, I pulled these counts, expecting u1 and u2 to be counted as
ActiveUsersWithOnlyPendingApps, but they were not. {{numActiveUsers}} is 2, as
u3 and u4 had become active users, and {{numActiveUsersWithOnlyPendingApps}} is
0 in queue B1. Then I introduced a NodeUpdate event after the move operation,
just to force the user limit computation and see its impact on these counts.
Now I can see ActiveUsersWithOnlyPendingApps as 2 and ActiveUsers as 0 (as both
u3 and u4 had become non-active users by this time, since there were no pending
allocate requests).

So, after a move operation, if there are no events that trigger the user limit
computation for a brief amount of time, I see an incorrect
{{numActiveUsersWithOnlyPendingApps}} count. Is this acceptable? Or should we
trigger the user limit computation after the move operation, as we do in other
places? Please share your thoughts, and correct my understanding if you see a
gap.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-25 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522811#comment-16522811
 ] 

Eric Payne commented on YARN-4606:
--

{quote}At the same time, this patch is less "strict" in terms of updates 
(specifically on when? ) compared to approaches discussed in our earlier 
patches.
{quote}
The value for the number of active apps per user used to be calculated every
time through the scheduler loop, which was a performance problem. To avoid this
heavy calculation, YARN-5889 created the {{UsersManager}}. Instead of doing the
calculation every time through the loop, YARN-5889 only recalculates these
values when events occur that could affect this count, such as a new
application, an application completing, a new container request, a completed
container, etc. In the latest POC patch, {{activeUsersWithOnlyPendingApps}} is
part of this flow, so it will always be updated whenever anything happens that
could affect its value.
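As a rough illustration of the YARN-5889 idea, here is a Java sketch of a cached value that is recomputed only after an event bumps a version counter, rather than on every pass of the scheduler loop. The class, its methods and the version-counter mechanism are assumptions made for illustration, not the actual {{UsersManager}} code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cache a derived value and recompute it only after an
// event that could change it, instead of on every scheduler loop iteration.
public final class CachedUserLimitSketch {

    private final Map<String, Integer> activeAppsPerUser = new HashMap<>();
    private long stateVersion = 0;    // bumped on every relevant event
    private long cachedVersion = -1;  // version the cache was computed at
    private int cachedActiveUsers = 0;
    private int recomputeCount = 0;   // how many times the "heavy" work ran

    /** Event hook, e.g. an app becoming active. */
    void onAppActivated(String user) {
        activeAppsPerUser.merge(user, 1, Integer::sum);
        stateVersion++;
    }

    /** Event hook, e.g. an app completing; drops the user after the last app. */
    void onAppFinished(String user) {
        activeAppsPerUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
        stateVersion++;
    }

    /** Called from the scheduler loop; cheap unless the state changed. */
    int getActiveUsers() {
        if (cachedVersion != stateVersion) {
            cachedActiveUsers = activeAppsPerUser.size(); // stand-in for the heavy computation
            cachedVersion = stateVersion;
            recomputeCount++;
        }
        return cachedActiveUsers;
    }

    int getRecomputeCount() {
        return recomputeCount;
    }

    public static void main(String[] args) {
        CachedUserLimitSketch mgr = new CachedUserLimitSketch();
        mgr.onAppActivated("user1");
        mgr.onAppActivated("user2");
        for (int i = 0; i < 1000; i++) {
            mgr.getActiveUsers(); // 1000 scheduler loop iterations
        }
        System.out.println(mgr.getActiveUsers() + " active users, "
            + mgr.getRecomputeCount() + " recomputation(s)");
    }
}
```

The flip side is the one raised for the move-app case: if the state changes through a path that does not bump the version, the cached value stays stale until some other event triggers a recomputation.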
{quote}Also, based on our earlier discussions, We need to depend on 
activeUsers.get() only in certain context and sum of activeUsers.get() and 
activeUsersWithOnlyPendingApps.get() in some other places. But POC patch always 
depends on later value. I didn't understand this part.
{quote}
I think you are referencing this comment from above:
{quote}My understanding is that user limit would use activeUsers and things 
like max AM limit per user, we'd use activeUsers + activeUsersOfPendingApps
{quote}
{{LeafQueue#activateApplications}} is the only caller of
{{UsersManager#getNumActiveUsers}}, which it uses to calculate the
user-specific AM limit, so it is the one that needs both activeUsers +
{{activeUsersWithOnlyPendingApps}}.
{{UsersManager#computeUserLimit}} uses only activeUsers to calculate the
headroom and user limit, which is what we decided in the comment above. Is that
your understanding of these comments?
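A minimal Java sketch of the split described above, assuming a queue with two users that have active apps and two users that have only pending apps. The names are hypothetical, and the per-user AM limit is simplified to an even split of the queue's AM limit; the real calculation in {{LeafQueue}} takes other settings into account as well.

```java
// Hypothetical sketch: which user count each computation uses.
public final class ActiveUserCountsSketch {

    private final int activeUsers;                    // users with active apps
    private final int activeUsersWithOnlyPendingApps; // users whose apps are all pending

    ActiveUserCountsSketch(int activeUsers, int activeUsersWithOnlyPendingApps) {
        this.activeUsers = activeUsers;
        this.activeUsersWithOnlyPendingApps = activeUsersWithOnlyPendingApps;
    }

    /** Divisor for the user-specific AM limit (cf. LeafQueue#activateApplications). */
    int usersForAmLimit() {
        return activeUsers + activeUsersWithOnlyPendingApps;
    }

    /** Divisor for headroom and user limit (cf. UsersManager#computeUserLimit). */
    int usersForUserLimit() {
        return activeUsers;
    }

    /** Simplified per-user AM limit: an even split of the queue's AM limit. */
    long userAmLimitMb(long queueAmLimitMb) {
        return queueAmLimitMb / Math.max(1, usersForAmLimit());
    }

    public static void main(String[] args) {
        // user1/user2 have active apps; user3/user4 only have pending apps.
        ActiveUserCountsSketch counts = new ActiveUserCountsSketch(2, 2);
        System.out.println("AM-limit divisor:   " + counts.usersForAmLimit());
        System.out.println("User-limit divisor: " + counts.usersForUserLimit());
        System.out.println("Per-user AM limit:  " + counts.userAmLimitMb(8192) + " MB");
    }
}
```

Presumably, counting pending-only users in the AM-limit divisor keeps a few users from taking every AM slot, while leaving them out of the user-limit divisor is what removes the starvation described in this issue.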




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-25 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522599#comment-16522599
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thanks for the patch.

At a high level, the POC is very simple from an implementation perspective, and
the changes would be minimal with this approach. At the same time, this patch is
less "strict" in terms of updates (specifically, on when?) compared to the
approaches discussed in our earlier patches. For example, in the earlier
approach, numActiveUsersWithOnlyPendingApps would be incremented as soon as an
app gets activated and decremented as soon as its AM container gets allocated.
In addition, all of these updates happen immediately, and only after the
dependent steps have definitely completed. The new POC patch, by contrast,
depends on values (pendingApplications, activeApplications, etc. of the User
object) and on conditions checked before the actual work happens (for example,
it assumes the AM container will be allocated successfully based on the checks
in LeafQueue#activateApplications), and it updates
numActiveUsersWithOnlyPendingApps as part of the regular computeUserLimits flow.
All of this creates a slight discomfort and leads to questions like:

What is the time frame between accepting the app and updating
numActiveUsersWithOnlyPendingApps? Is this time frame acceptable? Aren't we a
little slow in doing updates? Is there any chance that the AM container fails to
be allocated? Even if AM container allocation goes through successfully, could
there be a delay in allocating AM containers? During that delay, we consider
the user an active user rather than treating the user as one of the
"activeUsersWithOnlyPendingApps". Is this acceptable? I am interested in
understanding your thoughts behind this tradeoff.
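To make the tradeoff concrete, here is a hedged Java sketch contrasting the two update styles: an eagerly maintained counter (roughly the earlier patches) and a count derived from per-user pending/active app numbers during a recompute (roughly the POC). The transitions are simplified to submit, activate and finish, and all names are hypothetical; the earlier patches hooked other events, such as AM container allocation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contrast of an eager counter and a derived count.
public final class PendingOnlyUsersSketch {

    /** Per-user app counts, loosely modelled on the User object. */
    static final class UserApps {
        int pendingApps;
        int activeApps;

        boolean onlyPending() {
            return pendingApps > 0 && activeApps == 0;
        }
    }

    private final Map<String, UserApps> users = new HashMap<>();
    private int eagerPendingOnlyUsers = 0; // style 1: adjusted on every transition

    private UserApps user(String name) {
        return users.computeIfAbsent(name, n -> new UserApps());
    }

    /** Apply a state change and adjust the eager counter if membership changed. */
    private void update(UserApps u, int pendingDelta, int activeDelta) {
        if (u.pendingApps + pendingDelta < 0 || u.activeApps + activeDelta < 0) {
            throw new IllegalStateException("invalid transition");
        }
        boolean before = u.onlyPending();
        u.pendingApps += pendingDelta;
        u.activeApps += activeDelta;
        boolean after = u.onlyPending();
        if (before != after) {
            eagerPendingOnlyUsers += after ? 1 : -1;
        }
    }

    void submit(String name)   { update(user(name), +1, 0); }
    void activate(String name) { update(user(name), -1, +1); }
    void finish(String name)   { update(user(name), 0, -1); }

    int eagerCount() {
        return eagerPendingOnlyUsers;
    }

    /** Style 2: derive the count from per-user state during a recompute. */
    int derivedCount() {
        int count = 0;
        for (UserApps u : users.values()) {
            if (u.onlyPending()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        PendingOnlyUsersSketch q = new PendingOnlyUsersSketch();
        for (String u : new String[] {"u1", "u2", "u3", "u4"}) {
            q.submit(u);
        }
        q.activate("u1");
        q.activate("u2");
        System.out.println("eager=" + q.eagerCount() + ", derived=" + q.derivedCount());
    }
}
```

Both styles agree whenever every transition is hooked. The eager counter is current immediately but must be updated correctly on every path (move, AM failure, etc.), while the derived count needs no extra hooks but is only as fresh as the last recompute.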

Also, based on our earlier discussions, we need to depend on
{{activeUsers.get()}} alone in certain contexts, and on the sum of
{{activeUsers.get()}} and {{activeUsersWithOnlyPendingApps.get()}} in other
places. But the POC patch always depends on the latter value. I didn't
understand this part.

On the other hand, with this new patch we can avoid the
{{AppAMAttemptsFailedSchedulerEvent}}-related changes completely, since
{{User.finishApplication()}} will be called anyway, even when the max AM
attempts have been reached.

Please share your thoughts.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-22 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520822#comment-16520822
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], we can fix the queue application starvation problem by
making most of the changes in the scheduler-specific users managers. For
{{CapacityScheduler}}, all the changes can be done in the {{UsersManager}}
class. For the other schedulers (FIFO, Fair, etc.), I think some changes are
needed in the scheduler infrastructure classes to support retrieving
information such as the number of pending and active apps per user, the amount
of the queue's AM limit resources, the amount of a user's used AM resources,
etc. But I think that most of the changes can be done in
{{ActiveUsersManager}} for the other schedulers as well.

I am attaching a POC patch that only modifies {{UsersManager}}. The
{{UsersManager}} already keeps track of all users in the queue. Each user
object keeps the number of active apps and the number of pending apps. Here is
the sequence of events, plus the proposed change:
 - When an application is submitted, the user object's pending apps count is
incremented.
 - If limits are not exceeded, {{LeafQueue}} activates the app.
 -- {{LeafQueue#activateApplications}} already checks whether or not activation
of an application will go over the queue's AM limit.
 -- If activating the application will not go over the queue's AM limit,
{{LeafQueue#activateApplications}} will increment the user object's active app
count and decrement the pending app count.
 -- However, if activating the application will go over the queue's AM limit,
the user's pending app count remains the same.
 - The change made in {{YARN-4606.POC.3.patch}} is that
{{UsersManager#activateApplication}} will check whether or not the user object
has any active apps. If not, it will not continue (thus not putting the user in
the {{activeUsers}} list).
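The sequence above can be sketched in Java roughly as follows. This is a simplified model, not the POC patch itself: the queue's AM limit is replaced by a hypothetical maximum number of active apps, and the class and method names only loosely mirror {{LeafQueue#activateApplications}} and {{UsersManager#activateApplication}}.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the POC flow described above.
public final class PocActivationSketch {

    static final class User {
        int pendingApps;
        int activeApps;
    }

    private final Map<String, User> users = new HashMap<>();
    private final Deque<String> pendingQueue = new ArrayDeque<>(); // owner of each pending app
    private final Set<String> activeUsers = new HashSet<>();
    private final int maxActiveApps; // stand-in for the queue's AM limit
    private int activeApps = 0;

    PocActivationSketch(int maxActiveApps) {
        this.maxActiveApps = maxActiveApps;
    }

    /** Submission: the user's pending app count goes up. */
    void submitApplication(String user) {
        users.computeIfAbsent(user, u -> new User()).pendingApps++;
        pendingQueue.add(user);
        activateApplications();
    }

    /** Like LeafQueue#activateApplications: activate until the limit is hit. */
    void activateApplications() {
        while (!pendingQueue.isEmpty() && activeApps < maxActiveApps) {
            User u = users.get(pendingQueue.poll());
            u.pendingApps--;
            u.activeApps++;
            activeApps++;
        }
    }

    /**
     * Like UsersManager#activateApplication, called when an app asks for
     * resources. The POC check: a user with no active apps is not added.
     */
    void activateApplication(String user) {
        User u = users.get(user);
        if (u == null || u.activeApps == 0) {
            return; // only pending apps: not counted as an active user
        }
        activeUsers.add(user);
    }

    int getNumActiveUsers() {
        return activeUsers.size();
    }

    public static void main(String[] args) {
        PocActivationSketch queue = new PocActivationSketch(2);
        for (String user : new String[] {"user1", "user2", "user3", "user4"}) {
            queue.submitApplication(user);
            queue.activateApplication(user); // each app has a pending AM request
        }
        System.out.println("active users = " + queue.getNumActiveUsers());
    }
}
```

With room for two active apps and one app per user, user3 and user4 stay pending and are not counted, so getNumActiveUsers() returns 2 rather than 4.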

I have not yet analyzed the problem you pointed out above regarding moving apps 
to different queues.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-21 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519562#comment-16519562
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], I am fine with using this JIRA to fix the 
{{CapacityScheduler}} and then using follow-on JIRAs to fix the other 
schedulers. However, I'm not comfortable putting {{CapacityScheduler}}-specific 
code in {{AppSchedulingInfo}}. I'm hoping that most of this code can be pushed 
down into the {{ActiveUsersManager}} (for {{FairScheduler}}) and 
{{UsersManager}} (for {{CapacityScheduler}}) code.

I am investigating this now and should know if this is possible by early next 
week.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-19 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517812#comment-16517812
 ] 

Manikandan R commented on YARN-4606:


OK, [~eepayne]. I was planning to do that once we had settled a few more open
issues and thoroughly checked the JUnit tests. Anyway, not a problem. Can you
share your thoughts on the move app flow?




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517635#comment-16517635
 ] 

genericqa commented on YARN-4606:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 15s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 51 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 31s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 59s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 40s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
|  |  Switch statement found in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(SchedulerEvent) where one case falls through to the next case  At CapacityScheduler.java:where one case falls through to the next case  At CapacityScheduler.java:[lines 1832-1838] |
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAppSchedulingInfo |
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-4606 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-19 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517557#comment-16517557
 ] 

Eric Payne commented on YARN-4606:
--

I put this Jira into PATCH AVAILABLE mode so that it would kick the pre-commit 
build.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-15 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513874#comment-16513874
 ] 

Manikandan R commented on YARN-4606:


Thanks [~eepayne] for your reviews. In addition to your review comments, I was 
trying to address the "move app" flow, but I got stuck on it and it took more 
time than expected. Sorry for the delay.

I got stuck on one case: an admin moving an app that is still waiting for its 
AM container from Queue A to Queue B. In this case, control reaches 
{{AppScheduling#move}} through {{CapacityScheduler#moveApplication}}. As a 
first step, we need to adjust the activeUsersWithPendingApps count for both 
queues. For example, after submitting the app to the destination queue inside 
{{CapacityScheduler#moveApplication}}, we would need to do something like:

{code}
// Handle activeUsersWithOnlyPendingApps count appropriately
if (app.isPending()) {
  this.getQueue(sourceQueueName).getAbstractUsersManager()
      .decrNumActiveUsersWithOnlyPendingApps(user);
  this.getQueue(destQueueName).getAbstractUsersManager()
      .incrNumActiveUsersWithOnlyPendingApps(user);
}
{code}

Then, inside {{AppScheduling#move}}, we will need to follow logic similar to 
the changes in {{AppScheduling#updatePendingResources}} to call 
{{UsersManager#activateApplications}}. {{AppScheduling#updatePendingResources}} 
is called repeatedly as part of the allocate flow, but there are no such 
periodic calls for a moved app. In the normal app flow, waitingForAMContainer 
eventually becomes false for a given app, {{UsersManager#activateApplications}} 
is called, and the user gets activated. We need to handle the same thing in the 
move app flow. I was thinking of waiting for some duration (possibly based on 
the average AM container allocation time?) so that the AM container is likely 
to have been allocated by then, but I am not sure. The attached patch contains 
this change as well. Please advise.

Now, coming back to the review comments:

1. Yes, it is scheduler specific. [~leftnoteasy] and [~sunilg], please share 
your views.
2. For the first cut, I was thinking of fixing this JIRA end to end for CS. 
Once the fix has been verified for CS, we can apply similar changes to FS, 
either in this JIRA or in a different one. If we address the FS changes in a 
different JIRA, is it OK to carry the risk you mentioned earlier? Please 
advise. I can either take help from folks who are familiar with the FS flow or 
hand it over to them, whichever works best.
3. Addressed.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-31 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497232#comment-16497232
 ] 

Eric Payne commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com] for the updated patch. Here are my comments so 
far:
- I am concerned that this implementation adds code that is specific to 
{{CapacityScheduler}} inside of {{AppSchedulingInfo}}. I feel that this sets a 
precedent that makes it hard to maintain a clean separation between abstract 
and specific scheduler code. Also, this only fixes the problem for the 
{{CapacityScheduler}}. The previous fix in patch 001 was relying on metrics and 
I realize that is risky, but it was a more generic fix. I would be interested 
to hear thoughts from [~sunilg] and [~leftnoteasy].
- Only the {{CapacityScheduler}} has been changed to handle the new 
{{AppAMAttemptsFailedSchedulerEvent}}. Should the other schedulers handle that 
as well? If they don't handle it, don't we risk them getting unhandled event 
exceptions?
- In all places where new {{LOG.debug(...)}} statements are added, please also 
enclose them with {{if (LOG.isDebugEnabled())}}. This is for the sake of 
performance, so that the strings are not built, passed to {{LOG.debug()}}, and 
then thrown away when debug logging is not enabled.
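A minimal sketch of the guard pattern requested in the last bullet. Hadoop code of this era calls {{LOG.isDebugEnabled()}} on its own logger; this example uses {{java.util.logging}} ({{isLoggable(Level.FINE)}}) only so it runs without extra dependencies. The class, method, and counter names are hypothetical and exist just to show that the message string is never built when debug output is off.

{code}
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; not part of the YARN code base.
class DebugGuardSketch {
  private static final Logger LOG =
      Logger.getLogger(DebugGuardSketch.class.getName());

  // Counts how often the (potentially expensive) message is actually built.
  static int messagesBuilt = 0;

  static String describe(String user, int pendingApps) {
    messagesBuilt++;
    return "User " + user + " has " + pendingApps + " pending apps";
  }

  static void logPendingApps(String user, int pendingApps) {
    // Without this guard, describe() would run and allocate a String even
    // when FINE (debug) output is disabled, only for it to be discarded.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(user, pendingApps));
    }
  }
}
{code}

With the logger at INFO, {{logPendingApps}} never calls {{describe}}. Once the level is lowered to FINE, the message is built exactly once per call.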


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-31 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496539#comment-16496539
 ] 

Manikandan R commented on YARN-4606:


Attached .003 patch for review.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-30 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496118#comment-16496118
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thanks for reaching out proactively.

Sorry for the delay; I was completely offline for a little more than a week for 
personal reasons. I resumed work on this yesterday and am facing some issues 
extracting the AM limit using 
SchedulerApplicationAttempt#getAppAttemptResourceUsage. I am in touch with 
[~sunilg] about this particular issue and will upload a patch as early as 
possible.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-30 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495656#comment-16495656
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], do you have a status on updating this patch? Do you 
need any help from the community?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-18 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480628#comment-16480628
 ] 

Manikandan R commented on YARN-4606:


{quote}If we are planning to move calling decrNumActiveUsersOfPendingApps from 
AppSchedulingInfo#updatePendingResources to SchedulerApplicationAttempt, then 
do we still need to am usage check against max am limit? I don't think 
so.{quote}

I was wrong here; we do need this check. I missed the related changes in the 
patch and will incorporate them and update the patch shortly. Please ignore 
.002.patch.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-17 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479016#comment-16479016
 ] 

Manikandan R commented on YARN-4606:


Attaching .002 patch for review.

{quote}Does this patch handles the case that one user has multiple pending 
apps? (Since it doesn't store user to apps information).{quote}
I have started handling this case.
{quote}Should we call this inside 
{{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}}? 
I think we should remove active user from pending apps once AM container get 
allocated{quote}
Yes, but inside {{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}}, 
and only after the containers have been updated with tokens, because 
{{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}} only handles the 
INCREASE, DECREASE, PROMOTE, and DEMOTE cases, not regular allocations.
{quote}Instead of using metrics, it might be better to use 
SchedulerApplicationAttempt#getAppAttemptResourceUsage instead.{quote}
Probably not required, as explained in my previous comment.
{quote}I am doing an in-depth review, but I would like to address a few things 
first regarding method names and comments. I feel that it is important to be 
accurate in these areas in order to eliminate confusion for those maintaining 
this code.{quote}
I have addressed all the related comments.

In addition to the above changes, we have handled the case of an app in the 
ACCEPTED state whose AM attempts have all failed for some reason. We want to 
decrement the count in this case too, and we do so by signalling the scheduler 
with a new event type.

Also, I am assuming that moving an app from one queue to another doesn't 
require changes, since a move happens only when the app is running?

Thanks [~sunilg] for providing suggestions in some of the above steps.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-09 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468872#comment-16468872
 ] 

Manikandan R commented on YARN-4606:


{quote}1) Does this patch handles the case that one user has multiple pending 
apps? (Since it doesn't store user to apps information).{quote}
The patch doesn't handle this case. Whenever a user submits an app, CS 
increments the activeUsersOfPendingApps count as part of accepting the 
application, regardless of whether the app was submitted by the same user or a 
different one.
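Counting per app, as described above, makes one user with two pending apps look like two users. A minimal sketch of per-user tracking follows. It assumes a map from user name to number of pending apps. The class and method names are hypothetical and are not the {{UsersManager}} API.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not the UsersManager API): track pending apps per user
// so that several pending apps from one user count as a single user.
class PendingUsersSketch {
  private final Map<String, Integer> pendingAppsPerUser = new HashMap<>();

  // Called when an app is accepted but is still waiting for its AM container.
  void appPending(String user) {
    pendingAppsPerUser.merge(user, 1, Integer::sum);
  }

  // Called once when the app's AM container is allocated or the app leaves.
  // Returning null from the remapping function removes the user's entry.
  void appNoLongerPending(String user) {
    pendingAppsPerUser.computeIfPresent(user,
        (u, n) -> (n > 1) ? Integer.valueOf(n - 1) : null);
  }

  // Users that have pending apps but no active app.
  int numUsersWithOnlyPendingApps(Set<String> activeUsers) {
    int count = 0;
    for (String user : pendingAppsPerUser.keySet()) {
      if (!activeUsers.contains(user)) {
        count++;
      }
    }
    return count;
  }
}
{code}

Keying by user means the count drops only when a user's last pending app becomes active or leaves. Passing in the active-user set keeps a user who also has a running app from being counted twice.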

{quote}Should we call this inside 
SchedulerApplicationAttempt#pullNewlyUpdatedContainers? 
I think we should remove active user from pending apps once AM container get 
allocated{quote}

While trying to understand this through real testing, I encountered a 
situation where {{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}} 
always returns an empty {{updatedContainers}}. I was wondering whether we 
could call {{abstractUsersManager.decrNumActiveUsersOfPendingApps()}} inside 
{{SchedulerApplicationAttempt#pullNewlyAllocatedContainers}} instead, 
something like:

{code}
if (!this.isWaitingForAMContainer()
    && !hasActiveUsersOfPendingAppsDecremented.get()) {
  this.queue.getAbstractUsersManager().decrNumActiveUsersOfPendingApps();
  hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}

If we are planning to move the {{decrNumActiveUsersOfPendingApps}} call from 
{{AppSchedulingInfo#updatePendingResources}} to 
{{SchedulerApplicationAttempt}}, do we still need the AM usage check against 
the max AM limit? I don't think so. We ran into the issue of accepting a second 
app when we were calling {{decrNumActiveUsersOfPendingApps}} inside 
{{abstractUsersManager.activateApplication()}}, invoked from 
{{AppSchedulingInfo#updatePendingResources}}. I don't think the check is 
required anymore, is it?

{quote}Does hasActiveUsersOfPendingAppsDecremented need to be atomic? What is 
the benefit?{quote}

Probably not required. I was being overly defensive :)

I will address the review points about names and comments once we settle on 
the flow.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466541#comment-16466541
 ] 

Eric Payne commented on YARN-4606:
--

{code:title=AppSchedulingInfo#updatePendingResources}
if (!hasActiveUsersOfPendingAppsDecremented.get()) {
  abstractUsersManager.decrNumActiveUsersOfPendingApps();
  hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}

Does {{hasActiveUsersOfPendingAppsDecremented}} need to be atomic? What is the 
benefit?
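One way to frame the question: a separate {{get()}} followed by {{set()}} is a check-then-act sequence. Two unsynchronized callers could both see {{false}} and both decrement, so the atomic type alone does not buy much. If every caller already holds the app or scheduler lock, a plain {{boolean}} is enough. If unsynchronized callers are possible, {{compareAndSet}} is the idiomatic one-shot guard. The minimal sketch below illustrates that. The class and counter names are hypothetical.

{code}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a per-app "decrement exactly once" guard.
class OneShotDecrementSketch {
  // Stand-in for the queue-level count of users with only pending apps.
  final AtomicInteger usersWithOnlyPendingApps;
  private final AtomicBoolean decremented = new AtomicBoolean(false);

  OneShotDecrementSketch(AtomicInteger sharedCounter) {
    this.usersWithOnlyPendingApps = sharedCounter;
  }

  // compareAndSet performs the check and the update as one atomic step, so
  // only the first caller decrements, even if several threads race here.
  void decrementOnce() {
    if (decremented.compareAndSet(false, true)) {
      usersWithOnlyPendingApps.decrementAndGet();
    }
  }
}
{code}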

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466136#comment-16466136
 ] 

Eric Payne commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com] for your consistent and continuing efforts to 
fix this problem.

I am doing an in-depth review, but I would like to address a few things first 
regarding method names and comments. I feel that it is important to be accurate 
in these areas in order to eliminate confusion for those maintaining this code.

- All occurrences of "atleast" should be "at least"
- Comment for {{AbstractUsersManager#getNumActiveUsers}}:
{code:title=AbstractUsersManager#getNumActiveUsers}
-   * Get number of active users i.e. users with applications which have pending
-   * resource requests.
+   * Get number of active users i.e. users with atleast 1 active applications
{code}
For this comment, I would say "Get number of active users i.e. users with at 
least 1 running application and applications requesting resources"
- I would prefer it if the name of {{ActiveUsersOfPendingApps}} was changed 
everywhere to {{ActiveUsersWithOnlyPendingApps}}. This is kind of a nit, but I 
do feel that the rename would be more descriptive.
- {{AbstractUsersManager#incrNumActiveUsersOfPendingApps}}, 
{{decrNumActiveUsersOfPendingApps}}, and {{getNumActiveUsersOfPendingApps}}
Change description to "number of users with only pending apps"
- {{UsersManager#activateApplication}} and {{deactivateApplication}}
Change "Active users which has atleast 1 pending apps:" to "Active users which 
have at least 1 pending app:"


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463324#comment-16463324
 ] 

Wangda Tan commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com], 
Some questions: 

1) Does this patch handle the case where one user has multiple pending apps? 
(Since it doesn't store user-to-apps information.)

2) 
{code}
abstractUsersManager.decrNumActiveUsersOfPendingApps(); 
{code}
Should we call this inside 
{{SchedulerApplicationAttempt#pullNewlyUpdatedContainers}}? 
I think we should remove the user from the active users with pending apps once 
the AM container gets allocated.

3)
{code} 
Resources.lessThan(rc, cr,
metrics.getUsedAMResources(), metrics.getMaxAMResources())
{code} 
Instead of using metrics, it might be better to use 
{{SchedulerApplicationAttempt#getAppAttemptResourceUsage}}.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-01 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459882#comment-16459882
 ] 

Manikandan R commented on YARN-4606:


Attaching the .001 patch, which contains a test case for CS and the changes 
required to fix the problem mentioned in my previous comment 
(activeUsersOfPendingApps going negative). I also tested the patch in my pseudo 
setup for CS. I am planning to run all the JUnit tests after the review.

{quote}AppSchedulingInfo is supports to cache status for pending resource, it 
might be better to avoid invoking SchedulerAppAttempt's method from 
AppSchedulingInfo.{quote}

[~leftnoteasy] As of now, {{AppSchedulingInfo}} calls only 
{{SchedulerAppAttempt#isWaitingForAMContainer}} to make a decision, and I 
don't see any cache involved there. Could you please explain?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453297#comment-16453297
 ] 

Wangda Tan commented on YARN-4606:
--

Thanks [~eepayne] / [~maniraj...@gmail.com],

Here's my understanding of the proposed approach: 

1) When we compute {{max-am-resource-per-user}}, we use #active-users + 
#pending-users.
2) When we compute {{max-user-limit}}, we use #active-users only. 

To me this is correct and seems to be the same as what I proposed previously:
{code}
We should only consider a user is "active" if any of its application is active. 
And CS will use the "#active-user-which-has-at-least-one-active-app" to compute 
user-limit.

Computation of max-am-resource-per-user needs to be updated as well. We should 
get a #users-which-has-pending-apps to compute max-am-resource-per-user.
{code}

I haven't checked the patch in much detail since [~maniraj...@gmail.com] 
is working on updating the tests, etc. Just one suggestion: AppSchedulingInfo 
is supposed to cache the status of pending resources, so it might be better to avoid 
invoking SchedulerAppAttempt's methods from AppSchedulingInfo.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-24 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449383#comment-16449383
 ] 

Manikandan R commented on YARN-4606:


Thanks [~eepayne] for the patch explaining your suggestions. I took the delta 
changes and started working on the JUnit tests and on testing in my pseudo cluster. I 
am encountering a situation wherein activeUsersOfPendingApps is negative at times 
(probably because UsersManager#activateApplication() is called a couple of times as 
part of the flow). Maybe we need to take the code that decrements 
activeUsersOfPendingApps out of UsersManager#activateApplication() so that clients 
can call it separately. I will need to think this through in detail and 
provide the patch.
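One way to avoid a negative count, sketched here under the assumption that activation may legitimately be reported more than once for the same app, is to derive the counts from per-user sets of application IDs instead of adjusting raw counters. {{ActivationTracker}} is hypothetical, omits deactivation and app removal, and is not the UsersManager API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: user counts are computed from per-user app sets, so a
 * repeated activateApplication() call for the same app cannot push a count
 * below zero. Deactivation and app removal are omitted for brevity.
 */
public class ActivationTracker {
  private final Map<String, Set<String>> pendingApps = new HashMap<>();
  private final Map<String, Set<String>> activeApps = new HashMap<>();

  public synchronized void submitApplication(String user, String appId) {
    pendingApps.computeIfAbsent(user, u -> new HashSet<>()).add(appId);
  }

  /** Idempotent: a second call for an already-active app changes nothing. */
  public synchronized void activateApplication(String user, String appId) {
    Set<String> pending = pendingApps.get(user);
    if (pending != null) {
      pending.remove(appId);
      if (pending.isEmpty()) {
        pendingApps.remove(user);
      }
    }
    activeApps.computeIfAbsent(user, u -> new HashSet<>()).add(appId);
  }

  /** Users with at least one active app. */
  public synchronized int activeUsers() {
    return activeApps.size();
  }

  /** Users that have pending apps but no active app. */
  public synchronized int usersWithOnlyPendingApps() {
    int count = 0;
    for (String user : pendingApps.keySet()) {
      if (!activeApps.containsKey(user)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    ActivationTracker tracker = new ActivationTracker();
    tracker.submitApplication("user1", "app1");
    tracker.submitApplication("user2", "app2");
    tracker.activateApplication("user1", "app1");
    tracker.activateApplication("user1", "app1"); // duplicate call from another code path
    System.out.println(tracker.activeUsers() + " " + tracker.usersWithOnlyPendingApps()); // 1 1
  }
}
```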

In the meantime, [~leftnoteasy] can also share his thoughts. Thanks.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449075#comment-16449075
 ] 

Wangda Tan commented on YARN-4606:
--

Thanks [~eepayne] / [~maniraj...@gmail.com] for working on the fix. I just 
unassigned myself; please feel free to assign it to yourself if you plan to work on it.

I'm going to check the patch / approach in the next two days.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443176#comment-16443176
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], I am attaching {{YARN-4606.POC.2.patch}} that 
demonstrates what I am suggesting. I have tested it in my pseudo cluster and 
it will only activate a new AM if the used queue AM resources are less than the 
queue max AM resources, but it also fixes the problem of starvation.

I have only tested this on the Capacity Scheduler, not the Fair Scheduler.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-10 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432474#comment-16432474
 ] 

Manikandan R commented on YARN-4606:


[~leftnoteasy] [~sunilg] Can you please share your views?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425736#comment-16425736
 ] 

Eric Payne commented on YARN-4606:
--

bq. could you briefly summary what is the current issue and solution being 
discussed?
[~leftnoteasy], the latest patch ({{YARN-4606.POC.patch}}) changed the behavior 
of the capacity scheduler so that it would never give a container to the second 
app for its AM as long as the first app consumed the entire queue and had 
pending requests, even when the used AM resources were lower than the AM max. I described it in 
more detail 
[above|https://issues.apache.org/jira/browse/YARN-4606?focusedCommentId=16391802=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391802].

[I suggested that one 
solution|https://issues.apache.org/jira/browse/YARN-4606?focusedCommentId=16396094=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16396094]
 would be to modify the code as follows as long as there is a way to do it in 
an abstract way:
{code:title=AppSchedulingInfo#updatePendingResources}
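// Pseudo-code: activate unless the app is still waiting for its AM container
// and the queue has no AM headroom left.
if (Not Waiting For AM Container
    || (Queue Used AM Resources < Queue Max AM Resources)) {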
  abstractUsersManager.activateApplication(user, applicationId);
}
{code}
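A self-contained version of that condition, with plain {{long}} values standing in for the queue's used and max AM resources (getting those values in a scheduler-agnostic way is the open question here). {{ActivationGate}} is a hypothetical name, not YARN code.

```java
/**
 * Hypothetical rendering of the pseudo-code above. Plain longs stand in for
 * the queue's used/max AM resources.
 */
public class ActivationGate {

  /** Activate unless the app still needs its AM and the queue has no AM headroom. */
  static boolean shouldActivate(boolean waitingForAmContainer,
      long queueUsedAmMb, long queueMaxAmMb) {
    return !waitingForAmContainer || queueUsedAmMb < queueMaxAmMb;
  }

  public static void main(String[] args) {
    System.out.println(shouldActivate(true, 1024, 2048));  // true: room for another AM
    System.out.println(shouldActivate(true, 2048, 2048));  // false: AM limit reached
    System.out.println(shouldActivate(false, 2048, 2048)); // true: AM already running
  }
}
```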

I suggested a way to do that, but it seems a little cumbersome.

So then I started wondering if there was a way to leverage the {{Schedulable 
Apps}} and {{Non-Schedulable Apps}} user info in the 
{{AppSchedulingInfo#updatePendingResources}} code. I looked more closely, 
however, and it is too early within 
{{AppSchedulingInfo#updatePendingResources}} to tell whether or not a new app 
is destined to be schedulable.

So, I think the best suggestion I have is the pseudo-code I posted above.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-04 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425249#comment-16425249
 ] 

Manikandan R commented on YARN-4606:


{quote}I suspect it may be the first case. Please check to make sure your queue 
configuration is set to allow multiple running users in the queue.
{quote}
Yes, you are right. I am able to run App2 as User2 after making the config 
changes. On the other hand, with the patch, a container is not allocated 
to App2's AM while App1 is running.
{quote}The concern is that even though they may be in both FSQueueMetrics, and 
CSQueueMetrics, they are not accessible at the abstract QueueMetrics layer 
because they have different accessors. It should be possible to add a new, 
abstract accessor in QueueMetrics that is implemented in FS/CS QueueMetrics.
{quote}
Yes, something similar can be done at the abstract class level to achieve this if 
required.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423456#comment-16423456
 ] 

Wangda Tan commented on YARN-4606:
--

[~eepayne], thanks for pinging me about this. I've lost some context on the above 
question; could you briefly summarize the current issue and the solution 
being discussed?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423059#comment-16423059
 ] 

Eric Payne commented on YARN-4606:
--

{code:title=AppSchedulingInfo#updatePendingResources}
if(! this.schedulerApplicationAttempt.isWaitingForAMContainer()) {
  abstractUsersManager.activateApplication(user, applicationId);
}
{code}
Sorry for backtracking, but after thinking about this some more, I think the 
question that needs to be asked by {{AppSchedulingInfo#updatePendingResources}} 
is not "Is this application attempt asking for an AM?", but rather I think the 
question is "Does this user have schedulable apps (including this attempt)?"
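The difference between the two questions can be sketched as two hypothetical predicates. Here "schedulable" is assumed to mean the app fits within the queue's AM and max-applications limits; neither method exists in YARN.

```java
/**
 * Hypothetical contrast of the two activation questions. Not YARN code.
 */
public class ActivationQuestion {

  /** Per-attempt question: activate only once this attempt is no longer waiting for its AM. */
  static boolean byAttempt(boolean attemptWaitingForAm) {
    return !attemptWaitingForAm;
  }

  /** Per-user question: activate if the user has any schedulable app, including this attempt. */
  static boolean byUser(int otherSchedulableAppsOfUser, boolean thisAttemptSchedulable) {
    return otherSchedulableAppsOfUser > 0 || thisAttemptSchedulable;
  }

  public static void main(String[] args) {
    // App2 waits for its AM, but the queue still has AM headroom, so it is schedulable.
    System.out.println(byAttempt(true));  // false: App2's user is not counted yet
    System.out.println(byUser(0, true));  // true
    // A user whose only app is blocked by the AM limit stays inactive either way.
    System.out.println(byUser(0, false)); // false
  }
}
```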

I'd like [~sunilg]'s and [~leftnoteasy]'s input on this design suggestion.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-04-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422631#comment-16422631
 ] 

Eric Payne commented on YARN-4606:
--

{quote}AM container is being allocated to App2 only after App1 completion when 
cluster is running full.
{quote}
[~maniraj...@gmail.com], in the current implementation (that is, without the 
patch), there are a couple of things that can affect whether or not App2 will 
be given the freed container:
 - In the running queue, if {{Configured Minimum User Limit Percent}} is set to 
100%, only one user can run in the queue at a time. If this is so, then the 
Capacity Scheduler will only assign new containers to App1 (owned by User1). 
However, if {{Configured Minimum User Limit Percent}} is 50% or less, the 
Capacity Scheduler will assign new containers to App2 (owned by User2) until 
they both have 50% of the queue or one stops asking for new resources.
 - In the running queue, if {{Used Application Master Resources}} equals {{Max 
Application Master Resources}}, the Capacity Scheduler will not assign an AM to 
App2.
 - The same thing happens if {{Num Schedulable Applications}} is equal to {{Max 
Applications}}, but that's probably not the case here.
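The three conditions above can be read as a checklist. This hypothetical sketch encodes them; the inputs would come from the queue's page in the scheduler UI, and the values in {{main}} are assumptions (minimum user limit percent left at 100, a 1024 MB AM for App1, a 2048 MB AM limit, 10000 max applications).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical checklist of conditions that can keep App2 from getting its AM.
 * Not YARN code; inputs are read off the queue's scheduler page by hand.
 */
public class App2AmDiagnosis {

  static List<String> blockers(int minimumUserLimitPercent, long usedAmMb, long maxAmMb,
      int schedulableApps, int maxApplications) {
    List<String> reasons = new ArrayList<>();
    if (minimumUserLimitPercent >= 100) {
      reasons.add("minimum user limit percent is 100%: only one user runs at a time");
    }
    if (usedAmMb >= maxAmMb) {
      reasons.add("used AM resources have reached max AM resources");
    }
    if (schedulableApps >= maxApplications) {
      reasons.add("schedulable applications have reached max applications");
    }
    return reasons;
  }

  public static void main(String[] args) {
    for (String reason : blockers(100, 1024, 2048, 1, 10000)) {
      System.out.println(reason);
    }
  }
}
```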

I suspect it may be the first case. Please check to make sure your queue 
configuration is set to allow multiple running users in the queue.
{quote}{quote}However, I'm not sure of the best way to get the values for a 
queue's Used AM Resources and Max AM Resources from this context. Those may be 
capacity scheduler-specific values.
{quote}
Yes. But I do see some equivalents available in FSQueueMetrics.
{quote}
The concern is that even though they may be in both FSQueueMetrics and 
CSQueueMetrics, they are not accessible at the abstract {{QueueMetrics}} layer 
because they have different accessors. It should be possible to add a new, 
abstract accessor in {{QueueMetrics}} that is implemented in FS/CS QueueMetrics.
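A minimal sketch of the abstract-accessor idea: the base class declares the accessors and a shared check, and each scheduler's metrics subclass maps them onto its own fields. All class, field, and method names here are hypothetical and do not match the real {{QueueMetrics}} hierarchy.

```java
/**
 * Hypothetical sketch of an abstract accessor at the metrics base-class level.
 * Names are illustrative, not the real QueueMetrics / CSQueueMetrics /
 * FSQueueMetrics API.
 */
public class MetricsAccessorSketch {

  abstract static class BaseQueueMetrics {
    /** Each scheduler maps these onto its own metrics. */
    abstract long usedAmResourceMb();

    abstract long maxAmResourceMb();

    /** Scheduler-agnostic check that shared code could call. */
    final boolean hasAmHeadroom() {
      return usedAmResourceMb() < maxAmResourceMb();
    }
  }

  /** CS-style metrics: AM usage and AM limit tracked directly. */
  static class CapacityQueueMetrics extends BaseQueueMetrics {
    private final long amUsedMb;
    private final long amLimitMb;

    CapacityQueueMetrics(long amUsedMb, long amLimitMb) {
      this.amUsedMb = amUsedMb;
      this.amLimitMb = amLimitMb;
    }

    @Override
    long usedAmResourceMb() {
      return amUsedMb;
    }

    @Override
    long maxAmResourceMb() {
      return amLimitMb;
    }
  }

  /** FS-style metrics: the same values stored under different names. */
  static class FairQueueMetrics extends BaseQueueMetrics {
    private final long amResourceUsageMb;
    private final long maxAmShareMb;

    FairQueueMetrics(long amResourceUsageMb, long maxAmShareMb) {
      this.amResourceUsageMb = amResourceUsageMb;
      this.maxAmShareMb = maxAmShareMb;
    }

    @Override
    long usedAmResourceMb() {
      return amResourceUsageMb;
    }

    @Override
    long maxAmResourceMb() {
      return maxAmShareMb;
    }
  }

  public static void main(String[] args) {
    BaseQueueMetrics[] queues = {
        new CapacityQueueMetrics(1024, 2048), new FairQueueMetrics(2048, 2048)};
    for (BaseQueueMetrics q : queues) {
      System.out.println(q.getClass().getSimpleName() + ": " + q.hasAmHeadroom());
    }
  }
}
```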

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-03-29 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418782#comment-16418782
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] Thanks for your detailed explanation. Sorry for the delay.
{quote}In this scenario, User2 wants to start App2 but User1 is consuming all 
resources in the queue with App1. When App1 releases a resource, however, it is 
not given to App2. The resource is given back to App1, which brings its Pending 
value down to 19. This is incorrect behavior since Queue1 has room for 2 
AMs.{quote}
I was trying to understand this behaviour in the current code (without my patch) 
and came to know that the AM container is allocated to App2 only after App1 
completes when the cluster is running full.

In my single-node pseudo-distributed setup, the total cluster resources are 8192 MB and 
8 vcores, with only one queue (default) at 100% capacity. Max AM resources are 2048 MB 
and 2 vcores, since the max AM resource percent is 0.2. I submitted an app (say App1) 
through DistributedShell with num_containers set to 20. While App1 was running with 
around 15 pending containers, I submitted a second app (say App2) with num_containers 
set to 10. I can see that the AM container for App2 is allocated only after App1 
completes, which is not in line with your earlier comments. Am I missing anything here?
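The 2048 MB / 2 vcore AM limit in this setup is consistent with 0.2 of the cluster (1638.4 MB, 1.6 vcores) rounded up to a multiple of the minimum allocation, assuming the defaults of 1024 MB and 1 vcore. The hypothetical {{roundUp}} helper below shows only that arithmetic; it is not the scheduler's own code.

```java
/**
 * Hypothetical arithmetic check: max-am-resource-percent applied to the
 * cluster, rounded up to an assumed minimum-allocation step.
 */
public class AmLimitArithmetic {

  /** Rounds value up to the next multiple of step. */
  static long roundUp(double value, long step) {
    return (long) Math.ceil(value / step) * step;
  }

  public static void main(String[] args) {
    double maxAmPercent = 0.2;
    long amMemoryMb = roundUp(8192 * maxAmPercent, 1024); // 1638.4 -> 2048
    long amVcores = roundUp(8 * maxAmPercent, 1);          // 1.6 -> 2
    System.out.println(amMemoryMb + " MB, " + amVcores + " vcores"); // 2048 MB, 2 vcores
  }
}
```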
{quote}However, I'm not sure of the best way to get the values for a queue's 
Used AM Resources and Max AM Resources from this context. Those may be capacity 
scheduler-specific values.
{quote}
Yes. But I do see some equivalents available in {{FSQueueMetrics}}.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-03-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396094#comment-16396094
 ] 

Eric Payne commented on YARN-4606:
--

bq. resources are not assigned to the second app when they should be
I'm unsure about the appropriate way to fix this. My original thinking was that 
we could do something similar to the following:
{code:title=AppSchedulingInfo#updatePendingResources}
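// Pseudo-code: activate unless the app is still waiting for its AM container
// and the queue has no AM headroom left.
if (Not Waiting For AM Container
    || (Queue Used AM Resources < Queue Max AM Resources)) {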
  abstractUsersManager.activateApplication(user, applicationId);
}
{code}

However, I'm not sure of the best way to get the values for a queue's {{Used 
AM Resources}} and {{Max AM Resources}} from this context. Those may be 
capacity scheduler-specific values.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-03-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391802#comment-16391802
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], thank you for the patch. The overall approach looks 
fine, but I have a couple of concerns.
 - The behavior of assigning resources to schedulable applications has changed. 
With this patch, in the following use case, resources are not assigned to the 
second app when they should be. I have not analyzed the behavior closely enough 
to debug the issue, but I wish to document the behavior:
 -- Queue1 total resources: 40
 -- Queue1 Max Application Master Resources: 2
 -- Container sizes are all 1 resource
|*User Name*|*Application ID*|*Used AM resources*|*Total Used Resources*|*Pending Resources*|
|User1|App1|1|39|20|
|User2|App2|0|0|1 (waiting for AM)|

 -- In this scenario, User2 wants to start App2 but User1 is consuming all 
resources in the queue with App1. When App1 releases a resource, however, it is 
not given to App2. The resource is given back to App1, which brings its Pending 
value down to 19. This is incorrect behavior since Queue1 has room for 2 AMs.

 - I think the {{TestRMHA}} unit test needs to be modified to adjust to this 
patch:
{code:java}
TestRMHA
TestRMHA.testFailoverAndTransitions:219->verifyClusterMetrics:754 Incorrect 
value for metric activeApplications expected:<1> but was:<0>
TestRMHA.testFailoverClearsRMContext:550->verifyClusterMetrics:754 Incorrect 
value for metric activeApplications expected:<1> but was:<0>
{code}

 - A couple of minor things:
 -- IIUC, the value stored in {{activeUsersOfPendingApps}} represents the 
number of users that do not have any active applications. Is that correct? If 
so, I think it would be clearer if it were called 
{{usersWithOnlyPendingApps}}.
 -- In {{AbstractUsersManager}} and {{ActiveUsersManager}}, *atleast* should be 
*at least*.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch, YARN-4606.POC.patch



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-03-03 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384645#comment-16384645
 ] 

Manikandan R commented on YARN-4606:


[~eepayne] [~sunilg] Thanks for your inputs. Sorry for the delay.

I have attached a POC patch to confirm it is in line with our discussions. Please 
review the approach. After the feedback, I will make it a robust patch by adding 
tests, etc., and will also cover FS and FIFO.

Approach:

1. Introduce activeUsersOfPendingApps in the users manager and increment this 
count as and when apps are accepted.
2. After the application is activated, increment activeUsers and decrement 
activeUsersOfPendingApps in {{UsersManager#activateApplication}}, called from 
{{AppSchedulingInfo#updatePendingResources}}, only when the app is no longer 
waiting for its AM container.
3. To calculate the max AM limit per user, use activeUsers + 
activeUsersOfPendingApps.
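A minimal sketch of the three steps above, assuming the users manager can keep per-user counts of pending and activated apps. The class and method names are hypothetical and do not match Hadoop's UsersManager. Tracking per-user counts, rather than blindly incrementing and decrementing two integers, keeps a user who has both an activated app and a pending app from being counted twice. Removing an app that is killed while still pending is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the POC approach (not Hadoop's UsersManager).
 * activeUsers: users with at least one activated app.
 * activeUsersOfPendingApps: users whose apps are all still pending.
 */
public class PendingAwareUsersManager {

    private static final class UserApps {
        int pending; // accepted, still waiting for the AM container
        int active;  // activated
    }

    private final Map<String, UserApps> users = new HashMap<>();
    private int activeUsers;
    private int activeUsersOfPendingApps;

    /** Step 1: an app is accepted but not yet activated. */
    public void applicationAccepted(String user) {
        UserApps apps = users.computeIfAbsent(user, u -> new UserApps());
        uncount(apps);
        apps.pending++;
        count(apps);
    }

    /** Step 2: the app is no longer waiting for its AM container. */
    public void applicationActivated(String user) {
        UserApps apps = users.get(user);
        if (apps == null || apps.pending == 0) {
            throw new IllegalStateException("no pending app for " + user);
        }
        uncount(apps);
        apps.pending--;
        apps.active++;
        count(apps);
    }

    /** An activated app finishes; the user is forgotten once it has no apps. */
    public void activeApplicationFinished(String user) {
        UserApps apps = users.get(user);
        if (apps == null || apps.active == 0) {
            throw new IllegalStateException("no active app for " + user);
        }
        uncount(apps);
        apps.active--;
        count(apps);
        if (apps.active == 0 && apps.pending == 0) {
            users.remove(user);
        }
    }

    // Remove this user's current contribution to the two counters.
    private void uncount(UserApps apps) {
        if (apps.active > 0) {
            activeUsers--;
        } else if (apps.pending > 0) {
            activeUsersOfPendingApps--;
        }
    }

    // Add this user's contribution, based on the updated app counts.
    private void count(UserApps apps) {
        if (apps.active > 0) {
            activeUsers++;
        } else if (apps.pending > 0) {
            activeUsersOfPendingApps++;
        }
    }

    public int getActiveUsers() { return activeUsers; }

    public int getActiveUsersOfPendingApps() { return activeUsersOfPendingApps; }

    /** Step 3: the per-user AM limit is divided among both kinds of users. */
    public int usersForAmLimit() { return activeUsers + activeUsersOfPendingApps; }

    public static void main(String[] args) {
        PendingAwareUsersManager m = new PendingAwareUsersManager();
        for (String u : new String[] {"user1", "user2", "user3", "user4"}) {
            m.applicationAccepted(u);
        }
        m.applicationActivated("user1");
        m.applicationActivated("user2");
        System.out.println("activeUsers=" + m.getActiveUsers()
            + ", activeUsersOfPendingApps=" + m.getActiveUsersOfPendingApps());
    }
}
```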




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347329#comment-16347329
 ] 

Eric Payne commented on YARN-4606:
--

My understanding is that the user limit would use {{activeUsers}}, while for 
things like the max AM limit per user we'd use {{activeUsers}} + 
{{activeUsersOfPendingApps}}.
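A small illustration of which count feeds which limit, assuming both limits are an even division of a budget (a simplification; the real computations apply more settings). The names and numbers are hypothetical, and the scenario is the one from the description: user1/user2 have active apps, user3/user4 only pending ones.

```java
/** Illustrative only: shows which user count feeds which limit. */
public class LimitSplit {

    /** Share used when assigning containers: only users with an active app compete. */
    static long userLimitMb(long queueMb, int activeUsers) {
        return queueMb / Math.max(1, activeUsers);
    }

    /** Share used when activating apps: users whose apps are all pending also count. */
    static long userAmLimitMb(long amLimitMb, int activeUsers, int activeUsersOfPendingApps) {
        return amLimitMb / Math.max(1, activeUsers + activeUsersOfPendingApps);
    }

    public static void main(String[] args) {
        int activeUsers = 2;      // user1, user2
        int pendingOnlyUsers = 2; // user3, user4
        long queueMb = 102400;    // hypothetical queue capacity
        long amLimitMb = 10240;   // hypothetical AM budget for the queue

        System.out.println("user limit: " + userLimitMb(queueMb, activeUsers) + " MB");
        System.out.println("per-user AM limit: "
            + userAmLimitMb(amLimitMb, activeUsers, pendingOnlyUsers) + " MB");
    }
}
```

With this split, user3 and user4 no longer shrink the container share of user1 and user2; they only affect how the AM budget is divided.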




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347236#comment-16347236
 ] 

Manikandan R commented on YARN-4606:


Thanks [~eepayne] [~sunilg] for your comments.

{quote}And when we need to know all active users in cluster (for user-limit 
computation etc) we might need to use 
activeUsers+activeUsersOfPendingApps.{quote}

Shouldn't we use activeUsersOfPendingApps only for the user-limit calculation? 
Otherwise, based on the example given in the description, won't we end up in the 
same situation (2 + 2 == 4)? Please correct my understanding.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347220#comment-16347220
 ] 

Sunil G commented on YARN-4606:
---

Thanks [~eepayne]. You are correct. 
 - {{activeUsers}}: users that have at least one active app
 - {{activeUsersOfPendingApps}}: users that have only pending apps

This is the correct definition.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347209#comment-16347209
 ] 

Eric Payne commented on YARN-4606:
--

Thanks everyone for the thoughtful analysis.

I am still analyzing in more depth, but I have a couple of thoughts:
{quote}this is a (known) potential issue of fair ordering policy.
{quote}
This can happen with the FIFO ordering policy as well.
{quote}have {{activeUsersOfPendingApps}} along with {{activeUsers}}. Hence in 
case of scheduling we can depend only on {{activeUsers}}
{quote}
We need to be careful with these counts because a user can have both active and 
pending apps. I think the definitions should be:
 - {{activeUsers}}: users that have at least one active app
 - {{activeUsersOfPendingApps}}: users that have only pending apps.
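These definitions can be expressed as a classification over a snapshot of each user's apps. A minimal sketch with hypothetical names, assuming each app is either PENDING (still waiting for its AM container) or ACTIVE; `Map.of`/`List.of` need Java 9 or later.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: classify users from a snapshot of their apps' states. */
public class UserClassification {

    enum AppState { PENDING, ACTIVE }

    /** Returns {activeUsers, activeUsersOfPendingApps}. */
    static int[] classify(Map<String, List<AppState>> appsByUser) {
        int activeUsers = 0;
        int pendingOnlyUsers = 0;
        for (List<AppState> apps : appsByUser.values()) {
            if (apps.contains(AppState.ACTIVE)) {
                activeUsers++;       // at least one active app
            } else if (!apps.isEmpty()) {
                pendingOnlyUsers++;  // only pending apps
            }
        }
        return new int[] {activeUsers, pendingOnlyUsers};
    }

    public static void main(String[] args) {
        Map<String, List<AppState>> snapshot = Map.of(
            "user1", List.of(AppState.ACTIVE, AppState.PENDING), // mixed: counted once, as active
            "user2", List.of(AppState.ACTIVE),
            "user3", List.of(AppState.PENDING),
            "user4", List.of(AppState.PENDING, AppState.PENDING));
        int[] counts = classify(snapshot);
        System.out.println("activeUsers=" + counts[0]
            + ", activeUsersOfPendingApps=" + counts[1]);
    }
}
```

A user with both kinds of apps lands only in the first bucket, so the two counts never overlap.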




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346808#comment-16346808
 ] 

Sunil G commented on YARN-4606:
---

[~maniraj...@gmail.com]

You can use {{SchedulerApplicationAttempt.isWaitingForAMContainer()}} to know 
whether an app is still waiting for its AM container. Hence the new method 
{{activeApplication()}} in {{AppSchedulingInfo}} is not needed.

Currently, {{ActiveUsersManager}} has all the active users in the cluster 
(including users whose apps are pending). I think we can have 
{{activeUsersOfPendingApps}} along with {{activeUsers}}. Then, for scheduling, we 
can depend only on activeUsers, and when we need to know all active users in the 
cluster (for user-limit computation, etc.) we might need to use 
activeUsers + activeUsersOfPendingApps.

cc [~leftnoteasy] [~jlowe] [~eepayne]: could you please check this and share your 
thoughts?




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-30 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345536#comment-16345536
 ] 

Manikandan R commented on YARN-4606:


[~leftnoteasy] Based on my understanding, {{ActiveUsersManager#getNumActiveUsers}} 
is currently used only in {{LeafQueue#getUserAMResourceLimitPerPartition}} to 
compute the userAMLimit. Given this, would ensuring that the active-users count is 
incremented only after app activation be sufficient to fix this JIRA? Something 
like:

1. Introduce a new method {{activeApplication()}} in {{AppSchedulingInfo}} that 
sets a new private boolean field "isActivated" to true.
2. Call #1 from {{LeafQueue#activateApplications}} after 
{{application.updateAMContainerDiagnostics(AMState.ACTIVATED, null);}}, similar 
to the changes in the earlier POC patch.
3. Ensure {{abstractUsersManager.activateApplication(user, applicationId);}} in 
{{AppSchedulingInfo#updatePendingResources}} executes only when isActivated is 
true.
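A minimal sketch of the proposed gate, using hypothetical stand-in classes rather than Hadoop's AppSchedulingInfo, LeafQueue, or users manager (here `activateApplication` takes only a user name). One addition beyond the three steps is marked in the comments: activating the app also registers the user when requests are already outstanding, since in this sketch a request recorded before activation (such as the AM container request) would otherwise not be counted until the next update.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the "isActivated" gate. The nested classes are
 * stand-ins and do not match Hadoop's AppSchedulingInfo or users manager.
 */
public class ActivationGateSketch {

    /** Stand-in users manager: the set of users counted as active. */
    static final class ActiveUsers {
        private final Set<String> users = new HashSet<>();

        void activateApplication(String user) { users.add(user); }

        int getNumActiveUsers() { return users.size(); }
    }

    /** Stand-in for the per-app scheduling info. */
    static final class SchedulingInfo {
        private final String user;
        private final ActiveUsers activeUsers;
        private boolean isActivated;        // step 1: new private flag
        private boolean hasPendingRequests;

        SchedulingInfo(String user, ActiveUsers activeUsers) {
            this.user = user;
            this.activeUsers = activeUsers;
        }

        /** Steps 1 and 2: called by the queue once the app has been activated. */
        void activeApplication() {
            isActivated = true;
            // Addition beyond the proposal: count requests recorded before activation.
            if (hasPendingRequests) {
                activeUsers.activateApplication(user);
            }
        }

        /** Step 3: only an activated app marks its user as active. */
        void updatePendingResources(boolean pendingRequestsOutstanding) {
            hasPendingRequests = pendingRequestsOutstanding;
            if (isActivated && pendingRequestsOutstanding) {
                activeUsers.activateApplication(user);
            }
        }
    }

    public static void main(String[] args) {
        ActiveUsers activeUsers = new ActiveUsers();
        SchedulingInfo app3 = new SchedulingInfo("user3", activeUsers);

        app3.updatePendingResources(true); // AM request while the app is still pending
        System.out.println("before activation: " + activeUsers.getNumActiveUsers());

        app3.activeApplication();          // the queue activates the app
        System.out.println("after activation: " + activeUsers.getNumActiveUsers());
    }
}
```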

Please share your views.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2017-12-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286883#comment-16286883
 ] 

Wangda Tan commented on YARN-4606:
--

[~jlowe], as we discussed offline, this is a (known) potential issue with the fair 
ordering policy. Please let me know if you have any thoughts on this.




[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2016-01-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108267#comment-15108267
 ] 

Naganarasimha G R commented on YARN-4606:
-

Hi [~wangda],
Thanks for sharing the proposal. Just to clarify whether my understanding is 
correct:
* While calculating the user limit for assigning containers to an app, we take 
#activeusers = #users who have apps in the activated stage and have outstanding 
requests.
* And while calculating the userAMlimit for activating an app, we take 
#activeusers = #unique users who have apps in the pending ordering policy?



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2016-01-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108290#comment-15108290
 ] 

Wangda Tan commented on YARN-4606:
--

Hi [~Naganarasimha],
bq. While calculating the user limit for assigning containers to an app, we take 
#activeusers = #users who have apps in the activated stage and have outstanding 
requests.
Yes

bq. And while calculating the userAMlimit for activating an app, we take 
#activeusers = #unique users who have apps in the pending ordering policy?
Yes



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2016-01-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108196#comment-15108196
 ] 

Wangda Tan commented on YARN-4606:
--

Proposed solution: 
We should only consider a user "active" if at least one of its applications is 
active, and CS will use "#active-user-which-has-at-least-one-active-app" to 
compute the user-limit.

The computation of max-am-resource-per-user needs to be updated as well. We should 
use #users-which-has-pending-apps to compute max-am-resource-per-user.

This looks like a major behavior change to the existing scheduler logic. Thoughts? 
[~vinodkv]/[~jlowe]/[~jianhe].

I'm not sure whether FairScheduler needs similar changes as well: if a user in 
FSLeafQueue doesn't have any runnable apps, should we increase #active-users of 
QueueMetrics?



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2016-01-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108183#comment-15108183
 ] 

Wangda Tan commented on YARN-4606:
--

Updated the description of the JIRA. It was originally found by [~karams] while 
doing fairness ordering policy tests; pasting the original test case here just for 
reference:
{code}
Encountered while studying fairness behaviour with UserLimitPercent and 
UserLimitFactor during the following test:
Ran GridMix with Queue settings: Capacity=10, MaxCap=80, UserLimit=25, 
UserLimitFactor=32, FairOrderingPolicy only. Encountered an application starvation 
situation where 33 applications (190 apps completed out of 761 apps; the queue can 
hold 345 containers) were running with a total of 45 containers: the 12 extra 
containers went to only one app (the app had around 18000 tasks), and all other 
apps had only their AM running, with no other containers given to them. After 
that app finished, there were 32 AMs that kept running without any containers 
being launched for tasks.
GridMix was run with the following settings:
gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, 
gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, 
gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, 
mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, 
gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, 
gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver
With a users file containing 4 users for RoundRobinUserResolver
{code}
