[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-02-01 Thread VADAGA ANANYO RAO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276904#comment-17276904
 ] 

VADAGA ANANYO RAO commented on YARN-10559:
--

Hi [~epayne], thank you for your earlier response. We wanted to get your 
suggestions on better handling multiple-user cases. Currently, we are 
considering the following formula to calculate the fair share per app:
{code:java}
fairSharePerUser = total Queue Capacity / # of users
foreach user:
  if (tq.leafqueue.getUserLimit == 100)
    fairSharePerApp = fairSharePerUser / # of apps of that user;
  else
    fairSharePerApp = UL / # of apps of that user;
{code}
So, according to the above formula, say we have a scenario with 2 users and 
UserLimit = 100%:

 

User1 (UL = 100%, fairSharePerUser = 50%)
 * App1 (fairSharePerApp = 50%)

User2 (UL = 100%, fairSharePerUser = 50%)
 * App2 (fairSharePerApp = 25%)
 * App3 (fairSharePerApp = 25%)
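
For concreteness, here is a minimal, self-contained sketch of the formula 
(hypothetical names only, not the actual Capacity Scheduler classes) that 
reproduces the numbers above:
{code:java}
// Hypothetical sketch of the proposed formula; these are not the actual
// Capacity Scheduler classes or methods.
import java.util.Map;

public class FairShareSketch {
  static double fairSharePerApp(double queueCapacity, int numUsers,
      double userLimit, int appsOfUser) {
    double fairSharePerUser = queueCapacity / numUsers;
    // UL == 100 means no effective per-user cap, so split the user's fair
    // share; otherwise split the configured user limit itself.
    double base = (userLimit == 100.0) ? fairSharePerUser : userLimit;
    return base / appsOfUser;
  }

  public static void main(String[] args) {
    // User1 has 1 app, User2 has 2 apps; UL = 100%, queue capacity = 100%.
    for (Map.Entry<String, Integer> user :
        Map.of("User1", 1, "User2", 2).entrySet()) {
      System.out.printf("%s: fairSharePerApp = %.0f%%%n",
          user.getKey(), fairSharePerApp(100.0, 2, 100.0, user.getValue()));
    }
  }
}
{code}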

Do you see any shortcomings in this formula, or can you suggest better ways of 
handling the multiple-user issue? I would really appreciate it :)
cc: [~sunilg] [~wangda]

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch, 
> YARN-10559.0005.patch, YARN-10559.0006.patch, YARN-10559.0007.patch, 
> YARN-10559.0008.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple 
> long-running, large applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same 
> queue.






[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276902#comment-17276902
 ] 

zhuqi commented on YARN-10532:
--

[~wangda] [~gandras]

Attached a new patch for review.

Thanks.

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch, 
> YARN-10532.009.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Updated] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-02-01 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-10532:
-
Attachment: YARN-10532.009.patch

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch, 
> YARN-10532.009.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276811#comment-17276811
 ] 

zhuqi edited comment on YARN-10532 at 2/2/21, 5:40 AM:
---

Thanks a lot, [~gandras], for the patient review; it makes sense to me.
 # I used to think the parent should also check the last submit time passed up 
by the leaf queue, but it seems this is not needed.
 # However, I think we should check all applications in the leaf queue, not 
only "getNumActiveApplications() == 0".
 # I agree that the garbage collector in AutoDeletionForExpiredQueuePolicy 
makes this clearer, but I think the logic should not be inside the for loop:

{code:java}
Set<String> newMarks = new HashSet<>();
for (Map.Entry<String, CSQueue> queueEntry :
    scheduler.getCapacitySchedulerQueueManager().getQueues().entrySet()) {
  String queuePath = queueEntry.getKey();
  CSQueue queue = queueEntry.getValue();
  if (queue instanceof AbstractCSQueue &&
      ((AbstractCSQueue) queue).isEligibleForAutoDeletion()) {
    if (markedForDeletion.contains(queuePath)) {
      sentForDeletion.add(queuePath);
      markedForDeletion.remove(queuePath);
    } else {
      newMarks.add(queuePath);
    }
  }
}
markedForDeletion.clear();
markedForDeletion.addAll(newMarks);
{code}

I will upload a new patch later today. Your suggestions are very valid; I am 
glad to work with you. :)


was (Author: zhuqi):
Thanks a lot, [~gandras], for the patient review; it makes sense to me.
 # I used to think the parent should also check the last submit time passed up 
by the leaf queue, but it seems this is not needed.
 # However, I think we should check all applications in the leaf queue, not 
only "getNumActiveApplications() == 0".
 # I agree that the garbage collector in AutoDeletionForExpiredQueuePolicy 
makes this clearer.

I will upload a new patch later today. Your suggestions are very valid; I am 
glad to work with you. :)

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276811#comment-17276811
 ] 

zhuqi commented on YARN-10532:
--

Thanks a lot, [~gandras], for the patient review; it makes sense to me.
 # I used to think the parent should also check the last submit time passed up 
by the leaf queue, but it seems this is not needed.
 # However, I think we should check all applications in the leaf queue, not 
only "getNumActiveApplications() == 0".
 # I agree that the garbage collector in AutoDeletionForExpiredQueuePolicy 
makes this clearer.

I will upload a new patch later today. Your suggestions are very valid; I am 
glad to work with you. :)

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Updated] (YARN-10607) User environment is unable to prepend PATH when mapreduce.admin.user.env also sets PATH

2021-02-01 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10607:
---
Description: 
When using the tarball approach to ship relevant Hadoop jars to containers, it 
is helpful to set {{mapreduce.admin.user.env}} to something like 
{{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
binaries are on the PATH. This way you can call {{hadoop}} instead of 
{{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
{{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a user 
would like to override the appended portion of {{PATH}} in their environment, 
they are unable to do so. This is because {{PATH}} ends up getting parsed 
twice. Initially it is set via {{mapreduce.admin.user.env}} to 
{{PATH=./hadoop-tarball:$SYS_PATH}}. In this case, {{SYS_PATH}} is what I'll 
refer to as the normal system path, e.g. {{/usr/local/bin:/usr/bin}}.

After this, the user env parsing happens. For example, let's say the user sets 
their {{PATH}} to {{PATH=.:$PATH}}. We have already parsed {{PATH}} from the 
admin.user.env. Then we go to parse the user environment and find the user also 
specified {{PATH}}. So {{$PATH}} ends up getting expanded to 
{{./hadoop-tarball:$SYS_PATH}}, which leads to the user's {{PATH}} being 
{{PATH=.:./hadoop-tarball:$SYS_PATH}}. We then append this to {{PATH}}, which 
has already been set in the environment map via the admin.user.env. So we 
finally end up with 
{{PATH=./hadoop-tarball:$SYS_PATH:.:./hadoop-tarball:$SYS_PATH}}. 

This normally isn't a huge deal, but if you want to ship a version of 
python/perl/etc. that clashes with the one already in {{SYS_PATH}}, you will 
need to refer to it by its full path, since in the above example {{.}} doesn't 
appear until after {{$SYS_PATH}}. This is a pain, and it should be possible for 
a user to prepend to {{PATH}} to override the system/container {{SYS_PATH}}, 
even when {{./hadoop-tarball}} is also forcefully prepended to {{PATH}}.

  was:
When using the tarball approach to ship relevant Hadoop jars to containers, it 
is helpful to set {{mapreduce.admin.user.env}} to something like 
{{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
binaries are on the PATH. This way you can call {{hadoop}} instead of 
{{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
{{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a user 
would like to override the appended portion of {{PATH}} in their environment, 
they are unable to do so. This is because {{PATH}} ends up getting parsed 
twice. Initially it is set via {{mapreduce.admin.user.env}} to 
{{PATH=./hadoop-tarball:$SYS_PATH}}. In this case, {{SYS_PATH}} is what I'll 
refer to as the normal system path, e.g. {{/usr/local/bin:/usr/bin}}.

After this, the user env parsing happens. For example, let's say the user sets 
their {{PATH}} to {{PATH=.:$PATH}}. We have already parsed {{PATH}} from the 
admin.user.env. Then we go to parse the user environment and find the user also 
specified {{PATH}}. So {{$PATH}} ends up getting expanded to 
{{./hadoop-tarball:$SYS_PATH}}, which leads to the user's {{PATH}} being 
{{PATH=.:./hadoop-tarball:$SYS_PATH}}. We then append this to {{PATH}}, which 
has already been set in the environment map via the admin.user.env. So we 
finally end up with 
{{PATH=./hadoop-tarball:$SYS_PATH:.:./hadoop-tarball:$SYS_PATH}}

This normally isn't a huge deal, but if you want to override the version of 
python, perl, etc. that is on the host or in the container, you need to prepend 
it to your {{PATH}} or refer to it by its full path. This is a pain, and it 
should be possible to prepend to {{PATH}} to override the system/container 
{{PATH}} even if you want to prepend a {{PATH}} entry for your hadoop tarball.


> User environment is unable to prepend PATH when mapreduce.admin.user.env also 
> sets PATH
> ---
>
> Key: YARN-10607
> URL: https://issues.apache.org/jira/browse/YARN-10607
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> When using the tarball approach to ship relevant Hadoop jars to containers, 
> it is helpful to set {{mapreduce.admin.user.env}} to something like 
> {{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
> binaries are on the PATH. This way you can call {{hadoop}} instead of 
> {{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
> {{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a 
> user would like to override the appended portion of {{PATH}} in their 
> environment, they are unable to do so. This is 

[jira] [Updated] (YARN-10607) User environment is unable to prepend PATH when mapreduce.admin.user.env also sets PATH

2021-02-01 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10607:
---
Description: 
When using the tarball approach to ship relevant Hadoop jars to containers, it 
is helpful to set {{mapreduce.admin.user.env}} to something like 
{{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
binaries are on the PATH. This way you can call {{hadoop}} instead of 
{{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
{{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a user 
would like to override the appended portion of {{PATH}} in their environment, 
they are unable to do so. This is because {{PATH}} ends up getting parsed 
twice. Initially it is set via {{mapreduce.admin.user.env}} to 
{{PATH=./hadoop-tarball:$SYS_PATH}}. In this case, {{SYS_PATH}} is what I'll 
refer to as the normal system path, e.g. {{/usr/local/bin:/usr/bin}}.

After this, the user env parsing happens. For example, let's say the user sets 
their {{PATH}} to {{PATH=.:$PATH}}. We have already parsed {{PATH}} from the 
admin.user.env. Then we go to parse the user environment and find the user also 
specified {{PATH}}. So {{$PATH}} ends up getting expanded to 
{{./hadoop-tarball:$SYS_PATH}}, which leads to the user's {{PATH}} being 
{{PATH=.:./hadoop-tarball:$SYS_PATH}}. We then append this to {{PATH}}, which 
has already been set in the environment map via the admin.user.env. So we 
finally end up with 
{{PATH=./hadoop-tarball:$SYS_PATH:.:./hadoop-tarball:$SYS_PATH}}

This normally isn't a huge deal, but if you want to override the version of 
python, perl, etc. that is on the host or in the container, you need to prepend 
it to your {{PATH}} or refer to it by its full path. This is a pain, and it 
should be possible to prepend to {{PATH}} to override the system/container 
{{PATH}} even if you want to prepend a {{PATH}} entry for your hadoop tarball.

  was:
When using the tarball approach to ship relevant Hadoop jars to containers, it 
is helpful to set {{mapreduce.admin.user.env}} to something like 
{{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
binaries are on the PATH. This way you can call {{hadoop}} instead of 
{{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
{{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a user 
would like to override the appended portion of {{PATH}} in their environment, 
they are unable to do so. This is because {{PATH}} ends up getting parsed 
twice. Initially it is set via {{mapreduce.admin.user.env}} to 
{{PATH=./hadoop-tarball:$SYS_PATH}}. In this case, {{SYS_PATH}} is what I'll 
refer to as the normal system path, e.g. {{/usr/local/bin:/usr/bin}}.

After this, the user env parsing happens. For example, let's say the user sets 
their {{PATH}} to {{PATH=.:$PATH}}. We have already parsed {{PATH}} from the 
admin.user.env. Then we go to parse the user environment and find the user also 
specified {{PATH}}. So {{$PATH}} ends up getting expanded to 
{{./hadoop-tarball:$SYS_PATH}}, which leads to the user's {{PATH}} being 
{{PATH=.:./hadoop-tarball:$SYS_PATH}}. We then append this to {{PATH}}, which 
has already been set in the environment map via the admin.user.env. So we 
finally end up with 
{{PATH=./hadoop-tarball:$SYS_PATH:.:./hadoop-tarball:$SYS_PATH}}

This normally isn't a huge deal, but if you want to override the version of 
python, perl, etc. that is on the host or in the container, you need to prepend 
it to your {{PATH}} or refer to it by its full path. This is a pain, and it 
should be possible to prepend to {{PATH}} to override the system/container 
{{PATH}} even if you have admin.user.env set to prepend a {{PATH}}.


> User environment is unable to prepend PATH when mapreduce.admin.user.env also 
> sets PATH
> ---
>
> Key: YARN-10607
> URL: https://issues.apache.org/jira/browse/YARN-10607
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> When using the tarball approach to ship relevant Hadoop jars to containers, 
> it is helpful to set {{mapreduce.admin.user.env}} to something like 
> {{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
> binaries are on the PATH. This way you can call {{hadoop}} instead of 
> {{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
> {{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a 
> user would like to override the appended portion of {{PATH}} in their 
> environment, they are unable to do so. This is because {{PATH}} ends up 
> getting parsed twice. Initially it is set via 

[jira] [Created] (YARN-10607) User environment is unable to prepend PATH when mapreduce.admin.user.env also sets PATH

2021-02-01 Thread Eric Badger (Jira)
Eric Badger created YARN-10607:
--

 Summary: User environment is unable to prepend PATH when 
mapreduce.admin.user.env also sets PATH
 Key: YARN-10607
 URL: https://issues.apache.org/jira/browse/YARN-10607
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


When using the tarball approach to ship relevant Hadoop jars to containers, it 
is helpful to set {{mapreduce.admin.user.env}} to something like 
{{PATH=./hadoop-tarball:\{\{PATH\}\}}} to make sure that all of the Hadoop 
binaries are on the PATH. This way you can call {{hadoop}} instead of 
{{./hadoop-tarball/hadoop}}. The intention here is to force prepend 
{{./hadoop-tarball}} and then append the set {{PATH}} afterwards. But if a user 
would like to override the appended portion of {{PATH}} in their environment, 
they are unable to do so. This is because {{PATH}} ends up getting parsed 
twice. Initially it is set via {{mapreduce.admin.user.env}} to 
{{PATH=./hadoop-tarball:$SYS_PATH}}. In this case, {{SYS_PATH}} is what I'll 
refer to as the normal system path, e.g. {{/usr/local/bin:/usr/bin}}.

After this, the user env parsing happens. For example, let's say the user sets 
their {{PATH}} to {{PATH=.:$PATH}}. We have already parsed {{PATH}} from the 
admin.user.env. Then we go to parse the user environment and find the user also 
specified {{PATH}}. So {{$PATH}} ends up getting expanded to 
{{./hadoop-tarball:$SYS_PATH}}, which leads to the user's {{PATH}} being 
{{PATH=.:./hadoop-tarball:$SYS_PATH}}. We then append this to {{PATH}}, which 
has already been set in the environment map via the admin.user.env. So we 
finally end up with 
{{PATH=./hadoop-tarball:$SYS_PATH:.:./hadoop-tarball:$SYS_PATH}}

This normally isn't a huge deal, but if you want to override the version of 
python, perl, etc. that is on the host or in the container, you need to prepend 
it to your {{PATH}} or refer to it by its full path. This is a pain, and it 
should be possible to prepend to {{PATH}} to override the system/container 
{{PATH}} even if you have admin.user.env set to prepend a {{PATH}}.
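
To make the double expansion concrete, here is a minimal, hypothetical sketch 
(plain Java string handling, not the actual YARN environment-parsing code) 
that reproduces the concatenation described above:
{code:java}
// Hypothetical sketch only; not the actual YARN env-parsing code.
public class PathExpansionDemo {
  public static void main(String[] args) {
    String sysPath = "/usr/local/bin:/usr/bin";   // the "normal" $SYS_PATH
    // Pass 1: mapreduce.admin.user.env sets PATH=./hadoop-tarball:{{PATH}}.
    String adminPath = "./hadoop-tarball:" + sysPath;
    // Pass 2: the user's PATH=.:$PATH is expanded against the pass-1 value...
    String userPath = ".:" + adminPath;
    // ...and then appended to the PATH already in the environment map.
    String finalPath = adminPath + ":" + userPath;
    System.out.println(finalPath);
    // ./hadoop-tarball:/usr/local/bin:/usr/bin:.:./hadoop-tarball:/usr/local/bin:/usr/bin
  }
}
{code}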






[jira] [Commented] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276529#comment-17276529
 ] 

Hadoop QA commented on YARN-10605:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
38s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 54s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
57s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
57s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/568/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color}
 | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} 

[jira] [Commented] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276466#comment-17276466
 ] 

Hadoop QA commented on YARN-10604:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
40s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 55s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
14s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
13s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/566/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color}
 | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/566/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 94 unchanged - 0 fixed = 95 total (was 94) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | 

[jira] [Commented] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276465#comment-17276465
 ] 

Hadoop QA commented on YARN-10605:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
41s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
36s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 10s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
14s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
12s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/567/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color}
 | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/567/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | 

[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-02-01 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276450#comment-17276450
 ] 

Andras Gyori commented on YARN-10532:
-

Hi [~zhuqi], thank you for the patch! I have analysed the provided code and I 
have the following suggestions:
h2. AbstractCSQueue

Since the queues share some common conditionals, I think we could encapsulate 
this logic in methods inside the queue, which greatly simplifies the overall 
code. I would advise adding:
{code:java}
public boolean isEligibleForAutoDeletion() {
  return false;
}

public boolean isInactiveDynamicQueue() {
  long idleDurationSeconds = Time.monotonicNow() - lastSubmittedTimestamp;
  return isDynamicQueue() && isEligibleForAutoDeletion() &&
      (idleDurationSeconds > this.csContext.getConfiguration().
          getAutoExpiredDeletionTime());
}
{code}
h2. LeafQueue

Since LeafQueues have their own way of deciding whether they are eligible for 
deletion, override isEligibleForAutoDeletion:
{code:java}
@Override
public boolean isEligibleForAutoDeletion() {
  return isDynamicQueue() && getNumActiveApplications() == 0;
}
{code}
h2. ParentQueue

Same goes for ParentQueues:
{code:java}
@Override
public boolean isEligibleForAutoDeletion() {
  return isDynamicQueue() && getChildQueues().size() == 0;
}
{code}
Also, the lastSubmittedTimestamp means something else in the case of 
ParentQueues. When we add a new child queue, we refresh this timestamp, just as 
we refresh the LeafQueue's timestamp on application submission:
{code:java}
private CSQueue addDynamicChildQueue(String childQueuePath, boolean isLeaf)
    throws SchedulerDynamicEditException {
...
  CSQueue newQueue = createNewQueue(childQueuePath, isLeaf);
  this.childQueues.add(newQueue);
  signalToSubmitToQueue();
...
}
{code}
h2. CapacityScheduler

The logic in CapacityScheduler#AutoCreatedQueueDeletion (which, by the way, 
violates the Java naming convention and should be renamed) can be simplified 
to:
{code:java}
private void AutoCreatedQueueDeletion(CSQueue checkQueue)
    throws SchedulerDynamicEditException {
  writeLock.lock();
  try {
    if (checkQueue instanceof AbstractCSQueue
        && ((AbstractCSQueue) checkQueue).isInactiveDynamicQueue()) {
      removeQueue(checkQueue);
    }
  } finally {
    // Unlock, not lock, in the finally block; otherwise the write lock
    // is never released.
    writeLock.unlock();
  }
}
{code}
h2. AutoDeletionForExpiredQueuePolicy

I think this policy should be changed a bit. I advise adding two sets:
{code:java}
private Set<String> markedForDeletion = new HashSet<>();
private Set<String> sentForDeletion = new HashSet<>();
{code}
 * markedForDeletion: in each interval, this set is extended with the queues 
that are eligible for auto deletion
 * sentForDeletion: if in the next interval a queue is still eligible for auto 
deletion and is already marked for deletion, move it to this set
 * if a queue is in markedForDeletion in the next interval but was not moved to 
sentForDeletion, its mark should be reset (therefore clear markedForDeletion)

The logic to invoke in each interval is:
{code:java}
for (Map.Entry<String, CSQueue> queues : scheduler
    .getCapacitySchedulerQueueManager()
    .getQueues().entrySet()) {
  String queueName = queues.getKey();
  CSQueue queue = queues.getValue();
  Set<String> newMarks = new HashSet<>();
  if (queue instanceof AbstractCSQueue &&
      ((AbstractCSQueue) queue).isEligibleForAutoDeletion()) {
    if (markedForDeletion.contains(queueName)) {
      sentForDeletion.add(queueName);
      markedForDeletion.remove(queueName);
    } else {
      newMarks.add(queueName);
    }
  }
  markedForDeletion.clear();
  markedForDeletion.addAll(newMarks);
}

for (String queueName : sentForDeletion) {
  CSQueue checkQueue =
      scheduler.getCapacitySchedulerQueueManager().
          getQueue(queueName);
  queueAutoDeletionCheck(checkQueue);
}
sentForDeletion.clear();
{code}
The monitoringInterval should be set to the value of the queue-expiration-time, 
which you have introduced. 
{code:java}
monitoringInterval = csConfig.getAutoExpiredDeletionTime(); {code}
Also please fix the following items:
 * AutoCreatedQueueDeletionCheckEvent should be renamed to only 
AutoCreatedQueueDeletionEvent
 * AutoCreatedQueueDeletion should be renamed to removeAutoCreatedQueue
 * AutoDeletionForExpiredQueuePolicy should be renamed to 
AutoCreatedQueueDeletionPolicy
 * The config expired-deletion-enabled should be renamed to 
queue-auto-removal.enabled
 * The config expired-deletion-time should be renamed to queue-expiration-time
 * There are a handful of unused imports and methods, which should be removed
 * Please do not use wildcard import in CapacityScheduler (line 42)

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: 

[jira] [Commented] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276427#comment-17276427
 ] 

Hadoop QA commented on YARN-10589:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 19m  
1s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/branch-shadedclient.txt{color}
 | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
28s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04.txt{color}
 | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
29s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt{color}
 | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01. {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
29s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
28s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
23s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
23s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/569/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04.txt{color}
 | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 23s{color} 
| 

[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-02-01 Thread VADAGA ANANYO RAO (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276425#comment-17276425
 ] 

VADAGA ANANYO RAO commented on YARN-10559:
--

[~epayne], thanks for catching this. This is a major bug in the code. I am 
already working on addressing multiple-user scenarios and should be able to 
get a patch out to fix this in a couple of days.

> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch, 
> YARN-10559.0005.patch, YARN-10559.0006.patch, YARN-10559.0007.patch, 
> YARN-10559.0008.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple 
> long-running, large applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same 
> queue.






[jira] [Commented] (YARN-10559) Fair sharing intra-queue preemption support in Capacity Scheduler

2021-02-01 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276404#comment-17276404
 ] 

Eric Payne commented on YARN-10559:
---

[~ananyo_rao], sorry for the delay. I'm still reviewing the changes, but I have 
one concern with the requirements.

{code:title=FifoIntraQueuePreemptionPlugin#setFairShareForApps}
+   * we firstly ensure all the apps in the queue get equal resources.
{code}
I don't think this is exactly correct. When a queue in the Capacity Scheduler 
has FairOrderingPolicy set, it will grow each user's share of the resources at 
a fair pace. If user1 has app1 and user2 has app2 and app3, and if all 3 apps 
are requesting resources, app1 will receive resources faster than app2. app2 
and app3 together will receive resources at roughly the same rate as app1.
The totals of resources assigned to user1 and user2 will grow by roughly the 
same amount, but the apps themselves will not receive resources at the same 
rate.

So, when we preempt, we want to mimic that same behavior. I'm still trying to 
fully understand the code, so it may be that the code actually does what I 
said, but at the very least, the statement is misleading and I want to make 
sure we are on the same page regarding the requirements.
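
To illustrate numerically (hypothetical arithmetic only, not the actual 
FairOrderingPolicy code): suppose 60 containers free up and all three apps 
have unbounded demand.
{code:java}
// Hypothetical arithmetic; not the actual FairOrderingPolicy code.
public class FairPaceIllustration {
  public static void main(String[] args) {
    int freed = 60;               // containers becoming available
    int users = 2;                // user1 {app1}, user2 {app2, app3}
    int perUser = freed / users;  // shares grow fairly per *user*
    System.out.println("app1 gets " + perUser);      // 30
    System.out.println("app2 gets " + perUser / 2);  // 15
    System.out.println("app3 gets " + perUser / 2);  // 15
    // An equal per-app split (20/20/20) would give user2 twice user1's
    // share, which is not how FairOrderingPolicy grows shares.
  }
}
{code}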



> Fair sharing intra-queue preemption support in Capacity Scheduler
> -
>
> Key: YARN-10559
> URL: https://issues.apache.org/jira/browse/YARN-10559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.1.4
>Reporter: VADAGA ANANYO RAO
>Assignee: VADAGA ANANYO RAO
>Priority: Major
> Attachments: FairOP_preemption-design_doc_v1.pdf, 
> FairOP_preemption-design_doc_v2.pdf, YARN-10559.0001.patch, 
> YARN-10559.0002.patch, YARN-10559.0003.patch, YARN-10559.0004.patch, 
> YARN-10559.0005.patch, YARN-10559.0006.patch, YARN-10559.0007.patch, 
> YARN-10559.0008.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Use case:
> Due to the way Capacity Scheduler preemption works, if a single user submits 
> a large application to a queue (using 100% of resources), that job will not 
> be preempted by future applications from the same user within the same queue. 
> This implies that the later applications will be forced to wait for 
> completion of the long-running application. This prevents multiple 
> long-running, large applications from running concurrently.
> Support fair sharing among apps while preempting applications from the same 
> queue.






[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 3:08 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
      schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
    continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
      null, node);
}


// will skip priorities whose requests do not match the node
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
      schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
    continue;
  }
}
{code}
I think the original logic is right; in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.
          NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
          + ActivitiesManager.getDiagnostics(dcOpt),
      ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We just skip the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we 
should also skip this priority.

I updated a new patch to fix this.

I changed the code to:
{code:java}
// When the partition does not match for this priority, allocate() returns
// AllocationState.PARTITION_SKIPPED; this should also be skipped.
if (allocationState == AllocationState.PRIORITY_SKIPPED
    || allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Also, I think the test case is not right; we should either add a proper test, 
or this change may not need a new test at all.

Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
      schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
    continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
      null, node);
}


// will skip priorities whose requests do not match the node
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
      schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
    continue;
  }
}
{code}
I think the original logic is right; in preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
      activitiesManager, node, application, schedulerKey,
      ActivityDiagnosticConstant.
          NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
          + ActivitiesManager.getDiagnostics(dcOpt),
      ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We just skip the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we 
should also skip this priority.

I changed the code to:
{code:java}
// When the partition does not match for this priority, allocate() returns
// AllocationState.PARTITION_SKIPPED; this should also be skipped.
if (allocationState == AllocationState.PRIORITY_SKIPPED
    || allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>

[jira] [Updated] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-10589:
-
Attachment: YARN-10589-002.patch

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch, YARN-10589-002.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should
> stop checking the other nodes in that partition.
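For illustration, a minimal standalone sketch of the proposed short-circuit (hypothetical names and simplified types, not the actual CandidateNodeSet API): check the requested partition once per candidate set instead of rejecting every node of a non-matching partition individually.
{code:java}
import java.util.List;
import java.util.Map;

public class PartitionShortCircuitSketch {
  public static void main(String[] args) {
    // Hypothetical candidate node sets keyed by partition label.
    Map<String, List<String>> nodesByPartition = Map.of(
        "gpu", List.of("n1", "n2", "n3"),
        "default", List.of("n4", "n5"));
    String requestedPartition = "default";

    for (Map.Entry<String, List<String>> e : nodesByPartition.entrySet()) {
      // One partition check per candidate set, instead of one per node.
      if (!e.getKey().equals(requestedPartition)) {
        continue;
      }
      for (String node : e.getValue()) {
        System.out.println("Trying allocation on " + node);
      }
    }
  }
}
{code}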



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 3:00 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We just skipped that priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

I changed the code to:
{code:java}
// When the partition does not match this priority, the allocation returns
// AllocationState.PARTITION_SKIPPED; this state should also be skipped.
if (allocationState == AllocationState.PRIORITY_SKIPPED
|| allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

I changed the code to:
{code:java}
// When the partition does not match this priority, the allocation returns
// AllocationState.PARTITION_SKIPPED; this state should also be skipped.
if (allocationState == AllocationState.PRIORITY_SKIPPED
|| allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> 

[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:59 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

I changed the code to:
{code:java}
// When the partition does not match this priority, the allocation returns
// AllocationState.PARTITION_SKIPPED; this state should also be skipped.
if (allocationState == AllocationState.PRIORITY_SKIPPED
|| allocationState == AllocationState.PARTITION_SKIPPED) {
  continue;
}
{code}
Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>  if (current++ > start) {
>  break;
>  }
>  CandidateNodeSet<FiCaSchedulerNode> candidates =
>  cs.getCandidateNodeSet(partition);
>  if 

[jira] [Commented] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276380#comment-17276380
 ] 

Andras Gyori commented on YARN-10605:
-

Thank you [~shuzirra] and [~bteke] for the reviews; I have addressed this problem.

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch, YARN-10605.002.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:48 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

It makes sense to me now.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the 

[jira] [Updated] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10605:

Attachment: YARN-10605.002.patch

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch, YARN-10605.002.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10606) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10606:

Parent: (was: YARN-10496)
Issue Type: Bug  (was: Sub-task)

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10606
> URL: https://issues.apache.org/jira/browse/YARN-10606
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the
> ApplicationPlacementContext is null, which happens when the mapping rules
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow the auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:46 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If we change the patch to return ContainerAllocation.PARTITION_SKIPPED, we should also skip this priority.

Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should
> stop checking the other nodes in that partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:42 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

Any other thoughts?

 

 


was (Author: zhuqi):
Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should
> stop checking the other nodes in that partition.



--
This message 

[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:37 PM:
---

Thanks to [~ztang] and [~tanu.ajmera] for the review and the patch.

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 


was (Author: zhuqi):
[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no 
> break point 

[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:36 PM:
---

[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}


// will skip the priority if the precheck is not met
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 


was (Author: zhuqi):
[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}

if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should 

[jira] [Comment Edited] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi edited comment on YARN-10589 at 2/1/21, 2:35 PM:
---

[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}

if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

If this partition does not match this priority, why should we conclude that it does not match any of the application's priorities?

Any other thoughts?

 

 


was (Author: zhuqi):
[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}

if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should
> stop checking the other nodes in that partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276369#comment-17276369
 ] 

zhuqi commented on YARN-10589:
--

[~ztang] [~tanu.ajmera]

I have reviewed the patch; I don't think the added logic is needed, and it breaks the PRIORITY_SKIPPED logic.

In the "Schedule in priority order" loop:
{code:java}
// Schedule in priority order
for (SchedulerRequestKey schedulerKey : application.getSchedulerKeys()) {
  ContainerAllocation result = allocate(clusterResource, candidates,
  schedulingMode, resourceLimits, schedulerKey, null);

  AllocationState allocationState = result.getAllocationState();
  if (allocationState == AllocationState.PRIORITY_SKIPPED) {
continue;
  }
  return getCSAssignmentFromAllocateResult(clusterResource, result,
  null, node);
}

if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(clusterResource, node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
}
{code}
I think the original logic is right. In preCheckForNodeCandidateSet:
{code:java}
// Is the nodePartition of pending request matches the node's partition
// If not match, jump to next priority.
Optional<DiagnosticsCollector> dcOpt = activitiesManager == null ?
    Optional.empty() :
    activitiesManager.getOptionalDiagnosticsCollector();
if (!appInfo.precheckNode(schedulerKey, node, schedulingMode, dcOpt)) {
  ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
  activitiesManager, node, application, schedulerKey,
  ActivityDiagnosticConstant.
  NODE_DO_NOT_MATCH_PARTITION_OR_PLACEMENT_CONSTRAINTS
  + ActivitiesManager.getDiagnostics(dcOpt),
  ActivityLevel.NODE);
  return ContainerAllocation.PRIORITY_SKIPPED;
}
{code}
We have skipped only this priority, and the next priority may still need this node partition. We just skipped the priority for the application.

Any other thoughts?

 

 

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>   if (current++ > start) {
>     break;
>   }
>   CandidateNodeSet<FiCaSchedulerNode> candidates =
>       cs.getCandidateNodeSet(partition);
>   if (candidates == null) {
>     continue;
>   }
>   cs.allocateContainersToNode(candidates, false);
> }{code}
> In the above logic, if we have thousands of nodes in one partition, we will still
> repeatedly access all nodes of that partition thousands of times. There is no
> break point: if the partition does not match for the first node, it should
> stop checking the other nodes in that partition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276356#comment-17276356
 ] 

Benjamin Teke commented on YARN-10605:
--

[~gandras] thanks for the patch. I have one minor nit about it: in 
TestFSConfigToCSConfigConverter you introduced an unused import 
{{org.junit.Assert.assertFalse}}. Otherwise the patch LGTM, +1 (non-binding).

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276351#comment-17276351
 ] 

Gergely Pollak commented on YARN-10604:
---

[~gandras] thank you for the patch, LGTM +1 (non-binding)

> Support auto queue creation without mapping rules
> -
>
> Key: YARN-10604
> URL: https://issues.apache.org/jira/browse/YARN-10604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10604.001.patch
>
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the
> ApplicationPlacementContext is null, which happens when the mapping rules
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow the auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276347#comment-17276347
 ] 

Gergely Pollak commented on YARN-10605:
---

[~gandras] Thank you for the patch! The only thing I've noticed: you import static org.junit.Assert.assertFalse, but you don't use it.

Otherwise LGTM +1 (non-binding)

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276344#comment-17276344
 ] 

Benjamin Teke commented on YARN-10604:
--

[~gandras] thanks for the patch. Looks good to me, +1 (non-binding).

> Support auto queue creation without mapping rules
> -
>
> Key: YARN-10604
> URL: https://issues.apache.org/jira/browse/YARN-10604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10604.001.patch
>
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the
> ApplicationPlacementContext is null, which happens when the mapping rules
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow the auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-10606) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori resolved YARN-10606.
-
Resolution: Duplicate

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10606
> URL: https://issues.apache.org/jira/browse/YARN-10606
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the
> ApplicationPlacementContext is null, which happens when the mapping rules
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow the auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10606) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)
Andras Gyori created YARN-10606:
---

 Summary: Add queue-mappings-override.enable property in FS2CS 
conversions
 Key: YARN-10606
 URL: https://issues.apache.org/jira/browse/YARN-10606
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Andras Gyori
Assignee: Andras Gyori


Currently, the Capacity Scheduler skips auto queue creation entirely if the
ApplicationPlacementContext is null, which happens when the mapping rules are
turned off by:
{noformat}
<property>
  <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
  <value>false</value>
</property>
{noformat}
We should allow the auto queue creation to be taken into consideration without 
disrupting the application submission flow.
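One possible shape of such a fallback, as a self-contained sketch (the types and helper below are illustrative stand-ins, not the actual CapacityScheduler code): when no mapping rule produced a placement, derive a context from the submitted queue so auto queue creation can still run.
{code:java}
public class AutoQueuePlacementSketch {
  // Simplified stand-in for ApplicationPlacementContext.
  record Placement(String queue) {}

  // If no mapping rule produced a placement, fall back to a context built
  // from the submitted queue so auto queue creation can still take effect.
  static Placement resolve(Placement fromMappingRules, String submittedQueue,
      boolean autoCreateEnabled) {
    if (fromMappingRules != null) {
      return fromMappingRules;
    }
    return autoCreateEnabled ? new Placement(submittedQueue) : null;
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, "root.users.alice", true));
  }
}
{code}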

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10605:

Attachment: YARN-10605.001.patch

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori reassigned YARN-10605:
---

Assignee: Andras Gyori

> Add queue-mappings-override.enable property in FS2CS conversions
> 
>
> Key: YARN-10605
> URL: https://issues.apache.org/jira/browse/YARN-10605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10605.001.patch
>
>
> In Capacity Scheduler the
> {noformat}
> queue-mappings-override.enable
> {noformat}
> property is false by default. As this is not set during an FS2CS conversion, 
> the converted placement rules (aka. mapping rules in CS) are ignored during 
> application submission. We should enable this property in the conversion 
> logic if there are placement rules to be converted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10605) Add queue-mappings-override.enable property in FS2CS conversions

2021-02-01 Thread Andras Gyori (Jira)
Andras Gyori created YARN-10605:
---

 Summary: Add queue-mappings-override.enable property in FS2CS 
conversions
 Key: YARN-10605
 URL: https://issues.apache.org/jira/browse/YARN-10605
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Andras Gyori


In Capacity Scheduler, the
{noformat}
queue-mappings-override.enable
{noformat}
property is false by default. As this is not set during an FS2CS conversion, 
the converted placement rules (a.k.a. mapping rules in CS) are ignored during 
application submission. We should enable this property in the conversion logic 
if there are placement rules to be converted.
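A minimal sketch of the proposed conversion step (the field names 
convertedPlacementRules and capacitySchedulerConfig are illustrative 
assumptions, not the actual FS2CS converter API): when at least one placement 
rule was converted, switch the override flag on in the generated 
capacity-scheduler.xml instead of leaving it at its false default.
{code:java}
// Hedged sketch: enable mapping-rule overrides in the converted configuration
// whenever the conversion produced placement rules. Configuration.setBoolean
// is the standard Hadoop API; the surrounding names are assumed.
if (!convertedPlacementRules.isEmpty()) {
  capacitySchedulerConfig.setBoolean(
      "yarn.scheduler.capacity.queue-mappings-override.enable", true);
}
{code}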



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276317#comment-17276317
 ] 

Andras Gyori commented on YARN-10604:
-

Uploaded a new patch with the corresponding test case.

> Support auto queue creation without mapping rules
> -
>
> Key: YARN-10604
> URL: https://issues.apache.org/jira/browse/YARN-10604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10604.001.patch
>
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the 
> ApplicationPlacementContext is null, which happens when the mapping rules 
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10604:

Attachment: YARN-10604.001.patch

> Support auto queue creation without mapping rules
> -
>
> Key: YARN-10604
> URL: https://issues.apache.org/jira/browse/YARN-10604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10604.001.patch
>
>
> Currently, the Capacity Scheduler skips auto queue creation entirely if the 
> ApplicationPlacementContext is null, which happens when the mapping rules 
> are turned off by:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
>   <value>false</value>
> </property>
> {noformat}
> We should allow auto queue creation to be taken into consideration 
> without disrupting the application submission flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10604) Support auto queue creation without mapping rules

2021-02-01 Thread Andras Gyori (Jira)
Andras Gyori created YARN-10604:
---

 Summary: Support auto queue creation without mapping rules
 Key: YARN-10604
 URL: https://issues.apache.org/jira/browse/YARN-10604
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Andras Gyori
Assignee: Andras Gyori


Currently, the Capacity Scheduler skips auto queue creation entirely if the 
ApplicationPlacementContext is null, which happens when the mapping rules are 
turned off by:
{noformat}
<property>
  <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
  <value>false</value>
</property>
{noformat}
We should allow auto queue creation to be taken into consideration without 
disrupting the application submission flow.
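A minimal sketch of the idea (assuming the PlacementManager and 
ApplicationPlacementContext names from the resource manager's placement 
package; this is an illustration, not the committed patch): when the mapping 
rules yield no placement, fall back to the queue named in the submission 
context so the auto queue creation path can still run.
{code:java}
// Hedged sketch: derive a placement context from the submitted queue name
// instead of skipping auto queue creation when mapping rules are disabled.
// "autoQueueCreationEnabled" is an assumed flag for illustration.
ApplicationPlacementContext placement =
    placementManager.placeApplication(submissionContext, user);
if (placement == null && autoQueueCreationEnabled) {
  // fall back to the queue the application was submitted to
  placement = new ApplicationPlacementContext(submissionContext.getQueue());
}
{code}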

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10601) The Yarn client should use the UGI who created the Yarn client for obtaining a delegation token for the remote log dir

2021-02-01 Thread Daniel Fritsi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276236#comment-17276236
 ] 

Daniel Fritsi commented on YARN-10601:
--

[~gezapeti], [~asalamon74] do you have any insight into why, years ago, 
creating the Yarn client was put into a ugi.doAs block but the 
submitApplication invocation was not?

> The Yarn client should use the UGI who created the Yarn client for obtaining 
> a delegation token for the remote log dir
> --
>
> Key: YARN-10601
> URL: https://issues.apache.org/jira/browse/YARN-10601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Daniel Fritsi
>Priority: Critical
>
> It seems there was a bug introduced in YARN-10333 in this section of 
> *{color:#0747A6}{{addLogAggregationDelegationToken}}{color}*:
> {code:java}
> Path remoteRootLogDir = fileController.getRemoteRootLogDir();
> FileSystem fs = remoteRootLogDir.getFileSystem(conf);
> final org.apache.hadoop.security.token.Token<?>[] finalTokens =
> fs.addDelegationTokens(masterPrincipal, credentials);
> {code}
> *{color:#0747A6}{{remoteRootLogDir.getFileSystem}}{color}* simply does this:
> {code:java}
> public FileSystem getFileSystem(Configuration conf) throws IOException {
>   return FileSystem.get(this.toUri(), conf);
> }
> {code}
> As far as I know it's customary to create a YarnClient instance via 
> *{color:#0747A6}{{YarnClient.createYarnClient()}}{color}* in a 
> UserGroupInformation.doAs block if you would like to use it with a different 
> user than the current one. E.g.:
> {code:java}
> YarnClient yarnClient = ugi.doAs(new PrivilegedExceptionAction<YarnClient>() {
>     @Override
>     public YarnClient run() throws Exception {
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         return yarnClient;
>     }
> });
> {code}
> If this statement is correct then I think YarnClient should save the 
> *{color:#0747A6}{{UserGroupInformation.getCurrentUser()}}{color}* when the 
> YarnClient is being created and the 
> *{color:#0747A6}{{remoteRootLogDir.getFileSystem(conf)}}{color}* call should 
> be made inside an ugi.doAs block with that saved user.
> A more concrete example:
> {code:java}
> public YarnClient createYarnClient(UserGroupInformation ugi, Configuration conf)
>         throws Exception {
>     return ugi.doAs((PrivilegedExceptionAction<YarnClient>) () -> {
>         // Here I am the submitterUser (see below)
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         return yarnClient;
>     });
> }
> public void run() {
>     // Here I am the serviceUser
>     // ...
>     Configuration conf = ...
>     // ...
>     UserGroupInformation ugi = getSubmitterUser();
>     // ...
>     YarnClient yarnClient = createYarnClient(ugi, conf);
>     // ...
>     ApplicationSubmissionContext context = ...
>     // ...
>     yarnClient.submitApplication(context);
> }
> {code}
> As you can see, *{color:#0747A6}{{submitApplication}}{color}* is not invoked 
> inside an ugi.doAs block, and submitApplication is the one that will 
> eventually invoke 
> *{color:#0747A6}{{addLogAggregationDelegationToken}}{color}*. That's why we 
> need to save the UGI during the YarnClient creation and create the 
> FileSystem instance inside an ugi.doAs with that saved user. Otherwise Yarn 
> will try to get a delegation token with an incorrect user (the serviceUser) 
> instead of the submitterUser.
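> A possible shape of the fix, sketched under the assumptions above (the 
> submitterUgi field and the surrounding method are illustrative, not the 
> committed change):
> {code:java}
> // Hedged sketch: remember the UGI that created the client and use it when
> // fetching the delegation token for the remote log dir.
> private UserGroupInformation submitterUgi;
> 
> // at client start: submitterUgi = UserGroupInformation.getCurrentUser();
> 
> private void addLogAggregationDelegationToken(Credentials credentials)
>         throws IOException, InterruptedException {
>     FileSystem fs = submitterUgi.doAs(
>         (PrivilegedExceptionAction<FileSystem>) () ->
>             remoteRootLogDir.getFileSystem(conf));
>     fs.addDelegationTokens(masterPrincipal, credentials);
> }
> {code}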



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10589) Improve logic of multi-node allocation

2021-02-01 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276170#comment-17276170
 ] 

Zhankun Tang commented on YARN-10589:
-

[~zhuqi], could you please review Tanu's patch too?

> Improve logic of multi-node allocation
> --
>
> Key: YARN-10589
> URL: https://issues.apache.org/jira/browse/YARN-10589
> Project: Hadoop YARN
>  Issue Type: Task
>Affects Versions: 3.3.0
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Major
> Attachments: YARN-10589-001.patch
>
>
> {code:java}
> for (String partition : partitions) {
>     if (current++ > start) {
>         break;
>     }
>     CandidateNodeSet<FiCaSchedulerNode> candidates =
>         cs.getCandidateNodeSet(partition);
>     if (candidates == null) {
>         continue;
>     }
>     cs.allocateContainersToNode(candidates, false);
> }
> {code}
> In the above logic, if we have thousands of nodes in one partition, we will 
> still repeatedly access all nodes of the partition thousands of times. There 
> is no break point: if the partition does not match for the first node, the 
> loop should stop checking the other nodes in that partition.
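> A sketch of the suggested short-circuit (the requestedPartition name is an 
> assumption for illustration; this is not the attached patch):
> {code:java}
> // Hedged sketch: skip partitions that cannot match before touching their
> // candidate node sets, so unrelated partitions are not scanned repeatedly.
> for (String partition : partitions) {
>     if (current++ > start) {
>         break;
>     }
>     if (!partition.equals(requestedPartition)) {
>         continue;  // short-circuit: do not walk nodes of other partitions
>     }
>     CandidateNodeSet<FiCaSchedulerNode> candidates =
>         cs.getCandidateNodeSet(partition);
>     if (candidates == null) {
>         continue;
>     }
>     cs.allocateContainersToNode(candidates, false);
> }
> {code}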



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10603) Failed to reinitialize for recovered container

2021-02-01 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276126#comment-17276126
 ] 

Hadoop QA commented on YARN-10603:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
27s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 18s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
22s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 26s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/565/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 148 unchanged - 0 fixed = 149 total (was 148) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 54s{color} |