[jira] [Commented] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503959#comment-17503959 ]

tuyu commented on YARN-11083:
-----------------------------

The failed test case is not related to this issue.

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-11083.001.patch
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
> {code:java}
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> {code}
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
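To make the DRC arithmetic above concrete, here is a minimal, self-contained Java sketch. The Res record and the min helpers are illustrative stand-ins rather than the real Hadoop Resource / ResourceCalculator API; only the numbers (cluster <60G, 100>, queue max <5G, 100>, parent limit <30G, 99>) come from the report above.

{code:java}
// Minimal sketch of the two "min" semantics from YARN-11083.
// Res, dominantShare, drcMin and componentwiseMin are illustrative
// stand-ins, not the real Hadoop Resource / ResourceCalculator API.
public class DrcMinDemo {
  record Res(long memoryGb, long vcores) {}

  // Dominant share: the larger utilization ratio against the cluster total.
  static double dominantShare(Res r, Res cluster) {
    return Math.max((double) r.memoryGb() / cluster.memoryGb(),
                    (double) r.vcores() / cluster.vcores());
  }

  // DRC-style min: picks ONE operand wholesale, whichever has the
  // smaller dominant share.
  static Res drcMin(Res a, Res b, Res cluster) {
    return dominantShare(a, cluster) <= dominantShare(b, cluster) ? a : b;
  }

  // Componentwise min: caps each dimension independently.
  static Res componentwiseMin(Res a, Res b) {
    return new Res(Math.min(a.memoryGb(), b.memoryGb()),
                   Math.min(a.vcores(), b.vcores()));
  }

  public static void main(String[] args) {
    Res cluster = new Res(60, 100);     // <60G, 100 vcores>
    Res queueMax = new Res(5, 100);     // A1's max capacity <5G, 100>
    Res parentLimit = new Res(30, 99);  // left under A after A2's allocation

    // dominant shares: queueMax    -> max(5/60, 100/100) = 1.0
    //                  parentLimit -> max(30/60, 99/100) = 0.99
    // so the DRC min drops the 5G memory cap entirely:
    System.out.println(drcMin(queueMax, parentLimit, cluster));
    // prints Res[memoryGb=30, vcores=99]

    // the componentwiseMin wrapper proposed in the issue restores it:
    System.out.println(componentwiseMin(
        drcMin(queueMax, parentLimit, cluster), queueMax));
    // prints Res[memoryGb=5, vcores=99]
  }
}
{code}

Under the bare DRC min, A1's limit is reported as <30G, 99>, so the <10G, 1> request passes the scheduler-thread check and only fails at commit time, which is exactly the "Failed to accept this proposal" churn described in the report.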
[jira] [Updated] (YARN-10538) Add recommissioning nodes to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated YARN-10538:
---------------------------------
    Fix Version/s: 3.2.3

Backported to branch-3.2 and branch-3.2.3.

> Add recommissioning nodes to the list of updated nodes returned to the AM
> --------------------------------------------------------------------------
>
>                 Key: YARN-10538
>                 URL: https://issues.apache.org/jira/browse/YARN-10538
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.9.1, 3.1.1
>            Reporter: Srinivas S T
>            Assignee: Srinivas S T
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.1, 3.2.3
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> YARN-6483 added nodes that transitioned to the DECOMMISSIONING state to the list of updated nodes returned to the AM. This allows the Spark application master to gracefully decommission its containers on the decommissioning node. But if the node were to be recommissioned, the Spark application master would not be aware of it. We propose to add recommissioned nodes to the list of updated nodes sent to the AM whenever a recommissioning transition occurs.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
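As a rough sketch of what this change means on the application-master side: the NodeReport and NodeState types below are the real YARN API, but NodeUpdateHandler and its wiring into the AMRMClientAsync onNodesUpdated() callback are assumed for illustration.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;

// Sketch: logic an AM could run from its onNodesUpdated() callback.
// After YARN-6483 the AM only learned about DECOMMISSIONING transitions;
// with this change a recommissioned node arrives as a fresh report in
// the RUNNING state, so an in-flight drain can be cancelled.
public final class NodeUpdateHandler {

  private NodeUpdateHandler() {
  }

  public static void handle(List<NodeReport> updatedNodes) {
    for (NodeReport report : updatedNodes) {
      NodeState state = report.getNodeState();
      if (state == NodeState.DECOMMISSIONING) {
        // start migrating containers off this node before it drains
        System.out.println("Draining: " + report.getNodeId());
      } else if (state == NodeState.RUNNING) {
        // node was recommissioned: schedulable again, stop the migration
        System.out.println("Usable again: " + report.getNodeId());
      }
    }
  }
}
{code}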
[jira] [Resolved] (YARN-10923) Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
[ https://issues.apache.org/jira/browse/YARN-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

András Győri resolved YARN-10923.
---------------------------------
    Resolution: Won't Fix

> Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-10923
>                 URL: https://issues.apache.org/jira/browse/YARN-10923
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: András Győri
>            Priority: Critical
>
> First, create 2 new classes: DynamicLeaf / DynamicParent.
> Then, gradually move AQC functionality over from ManagedParentQueue / AutoCreatedLeafQueue.
> Revisit whether AbstractManagedParentQueue makes sense at all.
> ManagedParent / Parent: is there an actual need for the two classes?
> - Currently the two different parent classes can cause confusion and chaos
> - Can be a "back to the drawing board" task
> The ultimate goal is to have a common class for AQC-enabled parents and to investigate whether a separate class for AutoCreatedLeafQueue is required.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11084) Introduce new config to specify AM default node-label when not specified
Junfan Zhang created YARN-11084:
-----------------------------------

             Summary: Introduce new config to specify AM default node-label when not specified
                 Key: YARN-11084
                 URL: https://issues.apache.org/jira/browse/YARN-11084
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Junfan Zhang

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
[ https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated YARN-9783:
-----------------------------------
    Fix Version/s: 3.2.3

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-9783
>                 URL: https://issues.apache.org/jira/browse/YARN-9783
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: test
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.3
>
>         Attachments: YARN-9783.001.patch, YARN-9783.002.patch, YARN-9783.003.patch
>
> ZooKeeper 3.5.5 is the latest stable release. It contains many new features (including SSL-related improvements, which are very important for production use; see [the release notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). There should be no backward-incompatible changes in the API, so applications using ZooKeeper clients should build against the new ZooKeeper without any problem, and the new ZooKeeper client should work with the older (3.4) servers without any issue, at least until someone starts to use the new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop YARN yet, but to enable people to rebuild and test Hadoop with the new ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test case: [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64]. This test case seems to use low-level ZooKeeper internal code, which changed in the new ZooKeeper version. Although I am not sure of the original reasoning for including this test in the YARN code, I propose to remove it; if a test case is still missing in ZooKeeper, let's open a ZooKeeper ticket to cover this scenario there.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
[ https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504051#comment-17504051 ]

Masatake Iwasaki commented on YARN-9783:
----------------------------------------

I cherry-picked this to branch-3.2 and branch-3.2.3.

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-9783
>                 URL: https://issues.apache.org/jira/browse/YARN-9783
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: test
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.3
>
>         Attachments: YARN-9783.001.patch, YARN-9783.002.patch, YARN-9783.003.patch
>
> ZooKeeper 3.5.5 is the latest stable release. It contains many new features (including SSL-related improvements, which are very important for production use; see [the release notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). There should be no backward-incompatible changes in the API, so applications using ZooKeeper clients should build against the new ZooKeeper without any problem, and the new ZooKeeper client should work with the older (3.4) servers without any issue, at least until someone starts to use the new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop YARN yet, but to enable people to rebuild and test Hadoop with the new ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test case: [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64]. This test case seems to use low-level ZooKeeper internal code, which changed in the new ZooKeeper version. Although I am not sure of the original reasoning for including this test in the YARN code, I propose to remove it; if a test case is still missing in ZooKeeper, let's open a ZooKeeper ticket to cover this scenario there.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tuyu updated YARN-11083:
------------------------
    Description: 
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tuyu updated YARN-11083:
------------------------
    Description: 
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
{code:java}
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
{code}
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

  was:
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
> {code:java}
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> {code}
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
tuyu created YARN-11083:
---------------------------

             Summary: Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
                 Key: YARN-11083
                 URL: https://issues.apache.org/jira/browse/YARN-11083
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.1.0
            Reporter: tuyu
             Fix For: 3.4.0

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503707#comment-17503707 ]

Hadoop QA commented on YARN-11083:
----------------------------------

-1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| 0 | reexec | 32m 4s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 37m 23s | | trunk passed |
| +1 | compile | 1m 7s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 0m 55s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 1m 5s | | trunk passed |
| +1 | shadedclient | 19m 37s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 51s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 0m 46s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| 0 | spotbugs | 23m 21s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 9s | | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 55s | | the patch passed |
| +1 | compile | 1m 2s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 1m 2s | | the patch passed |
| +1 | compile | 0m 52s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 | javac | 0m 52s | | the patch passed |
| -0 | checkstyle | 0m 39s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1276/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 7 new + 72 unchanged - 0 fixed = 79 total (was 72) |
| +1 | mvnsite | 0m 55s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 17m 1s | | patch has no errors when building and testing our client artifacts. |
| +1 |
[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
[ https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503565#comment-17503565 ]

yangben commented on YARN-10259:
--------------------------------

The fix in RegularContainerAllocator#allocate solves the problem that no new allocation happens, but it adds to the time needed to finish an allocation round under ResourceUsageMultiNodeLookupPolicy (because if the first node cannot fit the request, the other nodes cannot either). I think we can add a switch for the different policies: when the policy is ResourceUsageMultiNodeLookupPolicy, don't continue looking further down the node list, to reduce allocation time.

> Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10259
>                 URL: https://issues.apache.org/jira/browse/YARN-10259
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10259-001.patch, YARN-10259-002.patch, YARN-10259-003.patch
>
> Reserved containers are not allocated from the available space of other nodes in the CandidateNodeSet in MultiNodePlacement.
> *Repro:*
> 1. MultiNode placement enabled.
> 2. Two nodes h1 and h2 with 8GB each.
> 3. Submit app1 AM (5GB), which gets placed on h1, and app2 AM (5GB), which gets placed on h2.
> 4. Submit app3 AM, which is reserved on h1.
> 5. Kill app2, which frees space on h2.
> 6. app3 AM never gets ALLOCATED.
> The RM logs show the YARN-8127 fix rejecting the allocation proposal for app3 AM on h2, as it expects the assignment to be on the same node where the reservation happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt appattempt_1588684773609_0003_01 reserved container container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 available= used=. This attempt currently has 1 reserved containers at priority 0; currentReservation
> 2020-05-05 18:49:37,264 INFO [AsyncDispatcher event handler] fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved container=container_1588684773609_0003_01_01, on node=host: h1:1234 #containers=1 available= used= with resource=
> RESERVED=[(Application=appattempt_1588684773609_0003_01; Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] allocator.RegularContainerAllocator (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: node=h2 application=application_1588684773609_0003 priority=0 pendingAsk=,repeat=1> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate from reserved container container_1588684773609_0003_01_01, but node is not reserved
> ALLOCATED=[(Application=appattempt_1588684773609_0003_01; Node=h2:1234; Resource=)]
> {code}
> A test case which reproduces the issue is attached.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
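To illustrate the proposed switch, here is a small hypothetical Java sketch (not the actual CapacityScheduler code): when the candidate list is ordered most-free-first, as a usage-ordered policy like ResourceUsageMultiNodeLookupPolicy produces, a miss on the first node implies a miss on every later node, so the scan can stop early.

{code:java}
import java.util.List;

// Hypothetical sketch of the early-exit idea from the comment above;
// Node, Request and pickNode are illustrative, not YARN classes.
public class MultiNodeLookupSketch {

  record Node(String host, long freeMemMb, int freeVcores) {}
  record Request(long memMb, int vcores) {}

  static boolean fits(Node n, Request r) {
    return n.freeMemMb() >= r.memMb() && n.freeVcores() >= r.vcores();
  }

  // sortedByFreeCapacity = true models usage-ordered policies, where a
  // failure on the emptiest node implies failure on all the others.
  static Node pickNode(List<Node> candidates, Request req,
      boolean sortedByFreeCapacity) {
    for (Node n : candidates) {
      if (fits(n, req)) {
        return n;
      }
      if (sortedByFreeCapacity) {
        return null; // early exit: later nodes have even less free space
      }
    }
    return null;
  }
}
{code}

For policies without a usage ordering, the flag stays false and the full scan is preserved.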
[jira] [Updated] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue
[ https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10918:
----------------------------------
    Fix Version/s: 3.4.0

> Simplify method: CapacitySchedulerQueueManager#parseQueue
> ----------------------------------------------------------
>
>                 Key: YARN-10918
>                 URL: https://issues.apache.org/jira/browse/YARN-10918
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Andras Gyori
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Ideas for simplifying this method:
> - Define a queue factory
> - Separate validation logic

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
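As a rough sketch of the two ideas above (all names hypothetical, not the actual CapacityScheduler classes): the factory isolates the decision of which queue type to build, and the validator keeps the checks out of the parsing loop, so parseQueue would only walk the configuration tree and delegate.

{code:java}
import java.util.List;

// Hypothetical sketch of the refactoring ideas in YARN-10918;
// none of these types are the real CapacityScheduler classes.
public class ParseQueueSketch {

  interface CSQueue { String path(); }
  record Leaf(String path) implements CSQueue {}
  record Parent(String path) implements CSQueue {}
  record AutoCreatedLeaf(String path) implements CSQueue {}

  static final class QueueFactory {
    // One place that decides the concrete queue type, instead of
    // inline branching inside parseQueue().
    CSQueue create(String path, List<String> children, boolean autoCreate) {
      if (!children.isEmpty()) {
        return new Parent(path);
      }
      return autoCreate ? new AutoCreatedLeaf(path) : new Leaf(path);
    }
  }

  static final class QueueValidator {
    // Validation separated from construction.
    void validateCapacitySum(CSQueue parent, float childCapacitySum) {
      if (childCapacitySum > 100.0f) {
        throw new IllegalStateException("children of " + parent.path()
            + " sum to " + childCapacitySum + "%");
      }
    }
  }
}
{code}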
[jira] [Updated] (YARN-10945) Add javadoc to all methods of AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10945:
----------------------------------
    Fix Version/s: 3.4.0

> Add javadoc to all methods of AbstractCSQueue
> ----------------------------------------------
>
>                 Key: YARN-10945
>                 URL: https://issues.apache.org/jira/browse/YARN-10945
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: András Győri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org