[jira] [Commented] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503959#comment-17503959 ]

tuyu commented on YARN-11083:
-----------------------------

The failed test case is not related to this issue.

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-11083.001.patch
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
> {code:java}
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> {code}
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
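To make the DRC arithmetic above concrete, here is a minimal, self-contained Java sketch. The Res record and the min helpers are illustrative stand-ins rather than the real Hadoop Resource / ResourceCalculator API; only the numbers (cluster <60G, 100>, queue max <5G, 100>, parent limit <30G, 99>) come from the report above.

{code:java}
// Minimal sketch of the two "min" semantics from YARN-11083.
// Res, dominantShare, drcMin and componentwiseMin are illustrative
// stand-ins, not the real Hadoop Resource / ResourceCalculator API.
public class DrcMinDemo {
  record Res(long memoryGb, long vcores) {}

  // Dominant share: the larger utilization ratio against the cluster total.
  static double dominantShare(Res r, Res cluster) {
    return Math.max((double) r.memoryGb() / cluster.memoryGb(),
                    (double) r.vcores() / cluster.vcores());
  }

  // DRC-style min: picks ONE operand wholesale, whichever has the
  // smaller dominant share.
  static Res drcMin(Res a, Res b, Res cluster) {
    return dominantShare(a, cluster) <= dominantShare(b, cluster) ? a : b;
  }

  // Componentwise min: caps each dimension independently.
  static Res componentwiseMin(Res a, Res b) {
    return new Res(Math.min(a.memoryGb(), b.memoryGb()),
                   Math.min(a.vcores(), b.vcores()));
  }

  public static void main(String[] args) {
    Res cluster = new Res(60, 100);     // <60G, 100 vcores>
    Res queueMax = new Res(5, 100);     // A1's max capacity <5G, 100>
    Res parentLimit = new Res(30, 99);  // left under A after A2's allocation

    // dominant shares: queueMax    -> max(5/60, 100/100) = 1.0
    //                  parentLimit -> max(30/60, 99/100) = 0.99
    // so the DRC min drops the 5G memory cap entirely:
    System.out.println(drcMin(queueMax, parentLimit, cluster));
    // prints Res[memoryGb=30, vcores=99]

    // the componentwiseMin wrapper proposed in the issue restores it:
    System.out.println(componentwiseMin(
        drcMin(queueMax, parentLimit, cluster), queueMax));
    // prints Res[memoryGb=5, vcores=99]
  }
}
{code}

Under the bare DRC min, A1's limit is reported as <30G, 99>, so the <10G, 1> request passes the scheduler-thread check and only fails at commit time, which is exactly the "Failed to accept this proposal" churn described in the report.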
[jira] [Updated] (YARN-10538) Add recommissioning nodes to the list of updated nodes returned to the AM
[ https://issues.apache.org/jira/browse/YARN-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated YARN-10538:
---------------------------------
    Fix Version/s: 3.2.3

Backported to branch-3.2 and branch-3.2.3.

> Add recommissioning nodes to the list of updated nodes returned to the AM
> --------------------------------------------------------------------------
>
>                 Key: YARN-10538
>                 URL: https://issues.apache.org/jira/browse/YARN-10538
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.9.1, 3.1.1
>            Reporter: Srinivas S T
>            Assignee: Srinivas S T
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.1, 3.2.3
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> YARN-6483 added nodes that transitioned to the DECOMMISSIONING state to the list of updated nodes returned to the AM. This allows the Spark application master to gracefully decommission its containers on the decommissioning node. But if the node were to be recommissioned, the Spark application master would not be aware of it. We propose to add recommissioned nodes to the list of updated nodes sent to the AM whenever a recommissioning transition occurs.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
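As a rough sketch of what this change means on the application-master side: the NodeReport and NodeState types below are the real YARN API, but NodeUpdateHandler and its wiring into the AMRMClientAsync onNodesUpdated() callback are assumed for illustration.

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;

// Sketch: logic an AM could run from its onNodesUpdated() callback.
// After YARN-6483 the AM only learned about DECOMMISSIONING transitions;
// with this change a recommissioned node arrives as a fresh report in
// the RUNNING state, so an in-flight drain can be cancelled.
public final class NodeUpdateHandler {

  private NodeUpdateHandler() {
  }

  public static void handle(List<NodeReport> updatedNodes) {
    for (NodeReport report : updatedNodes) {
      NodeState state = report.getNodeState();
      if (state == NodeState.DECOMMISSIONING) {
        // start migrating containers off this node before it drains
        System.out.println("Draining: " + report.getNodeId());
      } else if (state == NodeState.RUNNING) {
        // node was recommissioned: schedulable again, stop the migration
        System.out.println("Usable again: " + report.getNodeId());
      }
    }
  }
}
{code}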
[jira] [Resolved] (YARN-10923) Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
[ https://issues.apache.org/jira/browse/YARN-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

András Győri resolved YARN-10923.
---------------------------------
    Resolution: Won't Fix

> Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-10923
>                 URL: https://issues.apache.org/jira/browse/YARN-10923
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: András Győri
>            Priority: Critical
>
> First, create 2 new classes: DynamicLeaf / DynamicParent.
> Then, gradually move AQC functionality over from ManagedParentQueue / AutoCreatedLeafQueue.
> Revisit whether AbstractManagedParentQueue makes sense at all.
> ManagedParent / Parent: is there an actual need for the two classes?
> - Currently the two different parent classes can cause confusion and chaos
> - Can be a "back to the drawing board" task
> The ultimate goal is to have a common class for AQC-enabled parents and to investigate whether a separate class for AutoCreatedLeafQueue is required.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11084) Introduce new config to specify AM default node-label when not specified
Junfan Zhang created YARN-11084:
-----------------------------------

             Summary: Introduce new config to specify AM default node-label when not specified
                 Key: YARN-11084
                 URL: https://issues.apache.org/jira/browse/YARN-11084
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Junfan Zhang

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
[ https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated YARN-9783:
-----------------------------------
    Fix Version/s: 3.2.3

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-9783
>                 URL: https://issues.apache.org/jira/browse/YARN-9783
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: test
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.3
>
>         Attachments: YARN-9783.001.patch, YARN-9783.002.patch, YARN-9783.003.patch
>
> ZooKeeper 3.5.5 is the latest stable release. It contains many new features (including SSL-related improvements, which are very important for production use; see [the release notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). There should be no backward-incompatible changes in the API, so applications using ZooKeeper clients should build against the new ZooKeeper without any problem, and the new ZooKeeper client should work with the older (3.4) servers without any issue, at least until someone starts to use the new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop YARN yet, but to enable people to rebuild and test Hadoop with the new ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test case: [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64]. This test case seems to use low-level ZooKeeper internal code, which changed in the new ZooKeeper version. Although I am not sure of the original reasoning for including this test in the YARN code, I propose to remove it; if a test case is still missing in ZooKeeper, let's open a ZooKeeper ticket to cover this scenario there.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
[ https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504051#comment-17504051 ]

Masatake Iwasaki commented on YARN-9783:
----------------------------------------

I cherry-picked this to branch-3.2 and branch-3.2.3.

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-9783
>                 URL: https://issues.apache.org/jira/browse/YARN-9783
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: test
>            Reporter: Mate Szalay-Beko
>            Assignee: Mate Szalay-Beko
>            Priority: Major
>             Fix For: 3.3.0, 3.2.3
>
>         Attachments: YARN-9783.001.patch, YARN-9783.002.patch, YARN-9783.003.patch
>
> ZooKeeper 3.5.5 is the latest stable release. It contains many new features (including SSL-related improvements, which are very important for production use; see [the release notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). There should be no backward-incompatible changes in the API, so applications using ZooKeeper clients should build against the new ZooKeeper without any problem, and the new ZooKeeper client should work with the older (3.4) servers without any issue, at least until someone starts to use the new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop YARN yet, but to enable people to rebuild and test Hadoop with the new ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN test case: [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64]. This test case seems to use low-level ZooKeeper internal code, which changed in the new ZooKeeper version. Although I am not sure of the original reasoning for including this test in the YARN code, I propose to remove it; if a test case is still missing in ZooKeeper, let's open a ZooKeeper ticket to cover this scenario there.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tuyu updated YARN-11083:
------------------------
    Description: 
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tuyu updated YARN-11083:
------------------------
    Description: 
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
{code:java}
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
{code}
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

  was:
In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
Queue tree:
       Root max <60G, 100>
      /
     A max <60G, 100>
    / \
  A1   A2
max<5G,100>  max<40,70>
Test this situation:
A2 allocates <30G, 1>, so A has <30G, 99> left.
A1 then requests <10G, 1>.
The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
Based on this analysis, getCurrentLimitResource should return:
{code:java}
return Resources.componentwiseMin(
    Resources.min(resourceCalculator, clusterResource,
        queueMaxResource, currentResourceLimits.getLimit()),
    queueMaxResource);
{code}

> Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11083
>                 URL: https://issues.apache.org/jira/browse/YARN-11083
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: tuyu
>            Priority: Major
>             Fix For: 3.4.0
>
> In our cluster (6k+ nodes, 600+ queues), when the cluster is very busy the commit-failure metric climbs past 50 thousand. To reproduce this case:
> Queue tree:
> {code:java}
>        Root max <60G, 100>
>       /
>      A max <60G, 100>
>     / \
>   A1   A2
> max<5G,100>  max<40,70>
> {code}
> Test this situation:
> A2 allocates <30G, 1>, so A has <30G, 99> left.
> A1 then requests <10G, 1>.
> The expected behavior is that checkHeadRoom rejects this request, because A1's queue max capacity is <5G, 100 vcores>.
> But getCurrentLimitResource uses DominantResourceCalculator, so resourceCalculator.min returns resourceLimit == <30G, 99>, because CPU is the dominant share. The scheduler thread therefore allocates <10G, 1 vcore> successfully, but the commit thread's tryCommit goes through AbstractCSQueue.accept, where Resources.fitsIn checks both memory and vcores and rejects the <10G, 1 vcore> commit.
> Based on this analysis, getCurrentLimitResource should return:
> {code:java}
> return Resources.componentwiseMin(
>     Resources.min(resourceCalculator, clusterResource,
>         queueMaxResource, currentResourceLimits.getLimit()),
>     queueMaxResource);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
tuyu created YARN-11083:
---------------------------

             Summary: Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
                 Key: YARN-11083
                 URL: https://issues.apache.org/jira/browse/YARN-11083
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.1.0
            Reporter: tuyu
             Fix For: 3.4.0

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11083) Wrong ResourceLimit calc logic when use DRC comparator cause too many "Failed to accept this proposal"
[ https://issues.apache.org/jira/browse/YARN-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503707#comment-17503707 ]

Hadoop QA commented on YARN-11083:
----------------------------------

-1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| 0 | reexec | 32m 4s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 37m 23s | | trunk passed |
| +1 | compile | 1m 7s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 0m 55s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 1m 5s | | trunk passed |
| +1 | shadedclient | 19m 37s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 51s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 0m 46s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| 0 | spotbugs | 23m 21s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 9s | | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 55s | | the patch passed |
| +1 | compile | 1m 2s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javac | 1m 2s | | the patch passed |
| +1 | compile | 0m 52s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 | javac | 0m 52s | | the patch passed |
| -0 | checkstyle | 0m 39s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1276/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 7 new + 72 unchanged - 0 fixed = 79 total (was 72) |
| +1 | mvnsite | 0m 55s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 17m 1s | | patch has no errors when building and testing our client artifacts. |
| +1 |
[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
[ https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503565#comment-17503565 ]

yangben commented on YARN-10259:
--------------------------------

The fix in RegularContainerAllocator#allocate solves the problem that no new allocation happens, but it adds to the time needed to finish an allocation round under ResourceUsageMultiNodeLookupPolicy (because if the first node cannot fit the request, the other nodes cannot either). I think we can add a switch for the different policies: when the policy is ResourceUsageMultiNodeLookupPolicy, don't continue looking further down the node list, to reduce allocation time.

> Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10259
>                 URL: https://issues.apache.org/jira/browse/YARN-10259
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10259-001.patch, YARN-10259-002.patch, YARN-10259-003.patch
>
> Reserved containers are not allocated from the available space of other nodes in the CandidateNodeSet in MultiNodePlacement.
> *Repro:*
> 1. MultiNode placement enabled.
> 2. Two nodes h1 and h2 with 8GB each.
> 3. Submit app1 AM (5GB), which gets placed on h1, and app2 AM (5GB), which gets placed on h2.
> 4. Submit app3 AM, which is reserved on h1.
> 5. Kill app2, which frees space on h2.
> 6. app3 AM never gets ALLOCATED.
> The RM logs show the YARN-8127 fix rejecting the allocation proposal for app3 AM on h2, as it expects the assignment to be on the same node where the reservation happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] scheduler.SchedulerApplicationAttempt (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt appattempt_1588684773609_0003_01 reserved container container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 available= used=. This attempt currently has 1 reserved containers at priority 0; currentReservation
> 2020-05-05 18:49:37,264 INFO [AsyncDispatcher event handler] fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved container=container_1588684773609_0003_01_01, on node=host: h1:1234 #containers=1 available= used= with resource=
> RESERVED=[(Application=appattempt_1588684773609_0003_01; Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] allocator.RegularContainerAllocator (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: node=h2 application=application_1588684773609_0003 priority=0 pendingAsk=,repeat=1> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate from reserved container container_1588684773609_0003_01_01, but node is not reserved
> ALLOCATED=[(Application=appattempt_1588684773609_0003_01; Node=h2:1234; Resource=)]
> {code}
> A test case which reproduces the issue is attached.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
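To illustrate the proposed switch, here is a small hypothetical Java sketch (not the actual CapacityScheduler code): when the candidate list is ordered most-free-first, as a usage-ordered policy like ResourceUsageMultiNodeLookupPolicy produces, a miss on the first node implies a miss on every later node, so the scan can stop early.

{code:java}
import java.util.List;

// Hypothetical sketch of the early-exit idea from the comment above;
// Node, Request and pickNode are illustrative, not YARN classes.
public class MultiNodeLookupSketch {

  record Node(String host, long freeMemMb, int freeVcores) {}
  record Request(long memMb, int vcores) {}

  static boolean fits(Node n, Request r) {
    return n.freeMemMb() >= r.memMb() && n.freeVcores() >= r.vcores();
  }

  // sortedByFreeCapacity = true models usage-ordered policies, where a
  // failure on the emptiest node implies failure on all the others.
  static Node pickNode(List<Node> candidates, Request req,
      boolean sortedByFreeCapacity) {
    for (Node n : candidates) {
      if (fits(n, req)) {
        return n;
      }
      if (sortedByFreeCapacity) {
        return null; // early exit: later nodes have even less free space
      }
    }
    return null;
  }
}
{code}

For policies without a usage ordering, the flag stays false and the full scan is preserved.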
[jira] [Updated] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue
[ https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10918:
----------------------------------
    Fix Version/s: 3.4.0

> Simplify method: CapacitySchedulerQueueManager#parseQueue
> ----------------------------------------------------------
>
>                 Key: YARN-10918
>                 URL: https://issues.apache.org/jira/browse/YARN-10918
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Andras Gyori
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Ideas for simplifying this method:
> - Define a queue factory
> - Separate validation logic

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
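As a rough sketch of the two ideas above (all names hypothetical, not the actual CapacityScheduler classes): the factory isolates the decision of which queue type to build, and the validator keeps the checks out of the parsing loop, so parseQueue would only walk the configuration tree and delegate.

{code:java}
import java.util.List;

// Hypothetical sketch of the refactoring ideas in YARN-10918;
// none of these types are the real CapacityScheduler classes.
public class ParseQueueSketch {

  interface CSQueue { String path(); }
  record Leaf(String path) implements CSQueue {}
  record Parent(String path) implements CSQueue {}
  record AutoCreatedLeaf(String path) implements CSQueue {}

  static final class QueueFactory {
    // One place that decides the concrete queue type, instead of
    // inline branching inside parseQueue().
    CSQueue create(String path, List<String> children, boolean autoCreate) {
      if (!children.isEmpty()) {
        return new Parent(path);
      }
      return autoCreate ? new AutoCreatedLeaf(path) : new Leaf(path);
    }
  }

  static final class QueueValidator {
    // Validation separated from construction.
    void validateCapacitySum(CSQueue parent, float childCapacitySum) {
      if (childCapacitySum > 100.0f) {
        throw new IllegalStateException("children of " + parent.path()
            + " sum to " + childCapacitySum + "%");
      }
    }
  }
}
{code}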
[jira] [Updated] (YARN-10945) Add javadoc to all methods of AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10945:
----------------------------------
    Fix Version/s: 3.4.0

> Add javadoc to all methods of AbstractCSQueue
> ----------------------------------------------
>
>                 Key: YARN-10945
>                 URL: https://issues.apache.org/jira/browse/YARN-10945
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: András Győri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org