[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988937#comment-16988937 ]

Eric Payne commented on YARN-9992:
----------------------------------

Thanks [~jhung], I verified that backporting YARN-9205 fixed the problem.

> Max allocation per queue is zero for custom resource types on RM startup
> ------------------------------------------------------------------------
>
> Key: YARN-9992
> URL: https://issues.apache.org/jira/browse/YARN-9992
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
> Attachments: YARN-9992.001.patch
>
>
> Found an issue where GPU requests on a newly booted RM cannot be
> scheduled. The RM throws the exception in
> SchedulerUtils#throwInvalidResourceException:
> {noformat}
> throw new InvalidResourceRequestException(
>     "Invalid resource request, requested resource type=[" + reqResourceName
>     + "] < 0 or greater than maximum allowed allocation. Requested "
>     + "resource=" + reqResource + ", maximum allowed allocation="
>     + availableResource
>     + ", please note that maximum allowed allocation is calculated "
>     + "by scheduler based on maximum resource of registered "
>     + "NodeManagers, which might be less than configured "
>     + "maximum allocation="
>     + ResourceUtils.getResourceTypesMaximumAllocation());{noformat}
> Upon refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works
> again.
> I think the root cause is that upon scheduler refresh, resource-types.xml is
> loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call
> ResourceUtils#fetchMaximumAllocationFromConfig in
> CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it is able to
> fetch the {{yarn.resource-types}} config. But resource-types.xml is not
> loaded into the conf in CapacityScheduler#initScheduler, so the custom
> resource is not found when computing max allocations, and the custom
> resource max allocation is 0.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
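As a rough illustration of the root cause described in the issue above, the following is a toy Java model and not Hadoop code. It assumes a plain Map stands in for the scheduler's configuration object; the class MaxAllocationSketch, its methods, the resource name "yarn.io/gpu", and the per-type "maximum-allocation" key are all chosen for illustration only. Only the {{yarn.resource-types}} property name and the "< 0 or greater than maximum" check are taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the startup-vs-refresh difference; NOT the Hadoop implementation. */
final class MaxAllocationSketch {

  /**
   * Hypothetical stand-in for a per-type maximum-allocation lookup.
   * Returns 0 when the conf does not list the resource type at all,
   * which mirrors the symptom reported in the issue.
   */
  static long maxAllocation(Map<String, String> conf, String resourceName) {
    String types = conf.getOrDefault("yarn.resource-types", "");
    for (String type : types.split(",")) {
      if (type.trim().equals(resourceName)) {
        // Illustrative key layout; treat it as an assumption of this sketch.
        String key = "yarn.resource-types." + resourceName + ".maximum-allocation";
        // Toy choice: no explicit maximum means "unbounded".
        return Long.parseLong(conf.getOrDefault(key, String.valueOf(Long.MAX_VALUE)));
      }
    }
    return 0L; // resource type unknown to this conf
  }

  /** Same condition as the exception message: reject if < 0 or > maximum. */
  static boolean isValidRequest(long requested, long maximum) {
    return requested >= 0 && requested <= maximum;
  }

  public static void main(String[] args) {
    // Conf as seen on RM startup in this model: resource-types.xml was never loaded.
    Map<String, String> startupConf = new HashMap<>();

    // Conf as seen after a refresh in this model: resource-types.xml entries are present.
    Map<String, String> refreshedConf = new HashMap<>(startupConf);
    refreshedConf.put("yarn.resource-types", "yarn.io/gpu");
    refreshedConf.put("yarn.resource-types.yarn.io/gpu.maximum-allocation", "8");

    long atStartup = maxAllocation(startupConf, "yarn.io/gpu");
    long afterRefresh = maxAllocation(refreshedConf, "yarn.io/gpu");

    System.out.println("max at startup      = " + atStartup);
    System.out.println("max after refresh   = " + afterRefresh);
    System.out.println("1 GPU ok at startup = " + isValidRequest(1, atStartup));
    System.out.println("1 GPU ok after      = " + isValidRequest(1, afterRefresh));
  }
}
```

In this model, any positive GPU request fails validation against the startup conf because the maximum is 0, and the same request passes once the resource type is present in the conf, which matches the behaviour reported before and after refreshQueues.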
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987270#comment-16987270 ]

Jonathan Hung commented on YARN-9992:
-------------------------------------

Hmm, not sure how I missed this before, I think it's related to YARN-9205. Let me try porting that.
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987105#comment-16987105 ]

Eric Payne commented on YARN-9992:
----------------------------------

The code changes look fine, but I'm still trying to understand what is different between trunk and branch-2. These code changes are not in trunk, but something is picking up the resource-types.xml in the CS init path.
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986454#comment-16986454 ]

Eric Payne commented on YARN-9992:
----------------------------------

[~jhung], it looks like this is only a problem on branch-2 and branch-2.10. Is that your analysis as well?
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986312#comment-16986312 ]

Eric Payne commented on YARN-9992:
----------------------------------

Thanks [~jhung] for reporting this issue and putting up a patch. I encountered this problem as well. I'll take a look at the patch soon.
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983125#comment-16983125 ]

Hadoop QA commented on YARN-9992:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 47s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 36s | trunk passed |
| +1 | mvnsite | 0m 45s | trunk passed |
| +1 | shadedclient | 14m 23s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 13s | trunk passed |
| +1 | javadoc | 0m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 36s | the patch passed |
| +1 | javac | 0m 36s | the patch passed |
| -0 | checkstyle | 0m 28s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 102 unchanged - 0 fixed = 103 total (was 102) |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 14s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 85m 29s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 142m 3s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9992 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12986851/YARN-9992.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux dae86372c6c0 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ef950b0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25225/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25225/testReport/ |
| Max. process+th
[jira] [Commented] (YARN-9992) Max allocation per queue is zero for custom resource types on RM startup
[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983036#comment-16983036 ]

Jonathan Hung commented on YARN-9992:
-------------------------------------

Attached a simple one liner [^YARN-9992.001.patch]