[jira] [Assigned] (YARN-10875) CLI queue usage command only reflects default partition usage
[ https://issues.apache.org/jira/browse/YARN-10875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

D M Murali Krishna Reddy reassigned YARN-10875:
-----------------------------------------------

    Assignee: D M Murali Krishna Reddy

> CLI queue usage command only reflects default partition usage
> -------------------------------------------------------------
>
>                 Key: YARN-10875
>                 URL: https://issues.apache.org/jira/browse/YARN-10875
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rajshree Mishra
>            Assignee: D M Murali Krishna Reddy
>            Priority: Major
>         Attachments: queueA_scheduler.png, queueA_usage.png
>
> Test steps:
> # Hadoop cluster with node labels -> default, label1
> # A job is submitted to queueA using resources of the accessible node label label1
> # Check queue usage for queueA using the CLI command "yarn queue -status queueA"
> Output: Current capacity is displayed as 00%
> Expected: queueA is being utilized under the label1 resource pool, and the status command should reflect that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10876) CLI queue usage should indicate absolute usage
[ https://issues.apache.org/jira/browse/YARN-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

D M Murali Krishna Reddy reassigned YARN-10876:
-----------------------------------------------

    Assignee: D M Murali Krishna Reddy

> CLI queue usage should indicate absolute usage
> ----------------------------------------------
>
>                 Key: YARN-10876
>                 URL: https://issues.apache.org/jira/browse/YARN-10876
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rajshree Mishra
>            Assignee: D M Murali Krishna Reddy
>            Priority: Major
>         Attachments: schedulerUsage.png, usageCLI.png
>
> For a large cluster with multiple users, the WebUI proves to be very slow.
> Users therefore use the CLI to check usage information; however, the output displays percentages above 100.
> Users want to know the available resources to judge whether more jobs can be submitted, and these percentages don't give a clear picture of that.
> The CLI output should be made more user friendly by providing information about the used and available resources in a queue, as users may not know the total resources of a large cluster.
[jira] [Updated] (YARN-9875) FSSchedulerConfigurationStore fails to update with hdfs path
[ https://issues.apache.org/jira/browse/YARN-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-9875:
---------------------------------
    Labels: pull-request-available  (was: )

> FSSchedulerConfigurationStore fails to update with hdfs path
> ------------------------------------------------------------
>
>                 Key: YARN-9875
>                 URL: https://issues.apache.org/jira/browse/YARN-9875
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.0
>
>         Attachments: YARN-9875-001.patch, YARN-9875-002.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> FSSchedulerConfigurationStore fails to update with an hdfs path - "java.io.IOException: Filesystem closed"
> *RM Logs:*
> {code}
> 2019-10-06 16:50:40,829 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FSSchedulerConfigurationStore: write temp capacity configuration fail, schedulerConfigFile=hdfs:/tmp/yarn/system/capacity-scheduler.xml.1570380640828.tmp
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:475)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1232)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1214)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1196)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1134)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:527)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:541)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:468)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1136)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1116)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1005)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:993)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FSSchedulerConfigurationStore.writeTmpConfig(FSSchedulerConfigurationStore.java:251)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.FSSchedulerConfigurationStore.logMutation(FSSchedulerConfigurationStore.java:130)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.MutableCSConfigurationProvider.logAndApplyMutation(MutableCSConfigurationProvider.java:153)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2597)
>     at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2587)
> {code}
> *Repro:*
> {code:java}
> yarn-site.xml:
>
> <property>
>   <name>yarn.scheduler.configuration.fs.path</name>
>   <value>hdfs:///tmp/yarn/system</value>
> </property>
> <property>
>   <name>yarn.scheduler.configuration.store.class</name>
>   <value>fs</value>
> </property>
>
> [yarn@yarndocker-1 yarn]$ cat /tmp/abc.xml
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>priority</key>
>         <value>10</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
>
> [yarn@yarndocker-1 yarn]$ curl -v -X PUT -d @/tmp/abc.xml -H "Content-type: application/xml" 'http://yarndocker-1:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> Filesystem closed
> {code}
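The "Filesystem closed" failure above is the classic shared-cache pitfall: Hadoop's FileSystem.get() returns a cached instance shared by every caller with the same URI and configuration, so one component calling close() invalidates the handle for everyone else, while FileSystem.newInstance() returns a private instance that is safe to close. The following stdlib-only model (hypothetical code, not Hadoop's actual cache implementation) illustrates that behaviour:

```java
import java.util.HashMap;
import java.util.Map;

// Model of a FileSystem handle: once closed, every operation fails,
// just as the real HDFS client throws java.io.IOException("Filesystem closed").
class FakeFs {
    private boolean open = true;

    void close() {
        open = false;
    }

    void create(String path) {
        if (!open) {
            throw new IllegalStateException("Filesystem closed");
        }
    }
}

class FakeFsCache {
    private static final Map<String, FakeFs> CACHE = new HashMap<>();

    // Analogue of FileSystem.get(): all callers share one cached instance,
    // so a close() by any of them breaks the handle for the rest.
    static FakeFs get(String scheme) {
        return CACHE.computeIfAbsent(scheme, s -> new FakeFs());
    }

    // Analogue of FileSystem.newInstance(): a private instance that a
    // long-lived component (like a scheduler configuration store) can
    // own and close without affecting anyone else.
    static FakeFs newInstance(String scheme) {
        return new FakeFs();
    }
}
```

Under this model, a store that obtained its handle via the cache fails as soon as any other user of the same cached handle closes it, while a store holding its own instance keeps working.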
[jira] [Resolved] (YARN-8992) Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue
[ https://issues.apache.org/jira/browse/YARN-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved YARN-8992.
---------------------------------
    Fix Version/s: 3.2.3
       Resolution: Fixed

Backported to branch-3.2.

> Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8992
>                 URL: https://issues.apache.org/jira/browse/YARN-8992
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.1.1
>            Reporter: Haibo Chen
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>              Labels: pull-request-available, release-blocker
>             Fix For: 3.2.3, 3.3.0
>
>         Attachments: YARN-8992.001.patch, YARN-8992.002.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> As discovered in YARN-8990, QueueManager can see a leaf queue being empty while FSLeafQueue.addApp() is called in the middle of
> {code:java}
> return queue.getNumRunnableApps() == 0 &&
>     leafQueue.getNumNonRunnableApps() == 0 &&
>     leafQueue.getNumAssignedApps() == 0;{code}
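The bug above is a check-then-act race: the cleanup path evaluates the three emptiness counters while another thread is between deciding to place an app and actually adding it. A minimal, framework-free Java sketch (hypothetical classes, not the actual FairScheduler code) of the shape of the fix - making the emptiness check and the deletion atomic with the same lock that addApp() takes:

```java
// One lock covers both addApp() and deleteIfEmpty(), so the cleanup
// thread can never observe "empty" in the middle of an add.
class DynamicQueue {
    private int numApps = 0;
    private boolean deleted = false;

    // Called by the scheduler thread; fails if cleanup got there first.
    synchronized boolean addApp() {
        if (deleted) {
            return false; // queue vanished; caller must re-resolve it
        }
        numApps++;
        return true;
    }

    synchronized void removeApp() {
        numApps--;
    }

    // Called by the cleanup thread; the check and the delete are atomic.
    synchronized boolean deleteIfEmpty() {
        if (numApps == 0) {
            deleted = true;
            return true;
        }
        return false;
    }
}
```

The key property is that addApp() observing deleted == true (and reporting failure to the caller) replaces the silent lost-update that the unsynchronized counter checks allowed.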
[jira] [Commented] (YARN-10663) Add runningApps stats in SLS
[ https://issues.apache.org/jira/browse/YARN-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392534#comment-17392534 ]

Steve Loughran commented on YARN-10663:
---------------------------------------

FYI, there are refs to com.google in the code which MUST be org.apache.thirdparty.com.google; if you do a build right now, maven ends up patching TestNMSimulator

> Add runningApps stats in SLS
> ----------------------------
>
>                 Key: YARN-10663
>                 URL: https://issues.apache.org/jira/browse/YARN-10663
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: VADAGA ANANYO RAO
>            Assignee: VADAGA ANANYO RAO
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10663.0001.patch, YARN-10663.0002.patch
>
> RMNodes in SLS don't keep track of the runningApps on each node. Due to this, the graceful decommissioning logic takes a hit: nodes will decommission when there are no running containers on them even though some shuffle data is still present.
> In this Jira, we will add runningApps functionality in SLS to improve the decommissioning logic of each node. This will help with autoscaling simulations on SLS.
[jira] [Commented] (YARN-10547) Decouple job parsing logic from SLSRunner
[ https://issues.apache.org/jira/browse/YARN-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392336#comment-17392336 ]

Benjamin Teke commented on YARN-10547:
--------------------------------------

[~snemeth], thanks for the patch. It seems to me you've addressed [~gandras]'s points and I have nothing else to add, so +1 (non-binding) from my side.

> Decouple job parsing logic from SLSRunner
> -----------------------------------------
>
>                 Key: YARN-10547
>                 URL: https://issues.apache.org/jira/browse/YARN-10547
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>         Attachments: YARN-10547.001.patch, YARN-10547.002.patch, YARN-10547.003.patch, YARN-10547.004.patch, YARN-10547.005.patch
>
> SLSRunner has too many responsibilities.
> One of them is to parse the job details from the SLS input formats and launch the AMs and task containers.
> As a first step, the job parsing logic could be decoupled from this class.
> There are 3 types of inputs:
> - SLS trace
> - Synth
> - Rumen
> Their job parsing methods are:
> - SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
> - Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
> - Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
[jira] [Commented] (YARN-10355) Refactor NM ContainerLaunch.java#orderEnvByDependencies
[ https://issues.apache.org/jira/browse/YARN-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392333#comment-17392333 ]

Szilard Nemeth commented on YARN-10355:
---------------------------------------

Thanks [~tdomok], YARN-10874 was merged just now.

> Refactor NM ContainerLaunch.java#orderEnvByDependencies
> -------------------------------------------------------
>
>                 Key: YARN-10355
>                 URL: https://issues.apache.org/jira/browse/YARN-10355
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Benjamin Teke
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch#orderEnvByDependencies}} method and its helper method {{getEnvDependencies}} (together with the overrides) are hard to read. Some improvements could be made:
> * use Pattern matching in the overrides of getEnvDependencies instead of iterating through the environment variable strings char by char
> * the unit tests contain a lot of repeated code and the test methods are generally long - they could be separated into different setup/helper and assertion methods
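The first bullet above proposes replacing char-by-char scanning with java.util.regex. As an illustration only (the pattern and class below are assumptions, not the actual ContainerLaunch code), a Windows-style dependency extractor - where one environment variable's value may reference others as %NAME% - can be a few lines:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a regex-based getEnvDependencies for the Windows variant:
// collect every %NAME% reference in a variable's value, in order of
// first appearance, so the caller can topologically order the variables.
class EnvDeps {
    private static final Pattern WIN_VAR =
        Pattern.compile("%([A-Za-z_][A-Za-z0-9_]*)%");

    static Set<String> getEnvDependencies(String value) {
        Set<String> deps = new LinkedHashSet<>();
        Matcher m = WIN_VAR.matcher(value);
        while (m.find()) {
            deps.add(m.group(1)); // the name between the two % signs
        }
        return deps;
    }
}
```

For example, getEnvDependencies("%JAVA_HOME%\\bin;%PATH%") yields [JAVA_HOME, PATH]. The ticket notes the equivalent refactor is not feasible with a regex for the Linux variant, where $VAR, ${VAR}, and quoting rules make the grammar non-regular.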
[jira] [Resolved] (YARN-10874) Refactor NM ContainerLaunch#getEnvDependencies's unit tests
[ https://issues.apache.org/jira/browse/YARN-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10874.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Refactor NM ContainerLaunch#getEnvDependencies's unit tests
> -----------------------------------------------------------
>
>                 Key: YARN-10874
>                 URL: https://issues.apache.org/jira/browse/YARN-10874
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The YARN-10355 ticket states that the unit tests contain repeated code and the test methods are too long. We decided to split that ticket into two parts. YARN-10355 will contain only the production code change (for the windows variant; the linux variant refactor is not feasible with a regex - the original code is not the nicest, but it does its thing).
>
> Acceptance criteria:
> * refactor the unit tests (e.g. parameterised tests)
> * extend the tests with extra checks
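The acceptance criteria above center on parameterised tests. A framework-free sketch of the idea (the function under test is a hypothetical stand-in, and a real refactor would use JUnit's parameterised support rather than a hand-rolled loop): drive one assertion from a table of (input, expected) cases instead of writing many long, repetitive test methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EnvUnwrapTest {
    // Hypothetical function under test: strips a %...% wrapper if present.
    static String unwrap(String s) {
        if (s.length() >= 2 && s.startsWith("%") && s.endsWith("%")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // One loop replaces N near-identical test methods; adding a case is
    // one line in the table, which is the "extend with extra checks" part.
    static void runAll() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("%JAVA_HOME%", "JAVA_HOME");
        cases.put("plain", "plain");
        cases.put("%%", "");
        for (Map.Entry<String, String> c : cases.entrySet()) {
            String got = unwrap(c.getKey());
            if (!got.equals(c.getValue())) {
                throw new AssertionError(
                    "unwrap(" + c.getKey() + ") = " + got
                    + ", expected " + c.getValue());
            }
        }
    }
}
```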
[jira] [Updated] (YARN-10874) Refactor NM ContainerLaunch#getEnvDependencies's unit tests
[ https://issues.apache.org/jira/browse/YARN-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10874:
----------------------------------
    Fix Version/s: 3.4.0

> Refactor NM ContainerLaunch#getEnvDependencies's unit tests
> -----------------------------------------------------------
>
>                 Key: YARN-10874
>                 URL: https://issues.apache.org/jira/browse/YARN-10874
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The YARN-10355 ticket states that the unit tests contain repeated code and the test methods are too long. We decided to split that ticket into two parts. YARN-10355 will contain only the production code change (for the windows variant; the linux variant refactor is not feasible with a regex - the original code is not the nicest, but it does its thing).
>
> Acceptance criteria:
> * refactor the unit tests (e.g. parameterised tests)
> * extend the tests with extra checks
[jira] [Updated] (YARN-9509) Capped cpu usage with cgroup strict-resource-usage based on a multiplier
[ https://issues.apache.org/jira/browse/YARN-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-9509:
---------------------------------
    Labels: pull-request-available  (was: )

> Capped cpu usage with cgroup strict-resource-usage based on a multiplier
> ------------------------------------------------------------------------
>
>                 Key: YARN-9509
>                 URL: https://issues.apache.org/jira/browse/YARN-9509
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Nicolas Fraison
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a multiplier configuration on strict resource usage to authorize containers to use spare cpu up to a limit.
> Currently with strict resource usage you can't get more than what you request, which is sometimes not good for jobs that don't have a constant cpu usage (for example, spark jobs with multiple stages).
> But without strict resource usage we have seen some bad behaviour from users that don't tune their needs at all, leading to containers requesting 2 vcores but constantly using 20.
> The idea here is to still authorize containers to get more cpu than they requested if some is free, but also to avoid too big a difference, so SLAs on jobs are not breached if the cluster is full (or at least the increase in runtime is contained).
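Mechanically, strict resource usage is enforced through the CFS bandwidth controls (cpu.cfs_quota_us over cpu.cfs_period_us), sized so a container gets exactly its requested share of the node's CPUs. A sketch of how the proposed multiplier could scale that ceiling - the method name and exact formula are assumptions for illustration, not the actual patch:

```java
// Hypothetical quota computation: multiplier = 1.0 reproduces today's
// strict behaviour; multiplier = 2.0 lets a container burst to twice
// its requested share, but no further.
class CpuQuota {
    static final int CFS_PERIOD_US = 100_000; // common CFS enforcement period

    // containerVcores: vcores the container requested
    // nodeVcores:      vcores configured on the node
    // yarnProcessors:  physical CPUs YARN may use on the node
    static long quotaUs(int containerVcores, int nodeVcores,
                        float yarnProcessors, float multiplier) {
        // the container's fair share of the node's CPUs, in whole CPUs
        float share = (containerVcores * yarnProcessors) / nodeVcores;
        return (long) (CFS_PERIOD_US * share * multiplier);
    }
}
```

For example, on an 8-vcore node with 8 usable CPUs, a 2-vcore container gets a quota of 200000us per 100000us period today (two full CPUs); with a multiplier of 2.0 it could burst to 400000us, still far below the 20 CPUs the untuned jobs in the description were consuming.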
[jira] [Commented] (YARN-9509) Capped cpu usage with cgroup strict-resource-usage based on a multiplier
[ https://issues.apache.org/jira/browse/YARN-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392324#comment-17392324 ]

Hadoop QA commented on YARN-9509:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 21s{color} | | {color:red} https://github.com/apache/hadoop/pull/766 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| GITHUB PR | https://github.com/apache/hadoop/pull/766 |
| JIRA Issue | YARN-9509 |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-766/1/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

> Capped cpu usage with cgroup strict-resource-usage based on a multiplier
> ------------------------------------------------------------------------
>
>                 Key: YARN-9509
>                 URL: https://issues.apache.org/jira/browse/YARN-9509
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: Nicolas Fraison
>            Priority: Minor
>
> Add a multiplier configuration on strict resource usage to authorize containers to use spare cpu up to a limit.
> Currently with strict resource usage you can't get more than what you request, which is sometimes not good for jobs that don't have a constant cpu usage (for example, spark jobs with multiple stages).
> But without strict resource usage we have seen some bad behaviour from users that don't tune their needs at all, leading to containers requesting 2 vcores but constantly using 20.
> The idea here is to still authorize containers to get more cpu than they requested if some is free, but also to avoid too big a difference, so SLAs on jobs are not breached if the cluster is full (or at least the increase in runtime is contained).
[jira] [Comment Edited] (YARN-10552) Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
[ https://issues.apache.org/jira/browse/YARN-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392096#comment-17392096 ]

Siddharth Ahuja edited comment on YARN-10552 at 8/3/21, 8:35 AM:
-----------------------------------------------------------------

Hey [~snemeth], thanks for the updates!
Please don't bother with 1. as I am being pedantic. For 5. & 6., please go ahead and file separate JIRAs.
Should be good to have the patch committed after those checkstyle and whitespace issues are fixed.

was (Author: sahuja):
Hey [~snemeth], thanks for the updates!
For 1., please don't worry. For 5. & 6., please go ahead and file separate JIRAs.
Should be good to have the patch committed after those checkstyle and whitespace issues are fixed.

> Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
> -----------------------------------------------------------------------
>
>                 Key: YARN-10552
>                 URL: https://issues.apache.org/jira/browse/YARN-10552
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>         Attachments: YARN-10552.001.patch, YARN-10552.002.patch, YARN-10552.003.patch, YARN-10552.004.patch
>
[jira] [Commented] (YARN-10552) Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
[ https://issues.apache.org/jira/browse/YARN-10552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392096#comment-17392096 ]

Siddharth Ahuja commented on YARN-10552:
----------------------------------------

Hey [~snemeth], thanks for the updates!
For 1., please don't worry. For 5. & 6., please go ahead and file separate JIRAs.
Should be good to have the patch committed after those checkstyle and whitespace issues are fixed.

> Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
> -----------------------------------------------------------------------
>
>                 Key: YARN-10552
>                 URL: https://issues.apache.org/jira/browse/YARN-10552
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>         Attachments: YARN-10552.001.patch, YARN-10552.002.patch, YARN-10552.003.patch, YARN-10552.004.patch
>