[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198511#comment-17198511 ] Dongwook Kwon commented on YARN-10428: -- Thanks Guang Yang for the answer. >> Hi Wenning Ding: as far as I know, mag is supposed to be non-negative unless there's a bug. If so, wouldn't a change like this clarify the intention better?
{code:java}
double mag = r.getSchedulingResourceUsage().getCachedUsed(
    CommonNodeLabelsManager.ANY).getMemorySize();
if (sizeBasedWeight && mag > 0) {
  double weight = Math.log1p(r.getSchedulingResourceUsage().getCachedDemand(
      CommonNodeLabelsManager.ANY).getMemorySize()) / Math.log(2);
  mag = mag / weight;
}
return Math.max(mag, 0);
{code}
Also, the current change only works under the assumption that getCachedUsed and getCachedDemand both return zero at the same time. Otherwise, if getCachedUsed returns non-zero but getCachedDemand returns zero, mag gets divided by zero: Math.log1p(0) / Math.log(2) is zero, so the result is Infinity. I think it's better to check these preconditions in this method and return a value accordingly than to just assume.
> Zombie applications in the YARN queue using FAIR + sizebasedweight
> --
>
> Key: YARN-10428
> URL: https://issues.apache.org/jira/browse/YARN-10428
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.8.5
> Reporter: Guang Yang
> Priority: Major
> Attachments: YARN-10428.001.patch, YARN-10428.002.patch
>
> Seeing zombie jobs in a YARN queue that uses the FAIR with size-based-weight ordering policy.
> *Detection:* The YARN UI shows an incorrect "Num Schedulable Applications" count.
> *Impact:* The queue has an upper limit on the number of running applications; with zombie jobs, it hits the limit even though the number of running applications is far below it.
> *Workaround:* Fail over and restart the ResourceManager process.
> *Analysis:* In the heap dump, we can find the zombie jobs in the `FairOrderingPolicy#schedulableEntities` set (see attachment). Take application "application_1599157165858_29429" for example: it is still in the `FairOrderingPolicy#schedulableEntities` set, but if we check the ResourceManager log, we can see the RM already tried to remove the application:
>
> ./yarn-yarn-resourcemanager-ip-172-21-153-252.log.2020-09-04-04:2020-09-04 04:32:19,730 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Application removed - appId: application_1599157165858_29429 user: svc_di_data_eng queue: core-data #user-pending-applications: -3 #user-active-applications: 7 #queue-pending-applications: 0 #queue-active-applications: 21
>
> So it appears the RM failed to remove the application from the set.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
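A small sketch of the division hazard discussed in the comment above (hypothetical code, not the actual YARN patch; the helper name and parameters are assumptions made for illustration): with size-based weight enabled, dividing the cached used memory by a log-scaled demand silently produces Infinity when the cached demand is zero, so a guard has to cover both values rather than assume they are zero together.

```java
// Hypothetical sketch of a FairOrderingPolicy-style magnitude computation,
// showing why demand == 0 must be guarded explicitly.
public class SizeBasedWeightSketch {

    // used/demand stand in for getCachedUsed()/getCachedDemand() memory sizes.
    static double magnitude(double used, double demand, boolean sizeBasedWeight) {
        double mag = used;
        if (sizeBasedWeight && mag > 0) {
            double weight = Math.log1p(demand) / Math.log(2);
            // If demand is 0, weight is 0 and mag / weight would be Infinity,
            // so only apply the weight when it is positive.
            if (weight > 0) {
                mag = mag / weight;
            }
        }
        return Math.max(mag, 0);
    }

    public static void main(String[] args) {
        // Unguarded: non-zero used with zero demand divides by zero.
        double weight = Math.log1p(0) / Math.log(2);   // 0.0
        System.out.println(1024.0 / weight);           // Infinity, not NaN
        // Guarded versions stay finite:
        System.out.println(magnitude(1024, 0, true));
        System.out.println(magnitude(1024, 4096, true));
    }
}
```

Note that in Java a positive double divided by 0.0 yields Infinity (NaN arises only from 0.0 / 0.0), so an unguarded comparator using this value would still order entities, just wrongly.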
[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198478#comment-17198478 ] Dongwook Kwon commented on YARN-10428: -- I wonder whether
{code:java}
if (sizeBasedWeight && mag != 0)
{code}
[jira] [Issue Comment Deleted] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight
[ https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-10428: - Comment: was deleted (was: I wonder whether {code:java} // code placeholder {code} {code:java} if (sizeBasedWeight && mag != 0) {code})
[jira] [Commented] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634202#comment-14634202 ] Dongwook Kwon commented on YARN-3929: - The reason is that we already had a similar tool to the log aggregator outside Hadoop, and not only for YARN: it was designed for Hadoop 1, which didn't have a native log-aggregation feature. In our cluster, each node has a daemon that periodically checks the local application logs and pushes them to S3, and it works fine even with 2000 nodes. The issue we have now is with YARN's log-aggregation: as you can imagine, two systems try to do the same thing. Other internal users want to use YARN's log-aggregation for tools such as HUE or the yarn logs -applicationId command, and we still need to support Hadoop 1, so whenever a cluster turns on YARN's log-aggregation, we don't have the local application logs for troubleshooting. This has been an issue for a long time, and the simple solution for our team is making the cleanup optional as I suggested. I agree that for most use cases it may not be useful, so I made cleaning up the default and made sure a test catches it.
Uncleaning option for local app log files with log-aggregation feature
--
Key: YARN-3929
URL: https://issues.apache.org/jira/browse/YARN-3929
Project: Hadoop YARN
Issue Type: New Feature
Components: log-aggregation
Affects Versions: 2.4.0, 2.6.0
Reporter: Dongwook Kwon
Priority: Minor
Attachments: YARN-3929.02.patch
Although it makes sense to delete local app log files once the AppLogAggregator has copied all files to the remote location (HDFS), I have some use cases that need to leave the local app log files in place after they're copied to HDFS, mostly for our own backup purposes. I would like to use the log-aggregation feature of YARN and back up the app log files too. Without this option, files have to be copied from HDFS back to local.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3929: Attachment: YARN-3929.02.patch
[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3929: Attachment: (was: YARN-3929.01.patch)
[jira] [Commented] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632535#comment-14632535 ] Dongwook Kwon commented on YARN-3929: - Thanks Xuan for the information. I took a quick look at yarn.nodemanager.delete.debug-delay-sec and tested it. It appears the setting affects the DeletionService, which means it delays (or skips) deleting all the local files that the DeletionService is supposed to delete. I do want to keep the application logs for my own backup/troubleshooting, but not the other files such as the application's localization data, usercache, filecache, nmPrivate, spilled files, etc.; I would like those deleted on as quick a cycle as possible. Please correct me if I misunderstood yarn.nodemanager.delete.debug-delay-sec. I couldn't find exactly what I want; if there is any option that keeps only the application logs locally with the log-aggregation feature, I would just use it and close this case.
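For reference, the setting discussed above is a yarn-site.xml property. As noted in the comment, it delays every DeletionService deletion rather than preserving only the application logs; a minimal sketch of how it would be configured:

```xml
<!-- yarn-site.xml: keep files the DeletionService would normally delete
     around for 10 minutes. This applies to ALL DeletionService targets
     (localization dirs, usercache, filecache, nmPrivate, ...), not just
     the local application logs. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```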
[jira] [Created] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
Dongwook Kwon created YARN-3929: --- Summary: Uncleaning option for local app log files with log-aggregation feature Key: YARN-3929 URL: https://issues.apache.org/jira/browse/YARN-3929 Project: Hadoop YARN Issue Type: New Feature Components: log-aggregation Affects Versions: 2.6.0, 2.4.0 Reporter: Dongwook Kwon Priority: Minor
[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature
[ https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3929: Attachment: YARN-3929.01.patch Could you review this patch? Thanks.
[jira] [Updated] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon updated YARN-3843: Attachment: YARN-3843.01.patch
Fair Scheduler should not accept apps with space keys as queue name
---
Key: YARN-3843
URL: https://issues.apache.org/jira/browse/YARN-3843
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor
Attachments: YARN-3843.01.patch
As in YARN-461, since an empty-string queue name is not valid, queue names with space characters should not be accepted either, whether as a prefix or a suffix, e.g. "root.test.queuename " or "root.test. queuename". I have 2 specific cases that kill the RM with these space characters as part of the queue name.
1) Without a placement policy (Hadoop 2.4.0 and above): when a job is submitted with a space as the queue name, e.g. mapreduce.job.queuename=
2) With a placement policy (Hadoop 2.5.0 and above): once a job is submitted without a space in the queue name, and another job is submitted with one, e.g. 1st time: mapreduce.job.queuename=root.test.user1, 2nd time: mapreduce.job.queuename=root.test.user1 (with a trailing space)
{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Time elapsed: 0.724 sec ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
{code}
[jira] [Created] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
Dongwook Kwon created YARN-3843: --- Summary: Fair Scheduler should not accept apps with space keys as queue name Key: YARN-3843 URL: https://issues.apache.org/jira/browse/YARN-3843 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0, 2.4.0 Reporter: Dongwook Kwon Priority: Minor
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596842#comment-14596842 ] Dongwook Kwon commented on YARN-3843: - From my investigation, QueueMetrics doesn't allow space characters at the start or end of names; it simply trims them:
static final Splitter Q_SPLITTER = Splitter.on('.').omitEmptyStrings().trimResults();
https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L112
https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L85
So, to the FairScheduler, the queue name "root.adhoc.birvine " (with a space at the end) is different from "root.adhoc.birvine" because it has one more character; but in QueueMetrics, because names are trimmed, the two different queue names suddenly become the same, which causes the error "Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!"
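The trim mismatch described above can be sketched in a few lines. This is a simplified, hypothetical model (not the actual FairScheduler/QueueMetrics code): the scheduler keys queues by the raw name, while the metrics registry keys sources by the trimmed name, so two names that differ only by a trailing space collide on registration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the collision: the scheduler sees two distinct queue
// names, but a trim-on-registration metrics registry collapses them into one
// source name, so the second registration fails.
public class QueueMetricsCollisionSketch {

    static final Map<String, Object> SOURCES = new HashMap<>();

    // Mimics DefaultMetricsSystem rejecting duplicate source names.
    static void register(String queueName) {
        String sourceName = queueName.trim(); // like Q_SPLITTER's trimResults()
        if (SOURCES.containsKey(sourceName)) {
            throw new IllegalStateException(
                "Metrics source " + sourceName + " already exists!");
        }
        SOURCES.put(sourceName, new Object());
    }

    public static void main(String[] args) {
        register("root.adhoc.birvine");
        try {
            register("root.adhoc.birvine ");    // distinct to the scheduler...
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ...but a duplicate after trim
        }
    }
}
```

This is why rejecting queue names with leading/trailing spaces at submission time (as the patch proposes) is safer than letting the two layers disagree about name identity.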
[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596858#comment-14596858 ] Dongwook Kwon commented on YARN-3843: - Thanks, you're right, it's a duplicate. I didn't find the other Jira case; I will close this one.
[jira] [Resolved] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name
[ https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongwook Kwon resolved YARN-3843. - Resolution: Duplicate Fix Version/s: 2.8.0 Target Version/s: 2.8.0