[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight

2020-09-18 Thread Dongwook Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198511#comment-17198511
 ] 

Dongwook Kwon commented on YARN-10428:
--

Thanks Guang Yang for the answer.

>> Hi Wenning Ding: as far as I know, mag is supposed to be non-negative unless 
>> there's a bug.

If so, wouldn't a change like this be better, to make that intention explicit?
{code:java}
double mag = r.getSchedulingResourceUsage()
    .getCachedUsed(CommonNodeLabelsManager.ANY).getMemorySize();
if (sizeBasedWeight && mag > 0) {
  double weight = Math.log1p(r.getSchedulingResourceUsage().getCachedDemand(
      CommonNodeLabelsManager.ANY).getMemorySize()) / Math.log(2);
  mag = mag / weight;
}
return Math.max(mag, 0);
{code}
 

Also, the current change works only under the assumption that getCachedUsed and 
getCachedDemand both return zero at the same time. Otherwise, if getCachedUsed 
returns non-zero but getCachedDemand returns zero, the division breaks: since 
Math.log1p(0) / Math.log(2) is zero, mag is divided by zero in this case, 
yielding Infinity (or NaN when both are zero, since 0.0 / 0.0 is NaN).

I think it's better to check these preconditions in this method and return a 
value accordingly rather than just assume they hold.
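To make the failure mode concrete, here is a small standalone sketch (not the actual scheduler code; the method name and the values are illustrative) of what the magnitude computation does when the cached demand is zero:

```java
// Standalone sketch of the size-based-weight magnitude computation;
// magnitude() and its arguments are illustrative, not the real
// FairOrderingPolicy code.
public class MagnitudeSketch {
    static double magnitude(long cachedUsed, long cachedDemand) {
        double mag = cachedUsed;
        // weight is 0 when cachedDemand == 0, since Math.log1p(0) == 0
        double weight = Math.log1p(cachedDemand) / Math.log(2);
        mag = mag / weight;
        return Math.max(mag, 0);
    }

    public static void main(String[] args) {
        // used > 0, demand == 0: non-zero / 0.0 yields Infinity
        System.out.println(magnitude(4096, 0)); // Infinity
        // used == 0, demand == 0: 0.0 / 0.0 yields NaN, and Math.max(NaN, 0) is NaN
        System.out.println(magnitude(0, 0));    // NaN
    }
}
```

Either result would poison the comparator ordering the schedulable entities, which could plausibly leave entries stuck in the set.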

> Zombie applications in the YARN queue using FAIR + sizebasedweight
> --
>
> Key: YARN-10428
> URL: https://issues.apache.org/jira/browse/YARN-10428
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.8.5
>Reporter: Guang Yang
>Priority: Major
> Attachments: YARN-10428.001.patch, YARN-10428.002.patch
>
>
> Seeing zombie jobs in the YARN queue that uses the FAIR and size-based-weight 
> ordering policy.
> *Detection:*
> The YARN UI shows an incorrect number of "Num Schedulable Applications".
> *Impact:*
> The queue has an upper limit on the number of running applications; with 
> zombie jobs, it hits the limit even though the number of running applications 
> is far less than the limit.
> *Workaround:*
> Fail-over and restart the Resource Manager process.
> *Analysis:*
> In the heap dump, we can find the zombie jobs in the `FairOrderingPolicy#
> schedulableEntities` set (see attachment). Take application 
> "application_1599157165858_29429" for example: it is still in the 
> `FairOrderingPolicy#schedulableEntities` set; however, if we check the 
> resource manager log, we can see the RM already tried to remove the 
> application:
>  
> ./yarn-yarn-resourcemanager-ip-172-21-153-252.log.2020-09-04-04:2020-09-04 
> 04:32:19,730 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Application removed - appId: 
> application_1599157165858_29429 user: svc_di_data_eng queue: core-data 
> #user-pending-applications: -3 #user-active-applications: 7 
> #queue-pending-applications: 0 #queue-active-applications: 21
>  
> So it appears the RM failed to remove the application from the set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight

2020-09-18 Thread Dongwook Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198478#comment-17198478
 ] 

Dongwook Kwon commented on YARN-10428:
--

I wonder whether
{code:java}
if (sizeBasedWeight && mag != 0)
{code}







[jira] [Issue Comment Deleted] (YARN-10428) Zombie applications in the YARN queue using FAIR + sizebasedweight

2020-09-18 Thread Dongwook Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-10428:
-
Comment: was deleted

(was: I wonder whether
{code:java}
// code placeholder
{code}
{code:java}
if (sizeBasedWeight && mag != 0) {code})







[jira] [Commented] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-20 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634202#comment-14634202
 ] 

Dongwook Kwon commented on YARN-3929:
-

The reason is that we already had a tool similar to the log aggregator outside 
of Hadoop, and not only for YARN. It was designed for Hadoop 1, which didn't 
have a native log-aggregation feature: in our cluster, each node has a daemon 
that periodically checks application logs on local disk and pushes them to S3, 
and it works fine even with 2000 nodes. The issue we have now is with YARN's 
log-aggregation: as you can imagine, the two systems try to do the same thing. 
Other internal users want to use YARN's log-aggregation, for example with HUE 
or the yarn logs --applicationId command, and we still need to support Hadoop 
1, so whenever a cluster turns on YARN's log-aggregation we lose the local 
application logs we rely on for troubleshooting. This has been an issue for a 
long time, and the simple solution for our team is making the cleanup optional 
as I suggested. I agree that for most use cases it may not be useful, so I made 
cleanup the default and made sure the tests catch it.

 Uncleaning option for local app log files with log-aggregation feature
 --

 Key: YARN-3929
 URL: https://issues.apache.org/jira/browse/YARN-3929
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: log-aggregation
Affects Versions: 2.4.0, 2.6.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: YARN-3929.02.patch


 Although it makes sense to delete local app log files once the AppLogAggregator 
 has copied all files to the remote location (HDFS), I have some use cases that 
 need to leave the local app log files in place after they are copied to HDFS, 
 mostly for backup purposes. I would like to use the log-aggregation feature of 
 YARN and back up the app log files too. Without this option, the files have to 
 be copied from HDFS back to local again. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-19 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-3929:

Attachment: YARN-3929.02.patch






[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-19 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-3929:

Attachment: (was: YARN-3929.01.patch)






[jira] [Commented] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-18 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632535#comment-14632535
 ] 

Dongwook Kwon commented on YARN-3929:
-

Thanks Xuan for the information.
I took a quick look at yarn.nodemanager.delete.debug-delay-sec and tested it. 
It appears the setting affects the DeletionService, which means it delays (or 
prevents) deletion of all local files that are supposed to be deleted by the 
DeletionService. I do want to keep the application logs for my own 
backup/troubleshooting, but not the other files such as the application's 
localization data, usercache, filecache, nmPrivate, spilled files, etc.; those 
I would like to delete on as quick a cycle as possible. Please correct me if I 
misunderstood yarn.nodemanager.delete.debug-delay-sec.
I couldn't find exactly what I want. If there is any option that keeps only the 
application logs on local disk with the log-aggregation feature, I would just 
use it and close this case.
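For reference, the setting discussed above is configured in yarn-site.xml; a minimal fragment (the value here is illustrative) would be:

```xml
<!-- yarn-site.xml: delay all NodeManager deletions by the given number of
     seconds (illustrative value). Note this applies to the whole
     DeletionService, not just application logs. -->
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>3600</value>
</property>
```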






[jira] [Created] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-16 Thread Dongwook Kwon (JIRA)
Dongwook Kwon created YARN-3929:
---

 Summary: Uncleaning option for local app log files with 
log-aggregation feature
 Key: YARN-3929
 URL: https://issues.apache.org/jira/browse/YARN-3929
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: log-aggregation
Affects Versions: 2.6.0, 2.4.0
Reporter: Dongwook Kwon
Priority: Minor


Although it makes sense to delete local app log files once the AppLogAggregator 
has copied all files to the remote location (HDFS), I have some use cases that 
need to leave the local app log files in place after they are copied to HDFS, 
mostly for backup purposes. I would like to use the log-aggregation feature of 
YARN and back up the app log files too. Without this option, the files have to 
be copied from HDFS back to local again. 





[jira] [Updated] (YARN-3929) Uncleaning option for local app log files with log-aggregation feature

2015-07-16 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-3929:

Attachment: YARN-3929.01.patch

Could you review this patch? Thanks.






[jira] [Updated] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon updated YARN-3843:

Attachment: YARN-3843.01.patch

 Fair Scheduler should not accept apps with space keys as queue name
 ---

 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0, 2.5.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: YARN-3843.01.patch


 As with YARN-461, since an empty-string queue name is not valid, a queue name 
 with space characters, such as " ", should not be accepted either, nor as a 
 prefix or suffix. 
 e.g.) "root.test.queuename " or "root.test. queuename"
 I have 2 specific cases where these space characters as part of a queue name 
 kill the RM.
 1) Without placement policy (Hadoop 2.4.0 and above): 
 when a job is submitted with a space character as the queue name,
 e.g.) mapreduce.job.queuename=" "
 2) With placement policy (Hadoop 2.5.0 and above):
 once a job is submitted without a space character in the queue name, and then 
 another job is submitted with one.
 e.g.) 1st time: mapreduce.job.queuename=root.test.user1 
 2nd time: mapreduce.job.queuename="root.test.user1 " (note the trailing space)
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
 FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
 testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
   Time elapsed: 0.724 sec   ERROR!
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
   at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
   at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
 {code}





[jira] [Created] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)
Dongwook Kwon created YARN-3843:
---

 Summary: Fair Scheduler should not accept apps with space keys as 
queue name
 Key: YARN-3843
 URL: https://issues.apache.org/jira/browse/YARN-3843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.5.0, 2.4.0
Reporter: Dongwook Kwon
Priority: Minor


As with YARN-461, since an empty-string queue name is not valid, a queue name 
with space characters, such as " ", should not be accepted either, nor as a 
prefix or suffix. 
e.g.) "root.test.queuename " or "root.test. queuename"

I have 2 specific cases where these space characters as part of a queue name 
kill the RM.
1) Without placement policy (Hadoop 2.4.0 and above): 
when a job is submitted with a space character as the queue name,
e.g.) mapreduce.job.queuename=" "

2) With placement policy (Hadoop 2.5.0 and above):
once a job is submitted without a space character in the queue name, and then 
another job is submitted with one.
e.g.) 1st time: mapreduce.job.queuename=root.test.user1 
2nd time: mapreduce.job.queuename="root.test.user1 " (note the trailing space)

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.974 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testQueueNameWithSpace(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 0.724 sec   ERROR!
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:218)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:96)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:56)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:66)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createQueue(QueueManager.java:169)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:120)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:88)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:660)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1127)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testQueueNameWithSpace(TestFairScheduler.java:627)
{code}





[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596842#comment-14596842
 ] 

Dongwook Kwon commented on YARN-3843:
-

From my investigation, QueueMetrics does not reject space characters at the 
start or end of names; it simply trims them away:

{code:java}
static final Splitter Q_SPLITTER = 
    Splitter.on('.').omitEmptyStrings().trimResults();
{code}

https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L112
https://github.com/apache/hadoop/blob/branch-2.5.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java#L85

So, from the FairScheduler's point of view, the queue name "root.adhoc.birvine " 
(with a trailing space) is different from "root.adhoc.birvine" because it has 
one more character. But in QueueMetrics, because names are trimmed, the two 
different queue names suddenly become the same, which causes the error 
"Metrics source QueueMetrics,q0=root,q1=adhoc,q2=birvine already exists!"
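The collision can be reproduced without the scheduler at all. Below is a plain-Java stand-in for Guava's Splitter.on('.').omitEmptyStrings().trimResults() (the class and method names here are illustrative, and Guava itself is not used) showing the two names collapsing to the same components:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for Guava's Splitter.on('.').omitEmptyStrings().trimResults();
// class and method names are illustrative.
public class QueueNameTrim {
    static List<String> components(String queueName) {
        List<String> parts = new ArrayList<>();
        for (String part : queueName.split("\\.")) {
            String trimmed = part.trim();     // trimResults()
            if (!trimmed.isEmpty()) {         // omitEmptyStrings()
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Distinct queue names in FairScheduler (note the trailing space)...
        String a = "root.adhoc.birvine ";
        String b = "root.adhoc.birvine";
        // ...but identical metrics source components after trimming:
        System.out.println(components(a).equals(components(b))); // true
    }
}
```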







[jira] [Commented] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596858#comment-14596858
 ] 

Dongwook Kwon commented on YARN-3843:
-

Thanks, you're right, it's a duplicate. I didn't find the other JIRA case; I 
will close this one.






[jira] [Resolved] (YARN-3843) Fair Scheduler should not accept apps with space keys as queue name

2015-06-22 Thread Dongwook Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongwook Kwon resolved YARN-3843.
-
  Resolution: Duplicate
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0



