[
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482168#comment-15482168
]
Naganarasimha G R commented on YARN-5545:
-----------------------------------------
Thanks [~bibinchundatt] for the patch.
A few points to discuss on the approach:
# Would it be good to have a separate per-queue, per-partition maximum-applications
limit, similar to {{yarn.scheduler.capacity.<queue-path>.maximum-applications}}, so
that logical partitions get the same fine-grained control the default partition
already has? (A hypothetical config sketch follows this list.)
# Would it be better to default
{{yarn.scheduler.capacity.maximum-applications.accessible-node-labels.<label>}}
to the value of {{yarn.scheduler.capacity.maximum-applications}}? That would make
the admin's work much easier. We can decide the same for the previous point if we
adopt it. (See the fallback in the sketch after this list.)
# IIUC, the approach you adopted differs slightly from the one you mentioned in your
[comment|https://issues.apache.org/jira/browse/YARN-5545?focusedCommentId=15453163&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15453163]:
although we have a per-partition max-application limit, we just sum up the max
limits of all partitions under a queue and check the total against
{{ApplicationLimit.getAllMaxApplication()}}. If we never actually validate against
each queue's per-partition max apps, why introduce a new configuration at all?
Also consider the case where the accessibility is * and new partitions are added
{{without refreshing}}: the configured value will then be wrong, since it is
static. (The second sketch below restates this concern in code.)
# We need to take care of documentation, which I think is also missing for
*MaximumAMResourcePercentPerPartition*; maybe that can be handled in a separate
JIRA.
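To make points 1 and 2 concrete, here is a rough sketch of how the properties
could look, in the same key=value shorthand used in the description below. The
per-queue property name is only a proposal for point 1, not anything in the
current patch:
{noformat}
# Existing cluster-wide cap (default 10000)
yarn.scheduler.capacity.maximum-applications=10000
# Per-partition cap from the current patch; per point 2 it could default to the value above
yarn.scheduler.capacity.maximum-applications.accessible-node-labels.labelx=10000
# Hypothetical per-queue, per-partition cap for point 1 (name is a proposal only)
yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.maximum-applications=5000
{noformat}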
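And a minimal, self-contained Java sketch (not the patch's actual code; the class,
method, and map names are made up for illustration) of the point 2 fallback and of
the summed check that point 3 questions:
{noformat}
import java.util.Map;

public class PartitionMaxAppsSketch {
  static final String PREFIX = "yarn.scheduler.capacity.maximum-applications";

  // Point 2: resolve the per-partition limit, falling back to the global
  // maximum-applications value when the partition-specific key is unset.
  static int partitionMaxApps(Map<String, String> conf, String label) {
    int globalMax = Integer.parseInt(conf.getOrDefault(PREFIX, "10000"));
    String v = conf.get(PREFIX + ".accessible-node-labels." + label);
    return v == null ? globalMax : Integer.parseInt(v);
  }

  // Point 3, restated: if submission only ever checks the summed total across
  // partitions, the individual per-partition limits are never enforced.
  static boolean canSubmit(Map<String, Integer> perPartitionMax, int currentApps) {
    int summed = perPartitionMax.values().stream().mapToInt(Integer::intValue).sum();
    return currentApps < summed;
  }
}
{noformat}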
> App submit failure on queue with label when default queue partition capacity
> is zero
> ------------------------------------------------------------------------------------
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch,
> YARN-5545.0003.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
> sleep -Dmapreduce.job.node-label-expression=labelx
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 10000000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed
> to submit application_1471670113386_0001 to YARN :
> org.apache.hadoop.security.AccessControlException: Queue root.default already
> has 0 applications, cannot accept submission of application:
> application_1471670113386_0001
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
> at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
> at
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit
> application_1471670113386_0001 to YARN :
> org.apache.hadoop.security.AccessControlException: Queue root.default already
> has 0 applications, cannot accept submission of application:
> application_1471670113386_0001
> at
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
> at
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
> ... 25 more
> {noformat}