[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

Jason Lowe (JIRA) Wed, 14 Sep 2016 08:13:40 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490671#comment-15490671
 ]


Jason Lowe commented on YARN-5545:
----------------------------------

The problem with changing queues to use the max apps conf directly is that it 
becomes more difficult for admins to control the overall memory pressure on the 
RM from pending apps.  IIUC after that change each queue would be able to hold 
up to the system max-apps number of apps.  So each time an admin adds a queue 
it piles on another potential max-apps amount of apps the RM could be tracking 
in total.  Or if the admin increases the max-apps number by X it actually 
increases the total RM app storage by Q*X, where Q is the number of leaf queues.

That's quite different than what happens today and is a significant behavior 
change.  If we go down this route then I think we should have a separate 
top-level config that, when set, specifies the default max-apps per queue 
explicitly rather than having them try to derive it based on relative 
capacities.  We can then debate whether that also overrides the system-wide 
setting or if we still respect the system-wide limit (i.e.: queue may reject an 
app submission not because it hit the queue's max apps limit but because the RM 
hit the system-wide apps limit).  Going with a separate, new config means we 
can preserve backwards compatibility for those who have become accustomed to 
the existing behavior and no surprises when admins use their old configs on the 
new software.

I think max-am-resource-percent is a red herring with respect to the max apps 
discussion.  max-am-resource-percent only controls how many _active_ 
applications there are in a queue, and max apps is controlling the total number 
of apps in the queue.  In fact I wouldn't be surprised if the code doesn't 
check and an admin could configure the RM to allow more active apps than the 
total number of apps the queue is allowed to have at any time.


> App submit failure on queue with label when default queue partition capacity 
> is zero
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-5545
>                 URL: https://issues.apache.org/jira/browse/YARN-5545
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>         Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, 
> YARN-5545.0003.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler 
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar 
> ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar
>  sleep -Dmapreduce.job.node-label-expression=labelx 
> -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 10000000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed 
> to submit application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>       at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>       at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>       at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>       at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>       at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>       at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>       at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>       at 
> org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>       at 
> org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
> application_1471670113386_0001 to YARN : 
> org.apache.hadoop.security.AccessControlException: Queue root.default already 
> has 0 applications, cannot accept submission of application: 
> application_1471670113386_0001
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
>       at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
>       at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
>       ... 25 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-5545) App submit failure on queue with label when default queue partition capacity is zero

Reply via email to