[
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606182#comment-15606182
]
Sunil G edited comment on YARN-5545 at 10/26/16 2:58 AM:
---------------------------------------------------------
Extremely sorry for the comment. I mistakenly posted it in the wrong Jira. Pls
discard the comment below.
.....
Currently we are trying to invoke {{activateApplications}} while recovering each
application. Yes, as of now nodes are registered later in the flow, but the
scheduler need not consider such timing cases from the RMAppManager/RM end. That
being said, it's important to separate two issues out here
......
was (Author: sunilg):
Currently we are trying to invoke {{activateApplications}} while recovering
each application. Yes, as of now nodes are registered later in the flow, but
the scheduler need not consider such timing cases from the RMAppManager/RM end.
That being said, it's important to separate two issues out here
- The recovery call flow for each app in the scheduler should not invoke
{{activateApplications}} every time.
- {{activateApplications}} itself could be improved by taking AM headroom into
account, but that can be done in another ticket, as this one focuses on fixing
the recovery call flow.
To address issue 1, we could invoke {{activateApplications}} only once, after
all apps are recovered. This removes the timing dependency on the RM end during
recovery. With this change, even if the RM recovery model changes, the
scheduler will have completed its recovery flow without causing any performance
issue or waiting for ResourceTrackerService to register nodes.
Thanks [~leftnoteasy].
Thoughts?
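The proposal above (recover every app first, then activate once) can be
sketched as below. This is a minimal, self-contained illustration: the class
and method names are hypothetical stand-ins, not the actual CapacityScheduler
API.

```java
import java.util.List;

// Hypothetical sketch of the proposed recovery flow: restore state for all
// apps first, then invoke the (expensive) activation pass exactly once.
public class RecoverySketch {
    static int activateCalls = 0;

    // Stand-in for LeafQueue#activateApplications, which scans pending apps
    // and is costly to run once per recovered application.
    static void activateApplications() {
        activateCalls++;
    }

    static void recoverApplication(String appId) {
        // Restore app state only; do NOT activate here (the proposed fix).
    }

    public static void main(String[] args) {
        List<String> recoveredApps =
                List.of("app_0001", "app_0002", "app_0003");
        for (String app : recoveredApps) {
            recoverApplication(app);
        }
        // A single activation after the whole recovery pass removes the
        // timing dependency on node registration during RM recovery.
        activateApplications();
        System.out.println("activateApplications calls: " + activateCalls);
    }
}
```

With N recovered apps this performs one activation pass instead of N, which is
the performance point made above.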
> App submit failure on queue with label when default queue partition capacity
> is zero
> ------------------------------------------------------------------------------------
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch,
> YARN-5545.0003.patch, YARN-5545.004.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler:
> {noformat}
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> {noformat}
> Submit an application as below:
> {noformat}
> ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar sleep -Dmapreduce.job.node-label-expression=labelx -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 10000000 -rt 1
> {noformat}
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
> 	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
> 	at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 	at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> 	at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
> 	at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
> 	at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
> 	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
> 	... 25 more
> {noformat}
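The arithmetic behind the "already has 0 applications" rejection can be
sketched as below. The formula mirrors how the CapacityScheduler derives a
queue's application limit from its default-partition absolute capacity when no
per-queue limit is set; the constant 10000 is the default of
yarn.scheduler.capacity.maximum-applications, and the variable names here are
illustrative, not the actual LeafQueue fields.

```java
// Sketch (illustrative names): with root.default.capacity=0, the queue's
// derived max-applications is 0, so every submission is rejected even though
// the queue has 50% capacity on partition labelx.
public class MaxAppsSketch {
    public static void main(String[] args) {
        // yarn.scheduler.capacity.maximum-applications (default 10000)
        int maxSystemApps = 10000;
        // Absolute capacity on the DEFAULT partition: root.default.capacity=0
        double defaultPartitionAbsCapacity = 0.0;
        // Derived per-queue limit ignores the labelx capacity entirely.
        int maxApplications = (int) (maxSystemApps * defaultPartitionAbsCapacity);
        System.out.println("maxApplications = " + maxApplications);
    }
}
```

This is why the error message reads "Queue root.default already has 0
applications, cannot accept submission": the limit itself is zero, not the
current application count.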
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)