[ https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606182#comment-15606182 ]
Sunil G commented on YARN-5545:
-------------------------------

Currently we are invoking {{activateApplications}} while recovering each application. Yes, as of now nodes get registered later in the flow. But the scheduler should not have to consider such timing cases from the RMAppManager/RM end.

That said, it's important to separate the two issues here:
- The recovery call flow for each app in the scheduler should not invoke {{activateApplications}} every time.
- {{activateApplications}} itself could be improved by considering AM headroom. But that could be done in another ticket, as this one focuses on fixing the recovery call flow.

To address the first issue, we could invoke {{activateApplications}} only once, after recovering all apps. This removes the timing dependency on the RM end for recovery. With this change, even if the RM recovery model changes, the scheduler would have completed its recovery flow without causing any performance issue or waiting for resourceTrackerService to register nodes.

Thanks [~leftnoteasy]. Thoughts?
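To make the proposal concrete, here is a minimal sketch of the two recovery flows. It is not the actual CapacityScheduler or LeafQueue code: {{SketchQueue}}, {{RecoverySketch}}, {{addRecoveredApp}} and the pass counter are hypothetical names used only for illustration, and the real {{activateApplications}} also applies AM resource limits, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a leaf queue; NOT the real YARN LeafQueue. */
class SketchQueue {
    private final List<String> pending = new ArrayList<>();
    private final List<String> active = new ArrayList<>();
    private int activationPasses = 0;

    /** Registers a recovered app; optionally runs an activation pass right away. */
    void addRecoveredApp(String appId, boolean activateNow) {
        pending.add(appId);
        if (activateNow) {
            activateApplications();
        }
    }

    /** Simplified activation: moves every pending app to active and counts the pass. */
    void activateApplications() {
        activationPasses++;
        active.addAll(pending);
        pending.clear();
    }

    int getActivationPasses() {
        return activationPasses;
    }

    int getActiveCount() {
        return active.size();
    }
}

public class RecoverySketch {
    /** Flow described as the current behaviour: one activation pass per recovered app. */
    static SketchQueue recoverPerApp(List<String> appIds) {
        SketchQueue queue = new SketchQueue();
        for (String appId : appIds) {
            queue.addRecoveredApp(appId, true);
        }
        return queue;
    }

    /** Proposed flow: recover all apps first, then run a single activation pass. */
    static SketchQueue recoverThenActivateOnce(List<String> appIds) {
        SketchQueue queue = new SketchQueue();
        for (String appId : appIds) {
            queue.addRecoveredApp(appId, false);
        }
        queue.activateApplications();
        return queue;
    }

    public static void main(String[] args) {
        List<String> appIds = Arrays.asList("app_1", "app_2", "app_3");
        System.out.println("per-app passes: " + recoverPerApp(appIds).getActivationPasses());
        System.out.println("single passes:  " + recoverThenActivateOnce(appIds).getActivationPasses());
    }
}
```

In this simplified model both flows end with the same set of active apps; only the number of activation passes differs (one per app versus one in total).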
> App submit failure on queue with label when default queue partition capacity is zero
> ------------------------------------------------------------------------------------
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch, YARN-5545.0003.patch, YARN-5545.004.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> Submit application as below
> ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar sleep -Dmapreduce.job.node-label-expression=labelx -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 10000000 -rt 1
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
>     at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>     at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
>     at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
>     at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
>     at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
>     ... 25 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org