[
https://issues.apache.org/jira/browse/YARN-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606182#comment-15606182
]
Sunil G edited comment on YARN-5545 at 10/26/16 2:58 AM:
---------------------------------------------------------
Extremely sorry for the comment. I mistakenly posted it in the wrong Jira. Pls
discard the comment below.
.....
Currently we are trying to invoke {{activateApplications}} while recovering each
application. Yes, as of now nodes are registered later in the flow, but the
scheduler need not consider such timing cases from the RMAppManager/RM end. That
being said, it's important to separate two issues out here
......
was (Author: sunilg):
Currently we are trying to invoke {{activateApplications}} while recovering
each application. Yes, as of now nodes are registered later in the flow, but
the scheduler need not consider such timing cases from the RMAppManager/RM end.
That being said, it's important to separate two issues out here
- The recovery call flow for each app in the scheduler should not invoke
{{activateApplications}} every time.
- {{activateApplications}} itself could be improved by taking AM headroom into
account, but that can be done in another ticket, as this one focuses on fixing
the recovery call flow.
To address issue 1, we could invoke {{activateApplications}} only once, after
all apps are recovered. This removes the timing dependency on the RM end during
recovery. With this change, even if the RM recovery model changes, the
scheduler will have completed its recovery flow without causing any performance
issue or waiting for ResourceTrackerService to register nodes.
Thanks [~leftnoteasy].
Thoughts?
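The proposal above (recover every app first, then activate once) can be
sketched as below. This is a minimal, self-contained illustration: the class
and method names are hypothetical stand-ins, not the actual CapacityScheduler
API.

```java
import java.util.List;

// Hypothetical sketch of the proposed recovery flow: restore state for all
// apps first, then invoke the (expensive) activation pass exactly once.
public class RecoverySketch {
    static int activateCalls = 0;

    // Stand-in for LeafQueue#activateApplications, which scans pending apps
    // and is costly to run once per recovered application.
    static void activateApplications() {
        activateCalls++;
    }

    static void recoverApplication(String appId) {
        // Restore app state only; do NOT activate here (the proposed fix).
    }

    public static void main(String[] args) {
        List<String> recoveredApps =
                List.of("app_0001", "app_0002", "app_0003");
        for (String app : recoveredApps) {
            recoverApplication(app);
        }
        // A single activation after the whole recovery pass removes the
        // timing dependency on node registration during RM recovery.
        activateApplications();
        System.out.println("activateApplications calls: " + activateCalls);
    }
}
```

With N recovered apps this performs one activation pass instead of N, which is
the performance point made above.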
> App submit failure on queue with label when default queue partition capacity
> is zero
> ------------------------------------------------------------------------------------
>
> Key: YARN-5545
> URL: https://issues.apache.org/jira/browse/YARN-5545
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: YARN-5545.0001.patch, YARN-5545.0002.patch,
> YARN-5545.0003.patch, YARN-5545.004.patch, capacity-scheduler.xml
>
>
> Configure capacity scheduler:
> {noformat}
> yarn.scheduler.capacity.root.default.capacity=0
> yarn.scheduler.capacity.root.queue1.accessible-node-labels.labelx.capacity=50
> yarn.scheduler.capacity.root.default.accessible-node-labels.labelx.capacity=50
> {noformat}
> Submit an application as below:
> {noformat}
> ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-SNAPSHOT-tests.jar sleep -Dmapreduce.job.node-label-expression=labelx -Dmapreduce.job.queuename=default -m 1 -r 1 -mt 10000000 -rt 1
> {noformat}
> {noformat}
> 2016-08-21 18:21:31,375 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1471670113386_0001
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
> 	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:316)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:255)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1790)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
> 	at org.apache.hadoop.mapreduce.SleepJob.run(SleepJob.java:273)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 	at org.apache.hadoop.mapreduce.SleepJob.main(SleepJob.java:194)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> 	at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:136)
> 	at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:144)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1471670113386_0001 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has 0 applications, cannot accept submission of application: application_1471670113386_0001
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:286)
> 	at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:296)
> 	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
> 	... 25 more
> {noformat}
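The arithmetic behind the "already has 0 applications" rejection can be
sketched as below. The formula mirrors how the CapacityScheduler derives a
queue's application limit from its default-partition absolute capacity when no
per-queue limit is set; the constant 10000 is the default of
yarn.scheduler.capacity.maximum-applications, and the variable names here are
illustrative, not the actual LeafQueue fields.

```java
// Sketch (illustrative names): with root.default.capacity=0, the queue's
// derived max-applications is 0, so every submission is rejected even though
// the queue has 50% capacity on partition labelx.
public class MaxAppsSketch {
    public static void main(String[] args) {
        // yarn.scheduler.capacity.maximum-applications (default 10000)
        int maxSystemApps = 10000;
        // Absolute capacity on the DEFAULT partition: root.default.capacity=0
        double defaultPartitionAbsCapacity = 0.0;
        // Derived per-queue limit ignores the labelx capacity entirely.
        int maxApplications = (int) (maxSystemApps * defaultPartitionAbsCapacity);
        System.out.println("maxApplications = " + maxApplications);
    }
}
```

This is why the error message reads "Queue root.default already has 0
applications, cannot accept submission": the limit itself is zero, not the
current application count.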
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)