[ https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058132#comment-14058132 ]
Jason Lowe commented on YARN-2263:
--
1 is an appropriate lower bound since we don't ever want the maximum number of
applications for a user to be zero or less. (That would be a worthless queue
since we could submit jobs to it but no jobs would activate.)
I'm assuming the lower bound only causes a deadlock when the active job submits
other jobs and then waits for them to complete? If it simply submits jobs and
exits, then even if the queue is so tiny that only 1 active job per user is
allowed, the jobs should eventually complete (assuming sufficient resources
to launch an AM _and_ at least one task simultaneously if this is MapReduce).
If the concern is that the queue can be too small to allow running more than
one application simultaneously for a user and some app frameworks might not
like that, then yes, that could be an issue. However, I'm not sure that is
YARN's problem to solve. I could have an application framework that for
whatever reason requires 10 jobs to be running simultaneously to work. There
could definitely be a queue config that will not allow that to run properly
because the queue is too small to support 10 simultaneous applications by a
single user. Should YARN handle this scenario? If so, how would it detect it,
and what should it do to mitigate it? I would argue the same applies to the
simpler job-launching-job-and-waiting scenario. Some queues are going to be
too small to support that.
Users can work around issues like this with smarter queue setups. This is
touched upon in MAPREDUCE-4304 and elsewhere for the Oozie case, which is a
similar scenario. We can set up one queue for the launcher jobs and a separate
queue where the other jobs run. That way we can't accidentally fill the
cluster/queue with just launcher jobs and deadlock.
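To make the launcher-queue argument concrete, here is a toy model in Java. It is only a sketch: ToyQueue and the two helper methods are hypothetical names invented for illustration, not YARN classes, and the model tracks nothing except a per-user active-application limit.

```java
public class LauncherQueueSketch {

    /** Hypothetical toy queue: one user, at most maxActivePerUser active apps. */
    static final class ToyQueue {
        private final int maxActivePerUser;
        private int active = 0;

        ToyQueue(int maxActivePerUser) {
            this.maxActivePerUser = maxActivePerUser;
        }

        /** Returns true if the app activates now, false if it must stay pending. */
        boolean submit(String app) {
            if (active < maxActivePerUser) {
                active++;
                return true;
            }
            return false;
        }
    }

    /** Launcher and child share one queue; the launcher holds its slot while waiting. */
    static boolean childActivatesInSharedQueue(int maxActivePerUser) {
        ToyQueue shared = new ToyQueue(maxActivePerUser);
        shared.submit("launcher");
        return shared.submit("child");
    }

    /** Launcher runs in its own queue, so the child competes only with other jobs. */
    static boolean childActivatesWithLauncherQueue(int maxActivePerUser) {
        ToyQueue launcherQueue = new ToyQueue(maxActivePerUser);
        ToyQueue jobQueue = new ToyQueue(maxActivePerUser);
        launcherQueue.submit("launcher");
        return jobQueue.submit("child");
    }

    public static void main(String[] args) {
        // With a limit of 1, the child stays pending behind a launcher that waits on it.
        System.out.println("shared queue:   " + childActivatesInSharedQueue(1));
        // With a dedicated launcher queue, the child can activate.
        System.out.println("separate queue: " + childActivatesWithLauncherQueue(1));
    }
}
```

In the shared-queue case the launcher occupies the only active slot while it waits on a child that can never activate, which is the deadlock described in the issue. Moving launchers to their own queue removes that dependency in this model.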
CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for
nested MapReduce jobs
-
Key: YARN-2263
URL: https://issues.apache.org/jira/browse/YARN-2263
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 0.23.10, 2.4.1
Reporter: Chen He
computeMaxActiveApplicationsPerUser() has a lower bound of 1. A nested
MapReduce job that submits new MapReduce jobs from its mapper/reducer and
waits for them can get stuck when this limit is 1.
  public static int computeMaxActiveApplicationsPerUser(
      int maxActiveApplications, int userLimit, float userLimitFactor) {
    return Math.max(
        (int)Math.ceil(
            maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
        1);
  }
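
As a quick check of how the method above behaves, the following standalone sketch copies it into a small class and evaluates a few inputs. The wrapper class name and the sample values are assumptions chosen for illustration; they do not come from a real queue configuration.

```java
public class MaxActiveAppsPerUserCheck {

    // Copied from the issue description so the example is self-contained.
    public static int computeMaxActiveApplicationsPerUser(
            int maxActiveApplications, int userLimit, float userLimitFactor) {
        return Math.max(
            (int) Math.ceil(
                maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
            1);
    }

    public static void main(String[] args) {
        // 10 * 0.25 * 1.0 = 2.5, rounded up to 3 active applications per user.
        System.out.println(computeMaxActiveApplicationsPerUser(10, 25, 1.0f));
        // A tiny queue: 1 * 1.0 * 1.0 = 1, so only one application per user is active.
        System.out.println(computeMaxActiveApplicationsPerUser(1, 100, 1.0f));
        // A zero product (here userLimitFactor = 0) is clamped up to the lower bound of 1.
        System.out.println(computeMaxActiveApplicationsPerUser(5, 100, 0.0f));
    }
}
```

Because of the ceiling, any positive product already yields at least 1, so the explicit lower bound only changes the result when the product is zero or negative.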