Nathan Roberts created YARN-3309:
------------------------------------
Summary: Capacity scheduler can wait a very long time for node locality
Key: YARN-3309
URL: https://issues.apache.org/jira/browse/YARN-3309
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Nathan Roberts
The capacity scheduler will delay scheduling a container on a rack-local node
in hopes that a node-local opportunity will come along (YARN-80). It does this
by counting the number of missed scheduling opportunities the application has
had. When the count reaches a certain threshold, the app will accept the
rack-local node. The documented recommendation is to set this threshold to the
number of nodes in the cluster.
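To make the counting mechanism concrete, here is a minimal sketch of count-based delay scheduling. It is not the actual CapacityScheduler code: the class name LocalityDelaySketch and its methods are hypothetical, and the reset behaviour is an assumption made for illustration.

```java
/** Hypothetical sketch of count-based locality delay; not the CapacityScheduler source. */
final class LocalityDelaySketch {
    private final int threshold;     // e.g. the number of nodes in the cluster
    private int missedOpportunities; // non-local offers declined so far

    LocalityDelaySketch(int threshold) {
        this.threshold = threshold;
    }

    /** Returns true if the app should take a container on the offered node. */
    boolean offer(boolean nodeLocal) {
        if (nodeLocal) {
            missedOpportunities = 0; // assumption: a node-local hit resets the count
            return true;
        }
        if (missedOpportunities >= threshold) {
            missedOpportunities = 0; // waited long enough, settle for rack-local
            return true;
        }
        missedOpportunities++;       // decline and keep waiting for node-local
        return false;
    }

    int missed() {
        return missedOpportunities;
    }

    public static void main(String[] args) {
        LocalityDelaySketch app = new LocalityDelaySketch(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("rack-local offer " + i + " accepted: " + app.offer(false));
        }
    }
}
```

The point to notice is that the counter only advances when offer() is called, so the real-world delay depends entirely on how often the scheduler actually presents a node to the application.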
However, there are some early-out optimizations that can make this delay
very long.
Example in allocateContainersToNode():
{code}
// Try to schedule more if there are no reservations to fulfill
if (node.getReservedContainer() == null) {
  if (calculator.computeAvailableContainers(node.getAvailableResource(),
      minimumAllocation) > 0) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Trying to schedule on node: " + node.getNodeName() +
          ", available: " + node.getAvailableResource());
    }
    root.assignContainers(clusterResource, node, false);
  }
}
{code}
So, in a large cluster that is completely full (AvailableResource on each node
is 0), SchedulingOpportunities will only increase at the container completion
rate, not the heartbeat rate, which I think was the original assumption of
YARN-80. On a large cluster, this can lead to an hour or more of skipped
scheduling opportunities, meaning the FIFO ordering of a queue is ignored for a
very long time.
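A rough back-of-the-envelope estimate shows the gap between the two rates. The numbers below (4000 nodes, 1 second heartbeats, 1 container completion per second) are hypothetical and chosen only for illustration, and the model assumes every opportunity is offered to the waiting application.

```java
/** Hypothetical estimate of how long it takes to reach the missed-opportunity threshold. */
final class LocalityDelayEstimate {

    /** Seconds needed to accumulate `threshold` opportunities at the given rate. */
    static double secondsToReach(int threshold, double opportunitiesPerSecond) {
        if (opportunitiesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return threshold / opportunitiesPerSecond;
    }

    public static void main(String[] args) {
        int nodes = 4000;                  // hypothetical cluster size, also used as the threshold
        double heartbeatSeconds = 1.0;     // hypothetical node heartbeat interval
        double completionsPerSecond = 1.0; // hypothetical container completion rate

        // Cluster with free space: every node heartbeat is a scheduling opportunity.
        double withFreeSpace = secondsToReach(nodes, nodes / heartbeatSeconds);

        // Completely full cluster: an opportunity arises only when a container completes.
        double completelyFull = secondsToReach(nodes, completionsPerSecond);

        System.out.printf("free space: %.1f s, completely full: %.1f s%n",
            withFreeSpace, completelyFull);
    }
}
```

Under these assumed numbers the threshold is reached in about one heartbeat interval when nodes have free space, but takes roughly 4000 seconds, a little over an hour, when the cluster is full.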
Maybe there should be a time-based limit on this delay in addition to the count
of missed scheduling opportunities.
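One way such a combined limit might look is sketched below. This is only an illustration of the idea, not a proposed patch: the class TimeBoundedLocalityDelay, its parameters, and the choice to measure the wait from the time of the request are all assumptions.

```java
/** Hypothetical sketch: relax locality after N missed opportunities OR after a maximum wait. */
final class TimeBoundedLocalityDelay {
    private final int maxMissedOpportunities;
    private final long maxWaitMillis;
    private final long requestedAtMillis; // when the app started waiting for a container
    private int missedOpportunities;

    TimeBoundedLocalityDelay(int maxMissedOpportunities, long maxWaitMillis,
                             long requestedAtMillis) {
        this.maxMissedOpportunities = maxMissedOpportunities;
        this.maxWaitMillis = maxWaitMillis;
        this.requestedAtMillis = requestedAtMillis;
    }

    /** Returns true if the app should take a container on the offered node. */
    boolean offer(boolean nodeLocal, long nowMillis) {
        if (nodeLocal) {
            return true;
        }
        boolean countExceeded = missedOpportunities >= maxMissedOpportunities;
        boolean timeExceeded = nowMillis - requestedAtMillis >= maxWaitMillis;
        if (countExceeded || timeExceeded) {
            return true;       // either limit is enough to accept rack-local
        }
        missedOpportunities++; // still within both limits, keep waiting
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical settings: 4000-opportunity threshold, 30 second maximum wait.
        TimeBoundedLocalityDelay app = new TimeBoundedLocalityDelay(4000, 30_000L, 0L);
        System.out.println("offer at 5s accepted:  " + app.offer(false, 5_000L));
        System.out.println("offer at 45s accepted: " + app.offer(false, 45_000L));
    }
}
```

The clock value is passed in explicitly so the behaviour is easy to test. With a time bound, a rack-local offer that arrives after the maximum wait is accepted even if very few opportunities were counted, which is the case a full cluster runs into.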