Hi,
> It seems that there are only ever 10-20 tasks running at a time however our
> YARN RM reports < 10% utilization so we know cluster resources are not the
> issue. Is there a way to “trick” Tez into scheduling more tasks concurrently?
...
> We are running simple queries so it may be that tasks are simply finishing
> too fast but, for the scale of tasks we have, we expect more than 10-20
> running at the same time. Any help would be appreciated.

I have seen something like this in the past, but it needs a few "special" circumstances which are not common in a standard Hadoop cluster.

The Tez + YARN interaction is heavily driven by locality (as in, Tez will always ask YARN for a mapper with locality, and YARN will try hard to satisfy it). That absolutely made sense when HDFS was always co-located with YARN, but it stops working as the architectures evolve.

https://issues.apache.org/jira/browse/TEZ-3291 is a more trivial instance of the problem, though that might be a very specific Azure example (i.e. "ignore localhost, it is bogus").

For another filesystem, the problem was more intractable: it would provide IP addresses for the locality, but those IP addresses belonged to a BSD service appliance which will never run YARN.

In that scenario, the following YARN ticket comes into play:

https://issues.apache.org/jira/browse/YARN-4189

Wangda has a better deep-dive into that problem on his blog:

https://wangda.live/2017/08/23/deep-understand-locality-in-capacityscheduler-and-how-to-control-it/

Short version: if you provide locality information in your split but don't run a NodeManager on that IP, YARN effectively throttles containers and swipes left for 40 heartbeats before taking a rack-local one.

The basic config to start tweaking is "yarn.scheduler.capacity.node-locality-delay"; then turn off the additional rack-local delay (or pretend to be 1 rack).
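To make the "40 heartbeats" cost concrete, here is a rough toy model of delay scheduling in Python. It is a simplification under stated assumptions, not the actual RegularContainerAllocator logic: every NodeManager heartbeat is treated as one scheduling opportunity, a node-local match is taken immediately, and a rack-local match is only accepted once the request has passed up at least node_locality_delay opportunities (capped at the cluster size, which is assumed here). The function name heartbeats_to_allocate, the hosts nm-0..nm-99 and the appliance IP 10.0.0.5 are all hypothetical.

```python
from itertools import cycle


def heartbeats_to_allocate(preferred_hosts, preferred_racks, nodes,
                           node_locality_delay, max_beats=10_000):
    """Toy delay-scheduling model (a simplification, not YARN's real code).

    nodes: (host, rack) pairs for hosts that actually run a NodeManager,
    heartbeating in round-robin order. Returns (heartbeats, locality),
    or (max_beats, None) if nothing could be assigned.
    """
    # Assumption: the effective delay is capped at the cluster size.
    delay = min(node_locality_delay, len(nodes))
    missed = 0  # scheduling opportunities this request has passed up
    for beat, (host, rack) in enumerate(cycle(nodes), start=1):
        if beat > max_beats:
            break
        if host in preferred_hosts:
            return beat, "NODE_LOCAL"  # data-local: take it at once
        if rack in preferred_racks and missed >= delay:
            return beat, "RACK_LOCAL"  # waited long enough, settle for rack
        missed += 1  # "swipe left" and wait for the next heartbeat
    return max_beats, None


# 100 NodeManagers on a single rack (hypothetical names).
nodes = [(f"nm-{i}", "/default-rack") for i in range(100)]
# Split hint points at an appliance IP with no NodeManager; it is assumed
# to resolve to /default-rack, so only a rack-local match is possible.
appliance = {"10.0.0.5"}

print(heartbeats_to_allocate(appliance, {"/default-rack"}, nodes, 40))
# (41, 'RACK_LOCAL')
print(heartbeats_to_allocate(appliance, {"/default-rack"}, nodes, 0))
# (1, 'RACK_LOCAL')
print(heartbeats_to_allocate({"nm-7"}, {"/default-rack"}, nodes, 40))
# (8, 'NODE_LOCAL')
```

In this model every container request pointing at the appliance burns 41 heartbeats before it is placed, and that wait repeats per request. Dropping the delay to 0 lets it land on the first heartbeat, while a hint that names a host actually running a NodeManager (nm-7 here) is satisfied as soon as that node checks in.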
If you want to go spelunking into the Hadoop core, here's the place to start:

https://github.com/apache/hadoop/blob/8598b498bcaf4deffa822f871a26635bdf3d9d5c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L324

Cheers,
Gopal