Hi Gopal,

Thanks for the very informative and detailed response. I appreciate all of the information and links. I haven't gotten back around to our testing yet, but hopefully this information points us in the right direction.
Thanks again,
Zac

> On Nov 6, 2018, at 8:41 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
> Hi,
>
>> It seems that there are only ever 10-20 tasks running at a time however our
>> YARN RM reports < 10% utilization so we know cluster resources are not the
>> issue. Is there a way to “trick” Tez into scheduling more tasks concurrently?
> ...
>> We are running simple queries so it may be that tasks are simply finishing
>> too fast but, for the scale of tasks we have, we expect more than 10-20
>> running at the same time. Any help would be appreciated.
>
> I have seen something like this in the past, but it needs a few "special"
> circumstances which are not common in a standard Hadoop cluster.
>
> The Tez + YARN interaction is heavily driven by locality (Tez will always
> ask YARN for a mapper with locality, and YARN will try hard to satisfy it),
> which made complete sense when HDFS was always co-located with YARN.
>
> However, that assumption breaks down as architectures evolve.
>
> https://issues.apache.org/jira/browse/TEZ-3291
>
> is a simpler instance of the problem, though that might be a very specific
> Azure example (i.e. "ignore localhost, it is bogus").
>
> For another filesystem, the problem was a bit more intractable: it would
> provide IP addresses for locality, but those IP addresses belonged to a BSD
> service appliance which will never run YARN.
>
> In that scenario, the following YARN ticket comes into play:
>
> https://issues.apache.org/jira/browse/YARN-4189
>
> Wangda has a deeper dive into that problem on his blog:
>
> https://wangda.live/2017/08/23/deep-understand-locality-in-capacityscheduler-and-how-to-control-it/
>
> Short version: if you provide locality information in your split but don't
> run a NodeManager on that IP, YARN effectively throttles containers and
> swipes left for 40 heartbeats before settling for a rack-local container.
>
> The basic config to start tweaking is
> "yarn.scheduler.capacity.node-locality-delay"; then turn off the additional
> rack-local delay (or pretend to be 1 rack).
>
> If you want to go spelunking into the Hadoop core, here's the place to start:
>
> https://github.com/apache/hadoop/blob/8598b498bcaf4deffa822f871a26635bdf3d9d5c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L324
>
> Cheers,
> Gopal
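
For reference, a minimal capacity-scheduler.xml sketch of the two
CapacityScheduler knobs Gopal describes above (the values are illustrative
starting points for experimentation, not tested recommendations):

    <!-- Missed scheduling opportunities tolerated while waiting for a
         node-local container; the default of 40 is the "40 heartbeats"
         delay described above. A small value shortens that wait. -->
    <property>
      <name>yarn.scheduler.capacity.node-locality-delay</name>
      <value>1</value>
    </property>

    <!-- From YARN-4189: extra missed opportunities tolerated before the
         scheduler goes off-switch. 0 disables the additional rack-local
         delay; the default of -1 falls back to a calculated delay. -->
    <property>
      <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
      <value>0</value>
    </property>

Both properties live in capacity-scheduler.xml on the ResourceManager and can
be picked up with a scheduler refresh ("yarn rmadmin -refreshQueues").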