Hi Gopal,

Thanks for the very informative and detailed response. I appreciate all of the 
information and links. I haven’t gotten back around to our testing yet, but 
hopefully this information points us in the right direction.

Thanks again,
Zac

> On Nov 6, 2018, at 8:41 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote:
> 
> 
> Hi,
> 
>> It seems that there are only ever 10-20 tasks running at a time however our 
>> YARN RM reports < 10% utilization so we know cluster resources are not the 
>> issue. Is there a way to “trick” Tez into scheduling more tasks concurrently?
> ...
>> We are running simple queries so it may be that tasks are simply finishing 
>> too fast but, for the scale of tasks we have, we expect more than 10-20 
>> running at the same time. Any help would be appreciated.
> 
> I have seen something like this in the past, but it needs a few "special" 
> circumstances which are not common in a standard Hadoop cluster.
> 
> The Tez + YARN interaction is heavily driven by locality: Tez will always 
> ask YARN for mappers with locality hints, and YARN will try hard to 
> satisfy them. That absolutely made sense when you consider that HDFS was 
> always co-located with YARN.
> 
> However, that assumption doesn't hold as architectures evolve.
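> 
> To make the locality ask concrete, here's roughly what it looks like at 
> the YARN client API level (a hedged sketch, not Tez's actual code; the 
> host and rack names are made up):
> 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.yarn.api.records.Priority;
>     import org.apache.hadoop.yarn.api.records.Resource;
>     import org.apache.hadoop.yarn.client.api.AMRMClient;
>     import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
> 
>     public class LocalityAskSketch {
>       public static void main(String[] args) {
>         AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
>         rm.init(new Configuration());
>         rm.start();
>         // Ask for a container on the host that holds the split, with the
>         // host's rack as the fallback.
>         ContainerRequest ask = new ContainerRequest(
>             Resource.newInstance(4096, 1),             // 4 GB, 1 vcore
>             new String[] {"datanode-17.example.com"},  // node-local ask
>             new String[] {"/rack-04"},                 // rack fallback
>             Priority.newInstance(1));
>         rm.addContainerRequest(ask);
>       }
>     }
> 
> If that host never heartbeats into the RM (because no NodeManager runs 
> there), the node-local ask can never be satisfied, which is where the 
> delay logic described below comes in.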
> 
> https://issues.apache.org/jira/browse/TEZ-3291
> 
> is a simpler instance of the problem, though that one might be a very 
> specific Azure example (i.e. "ignore localhost, it is bogus").
> 
> But for another filesystem, the problem was a bit more intractable: it 
> would provide IP addresses for locality, but those IP addresses belonged 
> to a BSD service appliance that would never run YARN.
> 
> In that scenario, the following YARN ticket comes into play:
> 
> https://issues.apache.org/jira/browse/YARN-4189
> 
> Wangda has a deeper dive into that problem on his blog:
> 
> https://wangda.live/2017/08/23/deep-understand-locality-in-capacityscheduler-and-how-to-control-it/
> 
> The short version is that if you provide locality information in your 
> splits but don't run a NodeManager on that IP, YARN effectively throttles 
> container allocation, swiping left for 40 heartbeats (the default 
> node-locality-delay) before settling for a rack-local container.
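> 
> Paraphrased, that check looks something like this (a simplified, 
> illustrative sketch of the CapacityScheduler allocator logic linked at 
> the end of this mail, not verbatim code):
> 
>     public class LocalityDelaySketch {
>       // Returns true once the scheduler will accept a rack-local
>       // container for a request that preferred specific nodes.
>       static boolean canAssignRackLocal(long missedOpportunities,
>                                         int configuredNodeLocalityDelay,
>                                         int numClusterNodes) {
>         // The effective delay is capped at the cluster size.
>         int actualDelay =
>             Math.min(configuredNodeLocalityDelay, numClusterNodes);
>         // missedOpportunities goes up each time a node heartbeats and
>         // the request gets skipped; only after ~actualDelay skips does
>         // a rack-local placement become acceptable.
>         return missedOpportunities >= actualDelay;
>       }
> 
>       public static void main(String[] args) {
>         // With the default delay of 40 on a 100-node cluster:
>         System.out.println(canAssignRackLocal(10, 40, 100)); // false: wait
>         System.out.println(canAssignRackLocal(40, 40, 100)); // true: fall back
>       }
>     }
> 
> So every pending ask whose preferred hosts never heartbeat burns ~40 
> scheduling opportunities before it can even go rack-local, which caps how 
> many tasks get placed concurrently.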
> 
> The basic config to start tweaking is 
> "yarn.scheduler.capacity.node-locality-delay"; after that, turn off the 
> additional rack-local delay (or pretend the whole cluster is 1 rack).
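> 
> If it helps, the equivalent knobs look like this (illustrative starting 
> values for testing, normally set in capacity-scheduler.xml; the 
> rack-locality-additional-delay property only exists in newer Hadoop 
> releases, so verify it against your version):
> 
>     import org.apache.hadoop.conf.Configuration;
> 
>     public class LocalityConfigSketch {
>       public static void main(String[] args) {
>         Configuration conf = new Configuration();
>         // Stop waiting for node-local placements that can never arrive.
>         conf.setInt("yarn.scheduler.capacity.node-locality-delay", 0);
>         // Newer releases also expose an additional rack-local delay;
>         // check your version for the property and its semantics.
>         conf.setInt(
>             "yarn.scheduler.capacity.rack-locality-additional-delay", -1);
>       }
>     }
> 
> The "1 rack" trick is a topology change rather than a scheduler knob: if 
> the topology script maps every host to the same rack, rack-local covers 
> the whole cluster once the node-local delay expires.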
> 
> If you want to go spelunking into the Hadoop core, here's the place to 
> start:
> 
> https://github.com/apache/hadoop/blob/8598b498bcaf4deffa822f871a26635bdf3d9d5c/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/allocator/RegularContainerAllocator.java#L324
> 
> Cheers,
> Gopal
> 
> 
