Tez tries to honor the hints but, as you point out, the outcome also depends on YARN.

Tez follows a simple heuristic. It first makes a best-effort attempt at
data-local allocation. After a certain delay expires, it allows a task to be
assigned to either a data-local or a rack-local container, and after another
timeout it picks any available container. Both the fallbacks (i.e. whether
each fallback is allowed) and the time delay are configurable. Already
launched containers are also given some additional priority over new
allocations from YARN.
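
As a rough illustration of the heuristic described above, here is a minimal, self-contained Java sketch. It is not Tez code: the class, enum, method, and parameter names are all hypothetical, and it assumes the two timeouts are measured from the moment the task started waiting. The real knobs are the FALLBACK settings in TezConfiguration.java.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a delay-based locality fallback; not the Tez scheduler.
public class LocalityFallbackSketch {

    public enum Level { NODE_LOCAL, RACK_LOCAL, ANY }

    /**
     * Returns the locality levels a waiting task may be matched against.
     *
     * @param waitedMillis     how long the task has been waiting (assumed clock)
     * @param rackDelayMillis  wait after which rack-local matches are allowed
     * @param anyDelayMillis   wait after which any container is allowed
     * @param rackFallback     whether the rack-local fallback is enabled
     * @param nonLocalFallback whether the any-container fallback is enabled
     */
    public static Set<Level> allowedLevels(long waitedMillis,
                                           long rackDelayMillis,
                                           long anyDelayMillis,
                                           boolean rackFallback,
                                           boolean nonLocalFallback) {
        // Data-local placement is always attempted first.
        Set<Level> allowed = EnumSet.of(Level.NODE_LOCAL);
        if (rackFallback && waitedMillis >= rackDelayMillis) {
            allowed.add(Level.RACK_LOCAL);
        }
        if (nonLocalFallback && waitedMillis >= anyDelayMillis) {
            allowed.add(Level.ANY);
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Illustrative delays only; they are not Tez defaults.
        System.out.println(allowedLevels(0, 250, 500, true, true));
        System.out.println(allowedLevels(300, 250, 500, true, true));
        System.out.println(allowedLevels(600, 250, 500, true, true));
        // With both fallbacks disabled, a task stays data-local only.
        System.out.println(allowedLevels(600, 250, 500, false, false));
    }
}
```

The sketch leaves out the extra priority given to already launched containers; it only models how the set of acceptable locations widens over time.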

Search for FALLBACK in TezConfiguration.java or check the attachments in 
https://issues.apache.org/jira/browse/TEZ-2294 for documentation. 

— Hitesh 


On Sep 11, 2015, at 12:05 AM, Raajay <raaja...@gmail.com> wrote:

> I was able to get it working with "hostnames". thanks!
> 
> To dig deeper, how much does Tez obey the hints provided? How are Vertex 
> Location Hints handled ? What if YARN is not able to provide containers in 
> requested locations ?
> 
> Raajay
> 
> On Thu, Sep 10, 2015 at 10:19 AM, Hitesh Shah <hit...@apache.org> wrote:
> In almost all cases, these are hostnames. The general flow is to find the
> block locations for the data source, extract the hostnames from them, and
> pass them to YARN so that it can provide a container on the same host as the
> datanode holding the data. As long as YARN is using hostnames, the container
> locality matching should work correctly. I will need to check the YARN
> codebase to see whether it does additional reverse DNS lookups so that IPs
> also work, but to be safe, use hostnames.
> 
> I don’t believe Tez has yet introduced support for working with 
> application-level YARN node labels.
> 
> thanks
> — Hitesh
> 
> On Sep 10, 2015, at 12:43 AM, Raajay <raaja...@gmail.com> wrote:
> 
> > While creating TaskLocationHints, using the static function
> >
> > TaskLocationHint.createTaskLocationHint(Set<String> nodes, Set<String>
> > racks)
> >
> > what should the Strings be ? IP address of the nodes ? Node labels ? Or 
> > hostnames ?
> >
> > Thanks
> > Raajay
> 
> 
