Hello Hitesh,
Thanks for that explanation! Could you clarify how the locality of input
splits is used?
thanks,
Madhu

    On Thursday, October 27, 2016 11:19 PM, Hitesh Shah <hit...@apache.org> 
wrote:
 

 Hello Madhusudan,

I will start with how container allocations work and make my way back to 
explaining splits. 

Each vertex decides how many tasks it will run. At a high level, when a task 
is ready to run, it tells the global DAG scheduler about its requirements 
( i.e. what kind of container resources it needs, additional container specs 
such as env, local resources, etc. ) and also where it wants to be executed 
for locality. 

The global scheduler then asks the ResourceManager for as many containers 
as there are pending tasks. When YARN allocates a container to the Tez AM, the 
Tez AM decides which is the highest priority task ( vertices at the top of the 
tree run first ) that matches the allocated container and runs the task on it. 
Re-used containers are given higher priority over new containers due to JVM 
launch costs. YARN may not give Tez all the containers it requested, so Tez 
will make do with whatever it has. It may also end up releasing containers 
that do not match any of the tasks that still need to be run.
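
To make the matching step above concrete, here is a minimal sketch in Python. 
It is a toy model, not Tez's actual scheduler code: the TaskRequest and 
Container classes, the pick_task function, the "lower number runs first" 
priority convention, and the use of locality as a tie-breaker between tasks of 
equal priority are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskRequest:
    """What a ready task tells the scheduler (hypothetical model)."""
    name: str
    priority: int                      # assumption: lower number runs first
    memory_mb: int                     # container resources the task needs
    preferred_hosts: List[str] = field(default_factory=list)  # locality hint


@dataclass
class Container:
    """A container handed to the AM by YARN (hypothetical model)."""
    host: str
    memory_mb: int


def pick_task(container: Container,
              pending: List[TaskRequest]) -> Optional[TaskRequest]:
    """Return the highest-priority pending task that fits the container.

    Among tasks of equal priority, one that asked for the container's host
    is preferred (an assumption of this sketch). Returns None when nothing
    fits, in which case the caller could release the container.
    """
    fitting = [t for t in pending if t.memory_mb <= container.memory_mb]
    if not fitting:
        return None
    best = min(
        fitting,
        key=lambda t: (t.priority, container.host not in t.preferred_hosts),
    )
    pending.remove(best)
    return best
```

With two pending map tasks of equal priority that prefer host-a and host-b, a 
container on host-b would go to the second one in this model, and a container 
too small for any pending task would get no task at all.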

Now, let us take a “map” vertex which is reading data from HDFS. In MR, each 
task represented one split ( or a group if you use something like Hive’s 
CombineFileInputFormat ). In Tez, there are a couple of differences: 
  
1) The InputFormat is invoked in the AM, i.e. splits are calculated in the AM 
( this can be done on the client, but most folks now run it in the AM ).
2) Splits are grouped based on the wave configurations ( 
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works ).

Each grouped split will be mapped to one task. This will then define what kind 
of container is requested. 
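
As a rough illustration of how grouped splits turn into a task count, here is 
a simplified Python sketch. It is an approximation under stated assumptions, 
not the real grouping code: the function and parameter names are hypothetical, 
the defaults mirror my understanding of tez.grouping.split-waves (1.7), 
tez.grouping.min-size (50 MB) and tez.grouping.max-size (1 GB), and the 
per-host greedy packing is only one plausible way to keep a group's splits 
local to one node.

```python
from collections import defaultdict

MB = 1024 ** 2
GB = 1024 ** 3


def desired_group_count(total_bytes, available_slots, waves=1.7,
                        min_group_bytes=50 * MB, max_group_bytes=1 * GB):
    """Estimate how many grouped splits (and therefore tasks) to create.

    Start from slots * waves, then adjust so the average group size stays
    between the min and max group sizes.
    """
    desired = max(1, round(available_slots * waves))
    bytes_per_group = total_bytes / desired
    if bytes_per_group > max_group_bytes:
        # Groups would be too large: use more, smaller groups (ceil division).
        desired = -(-total_bytes // max_group_bytes)
    elif bytes_per_group < min_group_bytes:
        # Groups would be too small: use fewer, larger groups.
        desired = max(1, total_bytes // min_group_bytes)
    return desired


def group_splits(splits, group_count):
    """Greedily pack splits that share a host into groups.

    splits: list of (name, length_bytes, host) tuples.
    May return more groups than group_count, because a host's leftover
    splits are never merged with another host's in this sketch.
    """
    target = sum(length for _, length, _ in splits) / group_count
    by_host = defaultdict(list)
    for split in splits:
        by_host[split[2]].append(split)

    groups = []
    for host in sorted(by_host):
        current, size = [], 0
        for split in by_host[host]:
            current.append(split)
            size += split[1]
            if size >= target:
                groups.append(current)
                current, size = [], 0
        if current:
            groups.append(current)
    return groups
```

Each returned group would correspond to one task, and the host its splits 
share is what the task would pass along as its locality preference.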

Let us know if you have more questions.

thanks
— Hitesh


> On Oct 27, 2016, at 5:06 PM, Madhusudan Ramanna <m.rama...@ymail.com> wrote:
> 
> Hello Folks,
> 
> We have a native Tez application.  My question is mainly about MR inputs and 
> tez allocated containers.  How does tez grab containers ? Is it one per input 
> split ?  Could someone shed some light on this ?
> 
> thanks,
> Madhu


   
