Answers inline:

On Jul 9, 2014, at 10:02 AM, Grandl Robert <[email protected]> wrote:

> Thanks a lot for your detailed answer. 
> 
> When you say "it is now decided based on a combination of available cluster 
> resources as well as the input data": what do you mean by available cluster 
> resources ? Total resources in the cluster(like number nodes * capability of 
> each node), or instantaneous available resources based on current workload on 
> each such node. 
> 

YARN provides an application a way to access its available headroom/resources. 
Per node resource available is not considered. Just full cluster availability. 
For example, if we know we can run a maximum of 100 containers in parallel but 
the data set consists of 10000 blocks, there is no point running 10000 tasks ( 
assuming each block is quite small and quick to process). Tez will try and fit 
the no. of tasks in a multiple of max containers ( default I believe is 1.7 
waves ) assuming the amount of data processed per task is within the configured 
limits. Too small a task and it is just overhead. Oversized tasks are slower to 
recover hence an upper bound is needed. 

There are obviously edge cases where the above may create problems but in 
general, this has proved more beneficial than the static task count approach 
used in MapReduce.

> So to be clear: even for the same job and same total input size, the number 
> of tasks and such input size per task will differ for consecutive runs ?
> 

Yes. Depending on the cluster load at that point in time, it could change. 
Also, the grouping is not exactly predictable as it depends on what order the 
files' info such as block locations is obtained from HDFS. 

> Also, it seems the number of tasks for a vertex can change at runtime(not 
> about duplicates, but distinct tasks) ?
> 

Yes again. A VertexManager has the capability to modify the parallelism at 
run-time. For example, the auto-reduce functionality can change the parallelism 
of a reduce stage based on extrapolation of the amount of the data being 
generated by tasks of the previous map stage. 

— Hitesh


Reply via email to