There are multiple reasons why Tez may launch a different number of tasks:
- Hive itself plans the query differently. With MR, it may have been
processing data from 2 tables in the same map stage, which affects the
number of tasks. With Tez, it may end up processing each table in a
separate vertex.
- Tez does some level of grouping of input splits so that it can run a
smaller set of tasks, depending on the configured min/max size of data
processed by a single task.
- Furthermore, Tez looks at the available cluster capacity to decide how
many tasks to run for a single vertex. For example, if a cluster has capacity
to run only 10 containers at a time, Tez will try to run at most 1.7 * 10
tasks ( 1.7 is a configurable value ). This holds as long as the max-size
upper bound on data per task is not crossed.
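As a rough illustration, the interaction of these knobs can be sketched as
below. This is a simplified sketch, not the actual Tez grouping code; the
values it mirrors correspond to tez.grouping.min-size, tez.grouping.max-size
and the 1.7 wave factor ( tez.grouping.split-waves ):

```python
import math

def grouped_task_count(total_input_bytes, original_splits,
                       min_size=16 * 1024 * 1024,    # ~tez.grouping.min-size
                       max_size=1024 * 1024 * 1024,  # ~tez.grouping.max-size
                       available_containers=10,
                       split_waves=1.7):              # ~tez.grouping.split-waves
    # Start from the capacity-based target: ~1.7 "waves" of containers.
    desired = int(available_containers * split_waves)
    bytes_per_task = total_input_bytes / max(desired, 1)
    # Clamp so each task processes between min_size and max_size bytes.
    if bytes_per_task > max_size:
        bytes_per_task = max_size
    elif bytes_per_task < min_size:
        bytes_per_task = min_size
    tasks = max(1, math.ceil(total_input_bytes / bytes_per_task))
    # Grouping only merges splits; it never creates more tasks than splits.
    return min(tasks, original_splits)
```

For example, 16 original splits totaling 160 MB on a 10-container cluster
would first target 17 tasks, but the 16 MB min-size clamp pushes that down
to 10 tasks, which is the kind of reduction seen in this thread.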
thanks
— Hitesh
On Jul 31, 2014, at 8:19 PM, igotux igotux <[email protected]> wrote:
> Thanks Hitesh. That explains the DAG.
>
> When you said completed vs. total tasks for a given vertex, does it mean
> there was a total of 0/2 + 0/8 = 0/10 ( 10 tasks ) for this Tez job?
> Which means, when I ran the same query in Hive on MR, it launched 16 tasks,
> and now it is launching only 10 tasks. Also, can you please explain how the
> number of tasks got reduced here?
>
> Thanks.
>
>
> On Thu, Jul 31, 2014 at 9:20 PM, Hitesh Shah <[email protected]> wrote:
> Hi
>
> This looks like a 3-vertex DAG. It could possibly be a linear DAG such as
> Map1 -> Map2 -> Reduce3, or a join DAG where
> Map1 -> Reduce3 and Map2 -> Reduce3.
>
> If you can get the application logs from YARN ( using bin/yarn logs
> -applicationId application_1404180111945_438880 ), you will be able to get a
> .dot file from the logs which will allow you to
> visualize the DAG using a tool like graphviz.
>
> As for the console output, 0/2 or 0/8 just denotes the number of completed
> vs. total tasks for a given vertex.
>
> thanks
> — Hitesh
>
>
> On Jul 31, 2014, at 12:04 AM, igotux igotux <[email protected]> wrote:
>
> > Hello Everyone,
> >
> > Can someone help me understand the numbers next to Map 1 / Map 2 and
> > Reducer 3?
> >
> > ~~~~~~~~~~~~~~~
> > Status: Running (application id: application_1404180111945_438880)
> >
> > Map 1: -/- Map 2: -/- Reducer 3: 0/1
> > Map 1: 0/2 Map 2: -/- Reducer 3: 0/1
> > Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
> > Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
> > Map 1: 0/2 Map 2: 0/8 Reducer 3: 0/1
> > Map 1: 1/2 Map 2: 0/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 0/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 2/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 3/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 4/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 6/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 8/8 Reducer 3: 0/1
> > Map 1: 2/2 Map 2: 8/8 Reducer 3: 1/1
> > Status: Finished successfully
> > OK
> > ~~~~~~~~~~~~~~~
> >
> > The same Hive job on MR runs with 16 mappers and one reducer.
>
>
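For reference, the log-inspection steps described above can be sketched as a
short shell session. The yarn and dot invocations are the real commands, but
the log line and the /tmp/dag_1.dot path below are synthetic illustrations;
the exact wording and location in the logs vary by Tez version:

```shell
# Fetch the aggregated logs ( requires YARN log aggregation to be enabled ):
#   yarn logs -applicationId application_1404180111945_438880 > app.log
#
# The Tez AM writes the DAG as a Graphviz .dot file and mentions its path in
# the logs. This line is a synthetic stand-in for such a log entry:
printf 'Generating DAG graphviz file, path=/tmp/dag_1.dot\n' > app.log

# Locate the .dot path in the log:
grep -o '[^ =]*\.dot' app.log

# Render it with Graphviz ( requires the dot tool ):
#   dot -Tpng /tmp/dag_1.dot -o dag_1.png
```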