Hi,

We are migrating our Hive queries from MapReduce to Tez.
One of our queries uses union all and group by, and the same table is
read multiple times across the union all subqueries.
We have noticed an issue with Tez here: it runs about k times as many
map tasks as MR, where k is the number of union alls in the query.
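
To illustrate the shape of the query (a simplified sketch only; the table
and column names `events`, `user_id` and `type` are made up for this
example, and the real query is at the pastebin link further down):

```sql
-- Hypothetical example: every branch of the union all scans the same table.
SELECT k, SUM(cnt) AS total
FROM (
  SELECT user_id AS k, COUNT(*) AS cnt
  FROM events WHERE type = 'a' GROUP BY user_id
  UNION ALL
  SELECT user_id AS k, COUNT(*) AS cnt
  FROM events WHERE type = 'b' GROUP BY user_id
  UNION ALL
  SELECT user_id AS k, COUNT(*) AS cnt
  FROM events WHERE type = 'c' GROUP BY user_id
) u
GROUP BY k;
```

The actual query follows the same pattern, only with many more branches.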


When run with MapReduce, the job runs in a single stage using *n* mappers
and *m* reducers, and all the *union all* scans are done in the same job.

But when it runs on Tez, a separate map vertex is launched for each union
all, and each vertex has *n* tasks.
Hence, if there are 50 union alls in a query, 50n map tasks are launched,
which is huge.

So running this query on Tez occupies far more containers than MapReduce
does, and we have hit a roadblock for union queries on Tez.

Any help in this regard would be appreciated.


Sample query:
http://pastebin.com/u7Rw6Hag

Thanks in advance,
Ravi
