Hi,

We are migrating our Hive queries from MapReduce to Tez. One of our queries uses union all together with group by, and the same table is read multiple times across the union all subqueries. We have noticed an issue with Tez here: it runs with k times more map tasks than MapReduce, where k is the number of union all branches in the query.
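To make the scale concrete, here is a minimal sketch of the task-count arithmetic, assuming one scan of the table needs n map tasks and the query has k union all branches. The function names and the value n = 200 are made up for illustration; only the k * n relationship reflects what we observe.

```python
def mapreduce_map_tasks(n: int, k: int) -> int:
    """Map tasks observed with MapReduce: all k branches share one scan stage."""
    return n


def tez_map_tasks(n: int, k: int) -> int:
    """Map tasks observed with Tez: one map vertex of n tasks per branch."""
    return k * n


if __name__ == "__main__":
    n = 200  # hypothetical number of map tasks for one scan of the table
    k = 50   # number of union all branches
    print("MapReduce map tasks:", mapreduce_map_tasks(n, k))
    print("Tez map tasks:", tez_map_tasks(n, k))
    print("Ratio:", tez_map_tasks(n, k) // mapreduce_map_tasks(n, k))
```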
With MapReduce, the query runs as a single stage using *n* mappers and *m* reducers, and all the *union all* scans are done within that one job. With Tez, a separate map vertex is launched for each union all branch, and each vertex has *n* tasks. So if there are 50 union alls in a query, 50n map tasks are launched, which is huge. As a result, this query occupies far more containers on Tez than on MapReduce, and we have hit a roadblock for union queries on Tez.

Any help in this regard is appreciated.

Sample query: http://pastebin.com/u7Rw6Hag

Thanks in advance,
Ravi