Hi Mahender,

It's hard to say what happened without seeing the actual query.
Hive has several ways to perform joins. There is a complete description of
how it does so here:
https://cwiki.apache.org/confluence/display/Hive/MapJoinOptimization
Sadly, the illustrations are broken.

There is also this presentation:
https://www.youtube.com/watch?v=OB4H3Yt5VWM
And the corresponding slides:
https://cwiki.apache.org/confluence/download/attachments/27362054/Hive+Summit+2011-join.pdf

However, these docs are from the Map-Reduce era and quite old now, so it is
hard to tell whether everything works the same way with Tez today.

If all your tables are big, I would say there is not much to optimize
except trying to bucket and sort them on the join keys beforehand.

Last but not least: when I get this kind of behavior (reducer stuck during
a JOIN), more often than not it is simply because the JOIN clause is
incorrect and the reducer generates way too much data.
Just imagine what would happen if you did a "JOIN ON 1 = 1" between two
tables with 10^9 records... You can actually kill a cluster with this if
you let it run long enough.

On Fri, Dec 9, 2016 at 10:31 PM, Mahender Sarangam
<mahender.bigd...@outlook.com> wrote:

> Hi,
>
> We are performing left joining on 5-6 larger tables. We see job is hanging
> around 95%. All the mappers completed fast and some of the reducer are also
> completed fast. but some of reducer are hanging state because single task
> is running on large data. Below are the Mapper and Reducer captured.
>
> - Is there a way to move task running under Reducer phase to Mapper
> phase. I mean tweaking with memory settings or modifying the query to have
> more mapper tasks than reducer task.
>
> - Is there a way to know what part of query is taken by task which is
> running for long time. or what amount of rows this task is running upon (
> so that i can think of partition or alternate approach)
> - Any other memory setting to resolve hanging issue. Below is our
> memory settings
>
> SET hive.tez.container.size = -1;
> SET hive.execution.engine=tez;
> SET hive.mapjoin.hybridgrace.hashtable=FALSE;
> SET hive.optimize.ppd=true;
> SET hive.cbo.enable =true;
> SET hive.compute.query.using.stats =true;
> SET hive.exec.parallel=true;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> SET hive.auto.convert.join=false;
> SET hive.auto.convert.join.noconditionaltask=false;
> set hive.tez.java.opts = "-Xmx3481m";
> set hive.tez.container.size = 4096;
> --SET mapreduce.map.memory.mb=4096;
> --SET mapreduce.map.java.opts = -Xmx3000M;
> --SET mapreduce.reduce.memory.mb = 2048;
> --SET mapreduce.reduce.java.opts = -Xmx1630M;
> SET fs.block.size=67108864;
>
> Thanks in advance
>
> -Mahender
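
PS: To make the point about an incorrect JOIN clause concrete, here is a
minimal Python sketch (not Hive code, just the arithmetic). It assumes a
shuffle join in which all rows sharing a join key land on the same reducer,
and it counts how many rows a LEFT JOIN would emit per key. The function
name and the sample keys are made up for illustration; in practice the
per-key counts would come from something like a COUNT(*) ... GROUP BY on
each table's join column.

```python
from collections import Counter


def left_join_rows_per_key(left_keys, right_keys):
    """Rows a LEFT JOIN emits for each key found on the left side.

    Each left row pairs with every matching right row; a left row with
    no match still produces one (NULL-padded) output row.
    """
    left, right = Counter(left_keys), Counter(right_keys)
    return {k: n * max(right.get(k, 0), 1) for k, n in left.items()}


# Hypothetical join columns: key "x" is heavily duplicated on both sides.
left_keys = ["x"] * 1000 + ["a", "b", "c"]
right_keys = ["x"] * 1000 + ["a", "b"]

per_key = left_join_rows_per_key(left_keys, right_keys)
total = sum(per_key.values())
hottest_key, hottest_rows = max(per_key.items(), key=lambda kv: kv[1])

# Almost the entire output belongs to one key, hence to one reducer.
print(f"total={total} hottest={hottest_key!r} rows={hottest_rows}")

# "JOIN ON 1 = 1" is the degenerate case: every row shares a single key.
cross_join_rows = 10**9 * 10**9
print(f"cross join of two 10^9-row tables: {cross_join_rows:.0e} rows")
```

If one key dominates the output like "x" does here, a single reducer ends
up doing nearly all the work while the others finish quickly, which would
look like a job hanging at around 95%.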