> 1. Is there a way to check the size of the hash table created during
> map-side join in Hive/Tez?
Only from the log files. However, if you enable hive.tez.exec.print.summary=true, then the hive CLI will print out the total # of items shuffled from the broadcast edges feeding the hashtable. Not sure if you have the reduce-side map-join in your builds, but that is a bit harder to look into.

> 2. Is the hash table (small table's) created for the entire table or
> only for the selected and join key columns?

Only for the selected and join-key columns - the small side is projected and filtered first, and grouped as well in the case of semi-joins. For example,

select a from tab1 where a IN (select b from tab2 ...);

will inject a "select distinct b" into the plan.

> 3. The hash table (created in map side join) spills to disk, if it does
> not fit in memory. Is there a parameter in hive/tez to specify the
> percentage of the hash file which can spill?

Not directly. hive.mapjoin.hybridgrace.minnumpartitions=16 by default, so 1/16th of your key space will spill whenever the spilling conditions are hit - for the small table. In general, the snowflake-model dimension tables are joined on their primary key, so the key space corresponds to the row distribution too.

For the big table, only a smaller fraction will spill, since the BloomFilter built during hashtable generation is not spilled - anything which misses the bloom filter will not spill to disk during the join.

All that said, the spilling hash-join is much slower than the shuffled hash-join (new in 2.0), because the grace hash-join drops parts of the hashtable out of memory after each iteration and has to rebuild them for each split processed. In terms of total CPU, building a 4 million row hash table in each of 600 tasks is slower than building a 7,500 row hashtable 600 times - and the hashtable lookup cost goes up by lg(N) too.

Ask me more questions if you need more info.

Cheers,
Gopal
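To make that last CPU comparison concrete, here is a back-of-envelope sketch. The 600-task and 4-million-row figures are the ones quoted above; the per-task row count is simply 4M/600 (which the mail rounds to 7,500), and the lg(N) lookup cost is the mail's own characterization, not a general hashtable property:

```python
import math

TASKS = 600                     # number of map tasks, figure from the mail
SMALL_TABLE_ROWS = 4_000_000    # small-table row count, figure from the mail

# Broadcast (map-side) join: every task builds the full hashtable itself.
broadcast_build_work = TASKS * SMALL_TABLE_ROWS      # 2.4 billion inserts in total

# Shuffled hash join (Hive 2.0+): the small table is hash-partitioned
# across the tasks, so each task builds only its own slice.
rows_per_task = SMALL_TABLE_ROWS // TASKS            # ~6,666 rows per task
shuffled_build_work = TASKS * rows_per_task          # ~4 million inserts in total

print(broadcast_build_work // shuffled_build_work)   # 600x more build work

# The per-lookup cost difference the mail attributes to lg(N):
print(round(math.log2(SMALL_TABLE_ROWS)))            # 22
print(round(math.log2(rows_per_task)))               # 13
```

The ratio is exactly the task count: broadcasting makes every task redo the same build, while shuffling does it once in aggregate, at the price of an extra shuffle of the small table.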