Hi Gopal,

Thanks a lot for the answers. They were helpful. I have a few more
questions regarding this:
1. OOM condition -- I get the following error when I force a map join in
Hive/Tez with a low container size and heap size:
"java.lang.OutOfMemoryError: Java heap space". I was wondering what the
condition is that leads to this error. Is it when even one of the 16
partitions of the hashtable cannot fit in memory? I tried setting
hive.mapjoin.hybridgrace.minnumpartitions to higher values like 50 and 100,
expecting the size of each partition to drop drastically (the join key has
no skew in its distribution), but I still get the OOM error. What is the
cause?

2. Shuffle hash join -- I am using Hive 2.0.1. What is the way to force
this join implementation? Is there any documentation regarding the same?

3. Hash table size -- I use "--hiveconf hive.root.logger=INFO,console" to
see the logs, but I do not see the hash table size in them. I tried using
"--hiveconf hive.tez.exec.print.summary=true", and the output was the
following:

METHOD                  DURATION(ms)
parse                            977
semanticAnalyze                2,435
TezBuildDag                      473
TezSubmitToRunningDag            841
TotalPrepTime                 10,451

VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
Map 1               3                0             0             37.11           65,710           1,039     15,000,000      15,000,000
Map 2              65                0             0            498.77        4,163,920         100,613    615,037,902               0

This doesn't seem to have information about the hash table or the # of
items shuffled. Am I missing something?

Thanks,
Ross

On Mon, Jun 27, 2016 at 9:10 AM, Gopal Vijayaraghavan <gop...@apache.org>
wrote:

> > 1. Is there a way to check the size of the hash table created during map
> > side join in Hive/Tez?
>
> Only from the log files.
>
> However, if you enable hive.tez.exec.print.summary=true, then the Hive CLI
> will print out the total # of items shuffled from the broadcast edges
> feeding the hashtable.
>
> Not sure if you might have the reduce-side map-join in your builds, but
> that is a bit harder to look into.
>
> > 2.
> > Is the hash table (small table's) created for the entire table, or
> > only for the selected and join key columns?
>
> Yup. Project, filter and group-by (in case of semi-joins).
>
> select a from tab1 where a IN (select b from tab2 ...);
>
> will inject a "select distinct b" into the plan.
>
> > 3. The hash table (created in map side join) spills to disk if it does
> > not fit in memory. Is there a parameter in hive/tez to specify the
> > percentage of the hash table which can spill?
>
> Not directly.
>
> hive.mapjoin.hybridgrace.minnumpartitions=16
>
> by default. So 1/16th of your key space will spill, whenever it hits the
> spilling conditions - for the small table.
>
> In general, the Snowflake-model dimension tables are joined by their
> primary key, so the key-space corresponds to the row-distribution too.
>
> For the big table, it will spill only a smaller fraction, since the
> BloomFilter built during hashtable generation is not spilled, so anything
> which misses the bloom filter will not spill to disk during the join.
>
> All that said, the spilling hash-join is much slower than the shuffling
> hash-join (new in 2.0), because the grace hash-join drops parts of the
> hash out of memory after each iteration & has to rebuild it for each
> split processed.
>
> In terms of total CPU, building a 4 million row hash table in 600 tasks
> is slower than building a 7500 row hashtable x 600 times - the cost of a
> hashtable lookup goes up by LG(N) too.
>
> Ask me more questions, if you need more info.
>
> Cheers,
> Gopal
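[Editor's note: for readers following the thread, the settings discussed
above can be collected in one place. This is only a sketch against Hive 2.x
property names; the threshold and table names below are illustrative
assumptions, not recommendations from the thread, and should be verified
against your build.]

```sql
-- Sketch: Hive/Tez session settings touched on in this thread (Hive 2.x).
-- Values are illustrative; tune them for your cluster.

-- Allow small tables to be broadcast for a map-side join.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Small-table size threshold in bytes (illustrative value).
SET hive.auto.convert.join.noconditionaltask.size=100000000;

-- Hybrid grace hash join: the spilling map join discussed above.
SET hive.mapjoin.hybridgrace.hashtable=true;
-- More partitions => smaller spill units (16 is the default per Gopal).
SET hive.mapjoin.hybridgrace.minnumpartitions=16;

-- Print the per-vertex DAG summary (the table Ross pasted above).
SET hive.tez.exec.print.summary=true;

-- Hypothetical query; `fact` and `dim` are placeholder table names.
SELECT /*+ MAPJOIN(d) */ f.k, d.v
FROM fact f JOIN dim d ON f.k = d.k;
```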