Hi Abhishek

Both your tables are ideal candidates for a map join.
Can you try a plain join statement without setting any properties other than the number of reducers, and then a map join as the next step?

hive> set mapred.reduce.tasks=5;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

Once this goes well, try doing a map-side join.

hive> set hive.auto.convert.join=true;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

------Original Message------
From: Abhishek
To: user@hive.apache.org
Cc: user@hive.apache.org
Cc: Bejoy Ks
Subject: Re: Cartesian Product in HIVE
Sent: Oct 1, 2012 09:32

Thanks for the reply Bejoy.

I did not use any ORDER BY in the query. Here are the properties I have used, the query, and the table sizes.

-----
set mapred.reduce.tasks=17;
set mapred.child.java.opts=xmx2073741824;
set io.sort.mb=512;
set io.sort.factor=250;
set mapred.reduce.parallel.copies=true;
set mapred.job.reuse.jvm.num.tasks=1;
set hive.mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.map.tasks.speculative.execution=false;

CREATE TABLE t1 AS
SELECT /*+ STREAMTABLE(t2) */ t2.col1, t3.col1
FROM table2 t2
JOIN table3 t3

table2 : 997406 rows, total bytes: 20848934 -- 19.88 mb
table3 : 20773 rows, total bytes: 353127 -- 0.33 mb

#of Mappers: 4
#of reducers: 1

Regards
Abhi

On Sep 30, 2012, at 9:35 AM, Bejoy KS <bejo...@outlook.com> wrote:

Hi Abhishek

No similar columns are needed for a map join to work. It just moves the join process to the mapper rather than doing the same in a reducer.

The actual bottleneck is the single reducer. We need to figure out why only one reducer is fired rather than the set value of 17. Are you using ORDER BY in your query? If so, it sets the number of reducers to 1.

Can you provide the full console stack here so that we'll be able to understand your issue and help you better (starting from the properties you set, your query and the error)? Also, can you get the exact data sizes for the two tables?
Regards
Bejoy KS

> From: abhishek.dod...@gmail.com
> Date: Sat, 29 Sep 2012 07:44:06 -0700
> Subject: Re: Cartesian Product in HIVE
> To: user@hive.apache.org; bejoy...@yahoo.com
>
> Thanks for the reply Bejoy.
>
> I tried a map join by setting the property you mentioned, and even
> increased the small table file size;
> the 20k table size would be not more than 200 mb, but it does not work.
>
> This is a Cartesian product of tables; they don't have any similar columns. Does
> map join work here?
>
> By applying the settings below with the STREAMTABLE hint it was processing
> around 5 billion rows per hour, so the process took around 4 hrs.
>
> Set io.sort.mb=512
> Set mapred.reduce.tasks=17
> Set io.sort.factor=256
> Set

Regards
Bejoy KS
Sent from handheld, please excuse typos.
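
For anyone following the thread, the map join discussed here could be set up roughly as in the sketch below. It is an untested sketch under a few assumptions, not the posters' exact commands. `hive.auto.convert.join` and `hive.mapjoin.smalltable.filesize` are standard Hive properties, but the threshold value shown is simply Hive's usual default rather than a value from this thread. The `MAPJOIN(t3)` hint assumes table3 (0.33 mb) is the table to hold in memory.

```sql
-- Let Hive convert the join to a map join when one side is small enough.
set hive.auto.convert.join=true;

-- Size threshold (in bytes) below which a table counts as "small".
-- 25000000 is the usual default; table3 at 353127 bytes is well under it.
set hive.mapjoin.smalltable.filesize=25000000;

-- Alternatively, request the map join explicitly with a hint.
-- Assumption: on Hive 0.11 and later the hint is ignored unless
-- hive.ignore.mapjoin.hint is set to false.
SELECT /*+ MAPJOIN(t3) */ t2.col1, t3.col1
FROM table2 t2
JOIN table3 t3;
```

With either approach the small table is loaded into each mapper, so the cross product no longer depends on the single reducer.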