Fw: Cartesian Product in HIVE

Bejoy KS Sun, 30 Sep 2012 21:31:05 -0700

Hi Abhishek

Both your tables are ideal candidates for map join.


Can you first try a plain join without setting any properties other than the 
number of reducers, and then try a map join as the next step?

hive> set mapred.reduce.tasks=5;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

Once this works, try a map-side join:
hive> set hive.auto.convert.join=true;
hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;
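A map join gives the small table to every mapper. Each mapper can then pair its own split of the large table with the whole small table, so no reducer is needed. The toy Python model below shows that, for a join with no key columns, the mappers' combined output is exactly the Cartesian product. It is only a sketch of the idea and not Hive's actual implementation. The helper name `map_side_cross_join` and the split layout are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def map_side_cross_join(large_splits: List[List[str]],
                        small_table: List[str]) -> List[Tuple[str, str]]:
    """Toy model of a map-only Cartesian product (hypothetical helper).

    Each inner list of `large_splits` stands for the input split read by one
    mapper. Every mapper holds the whole `small_table` in memory and pairs it
    with each row of its own split, so no shuffle or reducer is involved.
    """
    output: List[Tuple[str, str]] = []
    for split in large_splits:              # one iteration per mapper
        in_memory = list(small_table)       # small table copied to this mapper
        for big_row in split:               # stream this mapper's split
            for small_row in in_memory:
                output.append((big_row, small_row))
    return output


# Usage: four rows of a "large" table spread over two mappers.
table2 = ["r1", "r2", "r3", "r4"]
table3 = ["s1", "s2"]
splits = [table2[:2], table2[2:]]
rows = map_side_cross_join(splits, table3)

# The mappers' combined output equals the full Cartesian product.
assert rows == list(product(table2, table3))
print(len(rows))  # 8
```

Because every mapper already has the entire small table, the number of mappers (rather than a single reducer) bounds the parallelism.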

------Original Message------
From: Abhishek
To: user@hive.apache.org
Cc: user@hive.apache.org
Cc: Bejoy Ks
Subject: Re: Cartesian Product in HIVE
Sent: Oct 1, 2012 09:32

Thanks for the reply, Bejoy. I did not use any ORDER BY in the query. Here are 
the properties I used, the query, and the table sizes:

set mapred.reduce.tasks=17;
set mapred.child.java.opts=xmx2073741824;
set io.sort.mb=512;
set io.sort.factor=250;
set mapred.reduce.parallel.copies=true;
set mapred.job.reuse.jvm.num.tasks=1;
set hive.mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.map.tasks.speculative.execution=false;

CREATE TABLE t1 AS SELECT /*+ STREAMTABLE(t2) */ t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

table2: 997406 rows, total bytes: 20848934 (19.88 MB)
table3: 20773 rows, total bytes: 353127 (0.33 MB)
# of mappers: 4
# of reducers: 1

Regards
Abhi

On Sep 30, 2012, at 9:35 AM, Bejoy KS <bejo...@outlook.com> wrote:

Hi Abhishek

Map join does not need any common columns to work. It just moves the join 
from the reducer to the mapper. The actual bottleneck is the single reducer: we 
need to figure out why only one reducer is launched rather than the 17 you set. 
Are you using ORDER BY in your query? If so, it sets the number of reducers to 1.

Can you provide the full console output here (starting from the properties you 
set, your query, and the error) so that we can understand your issue and help 
you better? Also, can you get the exact data sizes of the two tables?

Regards
Bejoy KS

> From: abhishek.dod...@gmail.com
> Date: Sat, 29 Sep 2012 07:44:06 -0700
> Subject: Re: Cartesian Product in HIVE
> To: user@hive.apache.org; bejoy...@yahoo.com
>
> Thanks for the reply Bejoy.
>
> I tried map join by setting the property you mentioned, and even
> increased the small table file size. The 20k-row table would be no more
> than 200 MB, but it does not work.
>
> This is a Cartesian product of tables that don't have any common columns.
> Does map join work here?
>
> By applying the settings below with the STREAMTABLE hint, it was processing
> around 5 billion rows per hour, so the process took around 4 hours.
>
> Set io.sort.mb=512
> Set mapred.reduce.tasks=17
> Set io.sort.factor=256
> Set
Regards
Bejoy KS

Sent from handheld, please excuse typos.
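Abhishek's timing can be sanity-checked against the table sizes he reported. A join with no join condition returns every pairing of rows, so its output row count is table2's row count times table3's. The short Python calculation below uses only figures quoted in the thread: 997406 and 20773 rows, and about 5 billion rows per hour. The constant and function names are illustrative.

```python
# Figures as reported in the thread.
TABLE2_ROWS = 997_406
TABLE3_ROWS = 20_773
ROWS_PER_HOUR = 5_000_000_000  # "around 5 billion rows per hour"


def cross_join_rows(left_rows: int, right_rows: int) -> int:
    """A Cartesian product emits one row for every (left, right) pair."""
    return left_rows * right_rows


def hours_needed(total_rows: int, rows_per_hour: int) -> float:
    """Wall-clock hours at a constant throughput."""
    return total_rows / rows_per_hour


output_rows = cross_join_rows(TABLE2_ROWS, TABLE3_ROWS)
hours = hours_needed(output_rows, ROWS_PER_HOUR)
print(f"output rows: {output_rows:,}")   # output rows: 20,719,114,838
print(f"estimated hours: {hours:.2f}")   # estimated hours: 4.14
```

At roughly 20.7 billion output rows, the reported rate works out to a little over 4 hours. This is consistent with the "around 4 hrs" mentioned in the thread.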
