Re: Cartesian Product in HIVE

Abhishek Mon, 01 Oct 2012 07:56:00 -0700

Thanks for the reply, Bejoy.



On Oct 1, 2012, at 12:30 AM, "Bejoy KS" <bejo...@outlook.com> wrote:

> Hi Abhishek,
> 
> Both your tables are ideal candidates for map join.
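Roughly speaking, Hive decides whether a join can become a map join by comparing the size of the table it would load into memory against hive.mapjoin.smalltable.filesize. The Python sketch below is a simplified, hypothetical version of that check, applied to the table sizes reported later in this thread. The helper name is made up for illustration, and the 25,000,000-byte threshold is an assumed default; verify it for your Hive version.

```python
# Simplified, hypothetical version of the size check behind automatic
# map-join conversion: a table qualifies as the in-memory ("small") side
# if it is no larger than hive.mapjoin.smalltable.filesize.
# ASSUMPTION: 25,000,000 bytes is the default threshold; check your version.
DEFAULT_SMALLTABLE_FILESIZE = 25_000_000  # bytes

# Table sizes reported in this thread (total bytes).
TABLE_BYTES = {
    "table2": 20_848_934,
    "table3": 353_127,
}


def map_join_candidates(table_bytes, threshold=DEFAULT_SMALLTABLE_FILESIZE):
    """Return tables small enough to hold in memory, smallest first."""
    small = [name for name, size in table_bytes.items() if size <= threshold]
    return sorted(small, key=lambda name: table_bytes[name])


if __name__ == "__main__":
    print(map_join_candidates(TABLE_BYTES))  # ['table3', 'table2']
```

Both tables fall under the assumed threshold, which matches the remark that both are candidates. The smaller one, table3, is the natural choice for the in-memory side.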
> 
> Can you first try a plain join statement without setting any properties other than
> the number of reducers, and then try a map join as the next step?
> 
> hive> set mapred.reduce.tasks=5;
> hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;
> 
-- I tried this, but it still launches only one reducer.
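One possible explanation for the single reducer, which this thread does not confirm, is the following. A join with no ON condition gives the shuffle no join key to partition on. Every row then carries the same (empty) key, and hash partitioning sends all rows to one reducer regardless of mapred.reduce.tasks. The toy Python simulation below illustrates that effect. Its partition function (CRC32 modulo the reducer count) is a simplified stand-in, not Hive's actual partitioner.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 5  # matches "set mapred.reduce.tasks=5" above


def partition(key: bytes, num_reducers: int) -> int:
    """Simplified stand-in for a hash partitioner: CRC32 of the key, modulo reducer count."""
    return zlib.crc32(key) % num_reducers


def reducer_load(keys, num_reducers=NUM_REDUCERS):
    """Count how many rows each reducer index would receive."""
    return Counter(partition(k, num_reducers) for k in keys)


# Equi-join: each row carries a real join key, so rows spread over reducers.
keyed_rows = [str(i).encode() for i in range(1000)]

# Cross join (ASSUMPTION): no ON clause, so every row gets the same empty key.
keyless_rows = [b""] * 1000

print("equi-join reducers used: ", len(reducer_load(keyed_rows)))
print("cross-join reducers used:", len(reducer_load(keyless_rows)))  # 1
```

If this is what happens, raising the reducer count cannot help, because the bottleneck comes from the missing key and not from the configured parallelism.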

> Once this works, try a map-side join.
> hive> set hive.auto.convert.join=true;
> hive> SELECT t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3;

-- This does not work either.
It only shows:

map = 0%, reduce = 0%
map = 0%, reduce = 0%
map = 0%, reduce = 0%
map = 0%, reduce = 0%

and it stays like that for 10 minutes.

Regards
Abhi
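For context, a map-side join would handle this query as follows. Each mapper holds the small table (table3) entirely in memory and streams its own split of the big table (table2) against it, emitting every pair, so no reduce phase is needed. The Python sketch below simulates that with in-memory lists. The function names and the contiguous-split logic are illustrative assumptions, not Hive's implementation.

```python
from typing import Iterator, List, Tuple


def map_side_cross_join(big_split: List[str], small_table: List[str]) -> Iterator[Tuple[str, str]]:
    """One mapper's work: stream its split of the big table against the in-memory small table."""
    for left in big_split:
        for right in small_table:
            yield (left, right)


def run_map_join(big_table: List[str], small_table: List[str], num_mappers: int) -> List[Tuple[str, str]]:
    """Split the big table across mappers; every mapper sees the full small table. No reducer."""
    # Contiguous splits of roughly equal size (ceiling division, at least 1 row per split).
    split_size = max(1, -(-len(big_table) // num_mappers))
    output: List[Tuple[str, str]] = []
    for start in range(0, len(big_table), split_size):
        output.extend(map_side_cross_join(big_table[start:start + split_size], small_table))
    return output


big = [f"t2_row{i}" for i in range(10)]   # stands in for table2
small = [f"t3_row{j}" for j in range(3)]  # stands in for table3 (copied to every mapper)
pairs = run_map_join(big, small, num_mappers=4)
print(len(pairs))  # 30
```

The work is spread over as many mappers as there are input splits, which is why a map join sidesteps a single-reducer bottleneck.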
> 
> ------Original Message------
> From: Abhishek
> To: user@hive.apache.org
> Cc: user@hive.apache.org
> Cc: Bejoy Ks
> Subject: Re: Cartesian Product in HIVE
> Sent: Oct 1, 2012 09:32
> 
> Thanks for the reply, Bejoy. I did not use any ORDER BY in the query. Here are the
> properties I used, the query, and the table sizes:
> 
> set mapred.reduce.tasks=17;
> set mapred.child.java.opts=xmx2073741824;
> set io.sort.mb=512;
> set io.sort.factor=250;
> set mapred.reduce.parallel.copies=true;
> set mapred.job.reuse.jvm.num.tasks=1;
> set hive.mapred.reduce.tasks.speculative.execution=false;
> set hive.mapred.map.tasks.speculative.execution=false;
> CREATE TABLE t1 AS SELECT /*+ STREAMTABLE(t2) */ t2.col1, t3.col1 FROM table2 t2 JOIN table3 t3
> 
> table2: 997406 rows, total bytes: 20848934 (19.88 MB)
> table3: 20773 rows, total bytes: 353127 (0.33 MB)
> # of mappers: 4
> # of reducers: 1
> 
> Regards
> Abhi
> 
> On Sep 30, 2012, at 9:35 AM, Bejoy KS <bejo...@outlook.com> wrote:
> 
> > Hi Abhishek,
> > 
> > A map join does not need any common columns to work. It simply moves the join
> > from the reducer to the mapper. The actual bottleneck is the single reducer: we
> > need to figure out why only one reducer is launched rather than the 17 you set.
> > Are you using ORDER BY in your query? If so, it sets the number of reducers to 1.
> > 
> > Can you post the full console output here (starting from the properties you set,
> > your query, and the error) so that we can understand your issue and help you
> > better? Also, can you get the exact data sizes of the two tables?
> > 
> > Regards
> > Bejoy KS
> > 
> > > From: abhishek.dod...@gmail.com
> > > Date: Sat, 29 Sep 2012 07:44:06 -0700
> > > Subject: Re: Cartesian Product in HIVE
> > > To: user@hive.apache.org; bejoy...@yahoo.com
> > > 
> > > Thanks for the reply, Bejoy.
> > > 
> > > I tried a map join by setting the property you mentioned, and even
> > > increased the small table file size. The 20k-row table would be no more
> > > than 200 MB, but it does not work.
> > > 
> > > This is a Cartesian product of tables that don't have any common columns.
> > > Does a map join work here?
> > > 
> > > By applying the settings below with the STREAMTABLE hint, it was processing
> > > around 5 billion rows per hour, so the process took around 4 hours.
> > > 
> > > Set io.sort.mb=512
> > > Set mapred.reduce.tasks=17
> > > Set io.sort.factor=256
> > > Set
> 
> Regards
> Bejoy KS
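The figures reported in this thread can be checked quickly. A Cartesian product emits one row for every pair of input rows, so table2 (997,406 rows) times table3 (20,773 rows) gives roughly 20.7 billion output rows. At the reported rate of about 5 billion rows per hour, that comes to a little over 4 hours, consistent with the "around 4 hrs" mentioned on Sep 29.

```python
# Row counts and throughput as reported in this thread.
TABLE2_ROWS = 997_406
TABLE3_ROWS = 20_773
ROWS_PER_HOUR = 5_000_000_000  # "around 5 Billion rows per hour"

# A Cartesian product pairs every row of one table with every row of the other.
output_rows = TABLE2_ROWS * TABLE3_ROWS
hours = output_rows / ROWS_PER_HOUR

print(f"{output_rows:,} rows")  # 20,719,114,838 rows
print(f"{hours:.1f} hours")     # 4.1 hours
```

The output volume, not the input sizes (about 20 MB in total), is what makes this job expensive.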
