Hi Bejoy,

Thanks a lot for your help :) I'm still a little confused.

In my script I always have set hive.auto.convert.join = true. Then I did
what you suggested:

1st -> set hive.optimize.bucketmapjoin=true
2nd -> set hive.optimize.bucketmapjoin=false

I ran explain with these two options and the output is the same. There is no
difference in the plan.

Regarding my 2nd comment about the Hive output in the two cases being the
same or different: I was not referring to the actual result, which should be
the same in both cases, but to the logs that Hive prints. For example, when a
normal map-side join is executed I can see that a hash table is created
first, the time it took to create and distribute it, and so on. I was asking
whether the log structure should be exactly the same when a bucketed map join
is executed.

Thanks a lot,
Avrilia

On Jan 19, 2012, at 11:34 AM, [email protected] wrote:

> Corrected a few typos in previous mail
>
> Hi Avrilia
>
> AFAIK the bucketed map join is not the default in Hive and it happens only
> when the configuration parameter hive.optimize.bucketmapjoin is set to true.
> You may be getting the same execution plan because
> hive.optimize.bucketmapjoin is already set to true in the Hive configuration
> XML file. To cross-check, could you explicitly set this to false
> (set hive.optimize.bucketmapjoin = false;) in your Hive session and get the
> query execution plan from the explain command?
>
> Please find some pointers inline.
>
> 1. Should I see something different in the explain extended output if I set
> and unset the hive.optimize.bucketmapjoin option?
> [Bejoy] Yes, you should be seeing different plans for both.
> Try EXPLAIN on your join query after setting this:
> set hive.optimize.bucketmapjoin = false;
>
> 2. Should I see something different in the output of hive while running the
> query if again I set and unset the hive.optimize.bucketmapjoin?
> [Bejoy] No, the Hive output should be the same. Whatever the execution plan
> for a join, ideally the end result should be the same.
>
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will
> still perform a normal map-side join for some reason? How can I check if
> this has actually happened?
> [Bejoy] Hive would perform a plain map-side join only if the following
> parameter is enabled (by default it is disabled):
> set hive.auto.convert.join = true;
> You need to check this value in your configuration.
> If it is enabled, Hive would always try a map join irrespective of the
> table size; it would fall back to a normal join only after the map join
> attempt fails.
> AFAIK, if the number of buckets in the two tables involved in a join is the
> same, or one is a multiple of the other, and the join is on the same
> columns that are bucketed, then with bucketmapjoin enabled Hive shouldn't
> execute a plain map-side join; a bucketed map-side join would be triggered.
>
> Hope it helps!
>
> Regards
> Bejoy K S
>
> From: Bejoy Ks <[email protected]>
> Date: Thu, 19 Jan 2012 09:22:08 -0800 (PST)
> To: [email protected]<[email protected]>
> ReplyTo: [email protected]
> Subject: Re: Question on bucketed map join
>
> Hi Avrila
> AFAIK the bucketed map join is not default in hive and it happens only
> when the values is set to true. It could be because the same value is
> already set in the hive configuration xml file. To cross confirm the same
> could you explicitly set this to false
> (set hive.optimize.bucketmapjoin = false;) and get the query execution plan
> from explain command.
>
> Please some pointers in line
>
> 1. Should I see something different in the explain extended output if I set
> and unset the hive.optimize.bucketmapjoin option?
> [Bejoy] you should be seeing the same
> Try EXPLAIN your join query after setting this
> set hive.optimize.bucketmapjoin = false;
>
> 2. Should I see something different in the output of hive while running the
> query if again I set and unset the hive.optimize.bucketmapjoin?
> [Bejoy] No, Hive output should be the same. What ever is the execution plan
> for an join, optimally the end result should be same.
>
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will
> still perform a normal map-side join for some reason? How can I check if
> this has actually happened?
> [Bejoy] Hive would perform a plain map side join only if the following
> parameter is enabled. (default it is disabled)
> set hive.auto.convert.join = true; you need to check this value in your
> configurations.
> If it is enabled irrespective of the table size hive would always try a map
> join, it would come to a normal join only after the map join attempt fails.
> AFAIK, if the number of buckets are same or multiples between the two
> tables involved in a join and if the join is on the same columns that are
> bucketed, with bucketmapjoin enabled it shouldn't execute a plain mapside
> join a bucketed map side join would be triggered.
>
> Hope it helps!..
>
> Regards
> Bejoy.K.S
>
> From: Avrilia Floratou <[email protected]>
> To: [email protected]
> Sent: Thursday, January 19, 2012 9:23 PM
> Subject: Question on bucketed map join
>
> Hi,
>
> I have two tables with 8 buckets each on the same key and want to join
> them. I ran "explain extended" and got the plan produced by Hive, which
> shows that a map-side join is a possible plan.
>
> I then set the hive.optimize.bucketmapjoin option to true in my script and
> reran the "explain extended" query. I get the exact same plans as output.
>
> I ran the query with and without the bucketmapjoin optimization and saw no
> difference in the running time.
>
> I have the following questions:
>
> 1. Should I see something different in the explain extended output if I set
> and unset the hive.optimize.bucketmapjoin option?
>
> 2. Should I see something different in the output of hive while running the
> query if again I set and unset the hive.optimize.bucketmapjoin?
>
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will
> still perform a normal map-side join for some reason? How can I check if
> this has actually happened?
>
> Thanks,
> Avrilia
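
For reference, the comparison discussed in this thread could be run roughly as in the session below. This is only a sketch under assumptions: the table names table_a and table_b and the join column id are hypothetical stand-ins for the two 8-bucket tables, and the MAPJOIN hint is included because Hive releases of this period generally expected it for bucketed map joins; other versions may behave differently.

```sql
-- A bare "set <name>;" prints the current value, which shows whether the
-- option is already enabled in the Hive configuration XML file.
set hive.auto.convert.join;
set hive.optimize.bucketmapjoin;

-- Plan 1: bucketed map join explicitly disabled.
-- table_a, table_b and id are hypothetical names, not from the thread.
set hive.optimize.bucketmapjoin = false;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(b) */ a.id, b.id
FROM table_a a JOIN table_b b ON a.id = b.id;

-- Plan 2: bucketed map join explicitly enabled, same query.
set hive.optimize.bucketmapjoin = true;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(b) */ a.id, b.id
FROM table_a a JOIN table_b b ON a.id = b.id;
```

Saving each plan to a file and diffing the two is one simple way to see whether the setting changed anything for a given query.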
