Hi Bejoy,

Thanks a lot for your help:)  I'm still a little confused.

In my script I always have the
set.auto.convert.join = true.

Then I did what you suggested:

1st -> set hive.optimize.bucketmapjoin=true
2nd ->  set hive.optimize.bucketmapjoin=false

I ran explain with these 2 options and the output is the same. No difference in 
the plan….

Regarding my 2nd comment about hive output in the 2 cases being same or 
different, I was not referring to the actual result which should be the same in 
both cases
but the logs that hive prints. For example when a normal map-side join is 
executed I can see that first a hash table is created, the time it took to 
create and distribute it and so on. I was
asking if the log structure should be the exact same when a bucketed map join 
is executed.

Thanks a lot,
Avrilia


On Jan 19, 2012, at 11:34 AM, [email protected] wrote:

> Corrected a few typos in previous mail
> 
> Hi Avrila
> Hi Avrila
>        AFAIK the bucketed map join is not default in hive and it happens only 
> when the configuration parameter hive.optimize.bucketmapjoin is set to true. 
> You may be getting the same execution plan because 
> hive.optimize.bucketmapjoin is set to true in the hive configuration xml 
> file. To cross confirm the same could you explicitly set this to false
> (set hive.optimize.bucketmapjoin = false;
> ) in your hive session and get the query execution plan from explain command. 
> Please find some pointers in line
> 1. Should I see sth different in the explain extended output if I set and 
> unset the hive.optimize.bucketmapjoin option?
> [Bejoy]Yes, you should be seeing different plans for both.
> Try EXPLAIN your join query after setting this
> set hive.optimize.bucketmapjoin = false;
> 
> 2. Should I see something different in the output of hive while running the 
> query if again I set and unset the hive.optimize.bucketmapjoin?
> [Bejoy] No,Hive output should be the same. What ever is the execution plan 
> for an join, optimally the end result should be same.
> 
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will 
> still perform a normal map-side join for some reason? How can I check if this 
> has actually happened?
> [Bejoy] Hive would perform a plain map side join only if the following 
> parameter is enabled. (default it is disabled)
> set hive.auto.convert.join = true; you need to check this value in your 
> configurations.
> If it is enabled irrespective of the table size hive would always try a map 
> join, it would come to a normal join only after the map join attempt fails.
> AFAIK, if the number of buckets are same or multiples between the two tables 
> involved in a join and if the join is on the same columns that are bucketed, 
> with bucketmapjoin enabled it shouldn't execute a plain mapside join but a 
> bucketed map side join would be triggered.
> 
> Hope it helps!..
> 
> Regards
> Bejoy K S
> From: Bejoy Ks <[email protected]>
> Date: Thu, 19 Jan 2012 09:22:08 -0800 (PST)
> To: [email protected]<[email protected]>
> ReplyTo: [email protected]
> Subject: Re: Question on bucketed map join
> 
> Hi Avrila
>        AFAIK the bucketed map join is not default in hive and it happens only 
> when the values is set to true. It could be because the same value is already 
> set in the hive configuration xml file. To cross confirm the same could you 
> explicitly set this to false 
> (set hive.optimize.bucketmapjoin = false;)and get the query execution plan 
> from explain command. 
> 
> Please some pointers in line
> 
> 1. Should I see sth different in the explain extended output if I set and 
> unset the hive.optimize.bucketmapjoin option?
> [Bejoy] you should be seeing the same
> Try EXPLAIN your join query after setting this
> set hive.optimize.bucketmapjoin = false;
> 
> 2. Should I see something different in the output of hive while running the 
> query if again I set and unset the hive.optimize.bucketmapjoin?
> [Bejoy] No,Hive output should be the same. What ever is the execution plan 
> for an join, optimally the end result should be same. 
> 
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will 
> still perform a normal map-side join for some reason? How can I check if this 
> has actually happened?
> [Bejoy] Hive would perform a plain map side join only if the following 
> parameter is enabled. (default it is disabled)
> set hive.auto.convert.join = true; you need to check this value in your 
> configurations.
> If it is enabled irrespective of the table size hive would always try a map 
> join, it would come to a normal join only after the map join attempt fails.
> AFAIK, if the number of buckets are same or multiples between the two tables 
> involved in a join and if the join is on the same columns that are bucketed, 
> with bucketmapjoin enabled it shouldn't execute a plain mapside join a 
> bucketed map side join would be triggered.
> 
> Hope it helps!..
> 
> Regards
> Bejoy.K.S
> 
> From: Avrilia Floratou <[email protected]>
> To: [email protected] 
> Sent: Thursday, January 19, 2012 9:23 PM
> Subject: Question on bucketed map join
> 
> Hi,
> 
> I have two tables with 8 buckets each on the same key and want to join them.
> I ran "explain extended" and get the plan produced by HIVE which shows that a 
> map-side join is a possible plan.
> 
> I then set in my script the hive.optimize.bucketmapjoin option to true and 
> reran the "explain extended" query. I get the exact same plans as output.
> 
> I ran the query with and without the bucketmapjoin optimization and saw no 
> difference in the running time.
> 
> I have the following questions:
> 
> 1. Should I see sth different in the explain extended output if I set and 
> unset the hive.optimize.bucketmapjoin option?
> 
> 2. Should I see something different in the output of hive while running the 
> query if again I set and unset the hive.optimize.bucketmapjoin?
> 
> 3. Is it possible that even though I set bucketmapjoin to true, Hive will 
> still perform a normal map-side join for some reason? How can I check if this 
> has actually happened?
> 
> Thanks,
> Avrilia 
> 

Reply via email to