My hunch is that, while partitioning is in fact very similar to bucketing (arguably superior, since you have some control over which file the data goes to), the Hive optimizer only applies bucket joins if your tables are actually bucketed. So a join condition of t1.bucketed_column = t2.bucketed_column triggers the bucketed map join, but t1.partitioned_column = t2.partitioned_column doesn't. I'm hoping someone with deeper Hive knowledge will be able to confirm this.
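To make the distinction concrete, here is a rough sketch of what I think a bucketed (and still partitioned) setup would look like. The table names t1, t1_bucketed and t2_bucketed, the columns join_key, payload and part_col, and the bucket count of 32 are all made up for illustration. I have not verified that the optimizer picks a sort merge bucket join for this on Hive 13, so treat it as an assumption rather than a confirmed recipe.

```sql
-- Hypothetical names throughout; adjust to your own schema.
-- A table can be partitioned AND bucketed/sorted at the same time.
CREATE TABLE t1_bucketed (
  join_key BIGINT,
  payload  STRING
)
PARTITIONED BY (part_col STRING)
CLUSTERED BY (join_key) SORTED BY (join_key ASC) INTO 32 BUCKETS
STORED AS ORC;

-- t2_bucketed is assumed to be declared the same way, bucketed and
-- sorted on join_key with the same number of buckets.

-- Ask Hive to honour the bucketing/sorting declaration while loading
-- (these settings apply to the Hive 0.x / 1.x line discussed here).
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Reload the data from the original (partitioned-only) table t1.
INSERT OVERWRITE TABLE t1_bucketed PARTITION (part_col)
SELECT join_key, payload, part_col
FROM t1;

-- Join-time settings, taken from the quoted message in this thread.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

-- The join condition is on the bucketed column, not the partition column.
SELECT /*+ MAPJOIN(b) */ a.join_key, a.payload, b.payload
FROM t1_bucketed a
JOIN t2_bucketed b
  ON a.join_key = b.join_key;
```

Running EXPLAIN on the final query should show whether a bucket map join or sort merge bucket join operator was actually chosen.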
Thank you,
Kind Regards
~Maciek

On Thu, Jan 29, 2015 at 1:51 PM, murali parimi <muralikrishna.par...@icloud.com> wrote:

> I faced the same situation, with two tables of 3 billion records on each
> side, partitioned and sorted on the same key. I set the following
> parameters in the Hive query, assuming the join would happen in the map phase.
>
> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.enforce.sorting=true;
>
> I am using Hive version 13 and the storage format is ORC. One of the tables
> is small in size, but I haven't checked whether it can fit in the cache, as
> we have huge memory. But the map-side join didn't happen. What could be the
> reason?
>
> Sent from my iPhone
>
> > On Jan 29, 2015, at 7:38 AM, matshyeq <matsh...@gmail.com> wrote:
> >
> > I do have two tables partitioned on the same criteria.
> > Could I still take advantage of Bucket Map Join or, better, Sort Merge
> > Bucket Map Join?
> > How?
> >
> > ~Maciek