Re: Low performance map join when join key types are different

2015-12-23 Thread Zhiwen Sun
Got it. Thanks for your reply.

Zhiwen Sun

On Wed, Dec 23, 2015 at 2:24 PM, Gopal Vijayaraghavan wrote:
> > But why does disabling mapjoin give better performance when we don't cast to
> > string (users are always lazy)?
> >
> > Is join key comparison in the reduce stage quicker?
>
> The Ha

Re: Low performance map join when join key types are different

2015-12-22 Thread Gopal Vijayaraghavan
> But why does disabling mapjoin give better performance when we don't cast to
> string (users are always lazy)?
>
> Is join key comparison in the reduce stage quicker?

The HashMap is slower than the full-sort + sorted-merge-join. It shouldn't be, but it hits the worst-case performance for th

Re: Low performance map join when join key types are different

2015-12-22 Thread Zhiwen Sun
Thanks, Gopal.

But why does disabling mapjoin give better performance when we don't cast to string (users are always lazy)?

Is join key comparison in the reduce stage quicker?

Zhiwen Sun

On Wed, Dec 23, 2015 at 2:36 AM, Gopal Vijayaraghavan wrote:
> > We found that when we join

Re: Low performance map join when join key types are different

2015-12-22 Thread Gopal Vijayaraghavan
> We found that when we join on keys of two different types, Hive will
> convert all join keys to double.

This is because of type coercions for BaseCompare, so that String:Integer comparisons with "<=" will work similarly to "=".

> b.id to double. When the conversion occurs, map join will become very
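
The coercion point can be illustrated outside Hive. The following is only a sketch in Python, not Hive's BaseCompare code; the function names are hypothetical. It shows why comparing a string key with an integer key needs a common numeric type: a plain string comparison orders and matches values differently from a numeric one.

```python
# Illustration only: these helpers are hypothetical and do not mirror Hive internals.

def le_as_strings(left: str, right: int) -> bool:
    """Compare lexicographically, as if the integer had been cast to a string."""
    return left <= str(right)


def le_as_doubles(left: str, right: int) -> bool:
    """Compare numerically, as if both sides had been coerced to double."""
    return float(left) <= float(right)


def eq_as_strings(left: str, right: int) -> bool:
    return left == str(right)


def eq_as_doubles(left: str, right: int) -> bool:
    return float(left) == float(right)


if __name__ == "__main__":
    # "9" sorts after "10" as text, but 9.0 is less than 10.0 as a number.
    print(le_as_strings("9", 10), le_as_doubles("9", 10))
    # "010" is a different string from "10", but the same number.
    print(eq_as_strings("010", 10), eq_as_doubles("010", 10))
```

With a common double type, "<=" and "=" agree on what the values mean, which is the consistency described above.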

Low performance map join when join key types are different

2015-12-22 Thread Zhiwen Sun
Hi all:

We found that when we join on keys of two different types, Hive will convert all join keys to double.

Consider this simple query:

explain
select *
from table_a a
join table_b b on a.id = b.id

If the type of a.id is int while b.id's type is string, Hive will convert a.id and b.id to dou