> But why does disabling mapjoin give better performance when we don't cast to
> string (users are always lazy)?
> 
> Is the join key comparison in the reduce stage faster?

The HashMap<DoubleWritable, RowContainer> is slower than the full-sort +
sorted-merge-join.


It shouldn't be, but the map-join hits the worst-case performance of the
HashMap implementation because of a bug in Hadoop's DoubleWritable.

The effect is roughly the same as giving every key this hashCode, so all
keys land in one bucket:

public int hashCode() {
   return 1;
}

Read the comments on https://issues.apache.org/jira/browse/HADOOP-12217 for
the details.
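
If you want to see the collision pattern without a cluster, here is a rough
sketch in plain Java (no Hadoop on the classpath). Note that truncatedHash()
is my assumption of what the buggy hash amounts to, namely keeping only the
low 32 bits of the double's bit pattern; it is not copied from the Hadoop
source. The class and method names are made up for illustration.
foldedHash() is the same computation java.lang.Double uses.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.DoubleToIntFunction;

class HashCollisionDemo {
    // ASSUMPTION: stand-in for the buggy hash, keeping only the low 32 bits
    // of the IEEE-754 bit pattern. Not taken from the Hadoop source.
    static int truncatedHash(double v) {
        return (int) Double.doubleToLongBits(v);
    }

    // Same computation as java.lang.Double.hashCode(double):
    // fold the high 32 bits into the low 32 bits before truncating.
    static int foldedHash(double v) {
        long bits = Double.doubleToLongBits(v);
        return (int) (bits ^ (bits >>> 32));
    }

    // Count how many distinct hash values the keys 1.0 .. n produce.
    static int distinctHashes(int n, DoubleToIntFunction hash) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 1; i <= n; i++) {
            seen.add(hash.applyAsInt((double) i));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("truncated: "
                + distinctHashes(n, HashCollisionDemo::truncatedHash));
        System.out.println("folded:    "
                + distinctHashes(n, HashCollisionDemo::foldedHash));
    }
}
```

For whole-number keys like these, the low 32 bits of the double are all
zero, so the truncated variant yields 1 distinct hash for 1000 keys while
the folded variant yields 1000. With a single hash value, every key sits in
the same HashMap bucket, which is the "return 1" situation above.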

Cheers,
Gopal





