> But why does disabling mapjoin give better performance when we don't cast to
> string (users are always lazy)?
>
> Is the join key comparison in the reduce stage quicker?

The HashMap<DoubleWritable, RowContainer> is slower than the full sort + sorted-merge join. It shouldn't be, but it hits the worst-case performance of the HashMap implementation because of a bug in DoubleWritable in Hadoop: many distinct keys collide on the same hash value.

The effect is somewhat the same as

    public int hashCode() { return 1; }

Read the comments on https://issues.apache.org/jira/browse/HADOOP-12217

Cheers,
Gopal
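
For illustration, here is a minimal Java sketch of the collision effect described above. It does not use Hadoop or Hive classes. The class name HashCollisionSketch and the method truncatedHash are hypothetical. truncatedHash assumes the buggy hash keeps only the low 32 bits of the raw double bits; that is one reading of the JIRA, not something stated in this thread. foldedHash uses the JDK's own Double.hashCode for comparison.

```java
import java.util.HashSet;
import java.util.Set;

public class HashCollisionSketch {

    // Hypothetical stand-in for the buggy hash (an assumption, see lead-in):
    // keep only the low 32 bits of the IEEE-754 bit pattern.
    public static int truncatedHash(double v) {
        return (int) Double.doubleToLongBits(v);
    }

    // JDK hash for doubles: folds the high 32 bits into the low 32 bits.
    public static int foldedHash(double v) {
        return Double.hashCode(v);
    }

    // Count how many distinct hash codes the whole numbers 1..n produce
    // when used as double keys.
    public static int distinctHashes(int n, boolean folded) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 1; i <= n; i++) {
            double key = i;
            seen.add(folded ? foldedHash(key) : truncatedHash(key));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("truncated hash, distinct codes: " + distinctHashes(n, false));
        System.out.println("folded hash, distinct codes:    " + distinctHashes(n, true));
    }
}
```

Whole-number doubles of this size have all-zero low 32 bits, so the truncated variant gives every key the same hash code, which is the "return 1" situation: every entry ends up in one bucket. The folded variant gives each key its own code.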