Data locality of map-side join

Sigurd Spieckermann Mon, 22 Oct 2012 13:29:49 -0700

Hi guys,

I've been trying to figure out whether a map-side join using the join package does anything clever regarding data locality with respect to at least one of the partitions to join. To be more specific, if I want to join two datasets and some partition of dataset A is larger than the corresponding partition of dataset B, does Hadoop account for this and try to ensure that the map task is executed on the datanode storing the bigger partition, thus reducing data transfer (if the other partition does not happen to be located on that same datanode)? I couldn't conclude either behavior from the source code, and I couldn't find any documentation about this detail.
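To make the trade-off in the question concrete, here is a minimal sketch of the size-aware placement heuristic being asked about. It is not Hadoop's join package or scheduler, and whether Hadoop does anything like this is exactly the open question. The class `JoinLocalitySketch`, the `Partition` record and the `remoteBytes`/`bestHost` methods are hypothetical names for illustration only. For each candidate host, the sketch counts the bytes of every partition that has no replica there and picks the host with the smallest total. That host is the one storing the bigger partition.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration (not a Hadoop API): size-aware host choice for one join task. */
public class JoinLocalitySketch {

    /** One input partition of the join: its size and the hosts holding a replica of it. */
    record Partition(long sizeBytes, Set<String> hosts) {}

    /** Bytes a task running on {@code host} must read remotely: every partition not stored there. */
    static long remoteBytes(String host, List<Partition> parts) {
        long total = 0;
        for (Partition p : parts) {
            if (!p.hosts().contains(host)) {
                total += p.sizeBytes();
            }
        }
        return total;
    }

    /** Returns the host with the least remote data; ties go to the alphabetically first host. */
    static String bestHost(List<Partition> parts) {
        Set<String> candidates = new TreeSet<>();   // sorted, so tie-breaking is deterministic
        for (Partition p : parts) {
            candidates.addAll(p.hosts());
        }
        String best = null;
        long bestCost = Long.MAX_VALUE;
        for (String h : candidates) {
            long cost = remoteBytes(h, parts);
            if (cost < bestCost) {
                best = h;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Partition i of dataset A (512 MiB) and partition i of dataset B (64 MiB) on different nodes.
        Partition a = new Partition(512L << 20, Set.of("node1", "node2"));
        Partition b = new Partition(64L << 20, Set.of("node3"));
        List<Partition> parts = List.of(a, b);
        String host = bestHost(parts);
        System.out.println(host + " remote bytes: " + remoteBytes(host, parts));
    }
}
```

Run as is (Java 16+ for the record), `main` prints `node1 remote bytes: 67108864`. Only the smaller B partition crosses the network, whereas running on `node3` would pull the 512 MiB A partition instead. In a real cluster, split locations are only scheduling hints, so even a size-aware choice like this could not guarantee the task lands on that node.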


Thanks for clarifying!
Sigurd
