Re: Data Locality Importance

2014-03-22 Thread Vinod Kumar Vavilapalli
Like you said, it depends on both the kind of network you have and the type of workload you run. Given your point about S3, I'd guess your input files/blocks are not large enough for moving code to the data to beat moving the data itself to the code. When that balance tilts a lot, especially when moving

Re: Data Locality Importance

2014-03-22 Thread Chen He
Hi Mike, data locality rests on an assumption: that storage access (disk, SSD, etc.) is faster than transferring data over the network. Vinod has already explained the benefits. But locality in the map stage does not always help. If a fat node stores a large file, it is possible that current MR