I was wondering how the scheduler assigns ShuffledRDD locations to the reduce tasks. Say you have 4 reduce tasks and a number of shuffle blocks spread across two machines. Is each reduce task responsible for a subset of individual keys, or for a subset of the shuffle blocks?
Umar