tgravescs commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
URL: https://github.com/apache/spark/pull/19788#issuecomment-456947803

So just to make sure I'm following: are you saying reducer tasks 5 to 10 happen to run on the same executor, so it's fetching all of those at once? Perhaps this is combined with your adaptive scheduling logic to automatically set the number of reducers. For example, the map originally thought it had 20,000 reducers and wrote its map output files accordingly, but the adaptive scheduling determined you really only need 2,000. In that case, each reducer actually reads the output for 10 of the reducers the map originally created?
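(For illustration only, not Spark's actual implementation: a minimal sketch of the coalescing scenario described above, where 20,000 map-output partitions are grouped into 2,000 contiguous ranges so each adaptive reducer issues one range fetch instead of 10 separate partition fetches. All names here are hypothetical.)

```python
# Hypothetical sketch: split [0, original_parts) into target_reducers
# contiguous ranges, so each reducer fetches one contiguous block of
# map-output partitions rather than many individual partitions.

def coalesce_partitions(original_parts: int, target_reducers: int):
    """Return a list of (start, end) ranges, one per coalesced reducer."""
    per_reducer = original_parts // target_reducers
    ranges = []
    for r in range(target_reducers):
        start = r * per_reducer
        # Last reducer absorbs any remainder so all partitions are covered.
        end = original_parts if r == target_reducers - 1 else start + per_reducer
        ranges.append((start, end))
    return ranges

ranges = coalesce_partitions(20_000, 2_000)
# With 20,000 original partitions and 2,000 reducers, reducer 0 would
# read the contiguous range of partitions [0, 10).
```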
