tgravescs commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
URL: https://github.com/apache/spark/pull/19788#issuecomment-456947803

So just to make sure I'm following: are you saying reducer tasks 5 to 10 happen to run on the same executor, so it's fetching all of those at once? Perhaps this is combined with your adaptive scheduling logic to automatically set the number of reducers. For example, the map originally thought it had 20,000 reducers and wrote its map output files accordingly, but the adaptive scheduling determined you really only need 2,000. In that case, each reducer actually reads the output for 10 of the reducers the map originally created?
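(For illustration only, not Spark's actual implementation: a minimal sketch of the coalescing scenario described above, where 20,000 map-output partitions are grouped into 2,000 contiguous ranges so each adaptive reducer issues one range fetch instead of 10 separate partition fetches. All names here are hypothetical.)

```python
# Hypothetical sketch: split [0, original_parts) into target_reducers
# contiguous ranges, so each reducer fetches one contiguous block of
# map-output partitions rather than many individual partitions.

def coalesce_partitions(original_parts: int, target_reducers: int):
    """Return a list of (start, end) ranges, one per coalesced reducer."""
    per_reducer = original_parts // target_reducers
    ranges = []
    for r in range(target_reducers):
        start = r * per_reducer
        # Last reducer absorbs any remainder so all partitions are covered.
        end = original_parts if r == target_reducers - 1 else start + per_reducer
        ranges.append((start, end))
    return ranges

ranges = coalesce_partitions(20_000, 2_000)
# With 20,000 original partitions and 2,000 reducers, reducer 0 would
# read the contiguous range of partitions [0, 10).
```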
