[GitHub] tgravescs commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs

GitBox Wed, 23 Jan 2019 07:00:32 -0800

tgravescs commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch 
of contiguous partition IDs
URL: https://github.com/apache/spark/pull/19788#issuecomment-456833428
 
 
   >> If remote server supports merge, it will merge blocks and the returned 
StreamHandle.numChunks < OpenBlocks.blockIds.length. The client will check and 
know merge happens, so it will work accordingly.
   
   So, going just by the description, this implementation has the server 
read the separate map output files and send them out as one stream when the 
reducer actually reads, correct?  That means you still incur disk seeks on the 
server side, but the client sees a single stream that contains the multiple 
map outputs, correct?
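
   For reference, a minimal sketch of the client-side check the quoted 
description implies: the client compares the chunk count in the returned 
handle with the number of block IDs it requested. The class name, method name, 
and block IDs below are hypothetical and only illustrate the comparison; this 
is not the code in the PR or Spark's actual classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names here are assumptions, not Spark APIs.
public class MergedFetchSketch {

    /**
     * Returns true when the server sent back fewer chunks than the number of
     * blocks requested, which the quoted description treats as the signal
     * that the server merged blocks into a single stream.
     */
    static boolean serverMerged(int numChunks, List<String> requestedBlockIds) {
        return numChunks < requestedBlockIds.size();
    }

    public static void main(String[] args) {
        // Illustrative block IDs for one map output and three contiguous partitions.
        List<String> blockIds =
            Arrays.asList("shuffle_0_0_3", "shuffle_0_0_4", "shuffle_0_0_5");

        System.out.println(serverMerged(1, blockIds)); // one chunk for three blocks: true
        System.out.println(serverMerged(3, blockIds)); // one chunk per block: false
    }
}
```

   Note that this check only tells the client how to interpret the stream; it 
says nothing about how many files or seeks the server needed to produce it.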
   
   What specific performance benefits did you see from this?  Is the gain 
only on the client side, or is there something on the server side that I 
might be missing?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

