cloud-fan commented on issue #24110: [SPARK-25341][Core] Support rolling back a 
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24110#issuecomment-473907515
 
 
   To help other people review this patch, can you add the following 
information in your PR description?
   1. How the current shuffle handles multiple write attempts. These can 
happen when we retry a map write task, or when we retry the entire map stage.
   2. What problem in the current shuffle we are trying to fix (because 
of non-deterministic operations, multiple shuffle write attempts may write 
different data).
   3. What the new proposal is and how it solves the problem.
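
To make point 2 concrete, here is a minimal sketch in plain Python, not Spark code. It assumes a round-robin partitioner and an input whose row order differs between attempts; the function name `round_robin` and the sample rows are hypothetical. Each attempt is complete on its own, but combining partition 0 from one attempt with partition 1 from another can duplicate some rows and drop others.

```python
def round_robin(rows, num_partitions):
    """Assign rows to partitions in arrival order (hypothetical stand-in
    for an order-sensitive shuffle write)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


# Two attempts of the same map task see the same rows in a different order.
attempt1 = round_robin(["a", "b", "c", "d"], 2)
attempt2 = round_robin(["b", "a", "d", "c"], 2)

# Each attempt on its own contains every row exactly once.
complete1 = sorted(attempt1[0] + attempt1[1])
complete2 = sorted(attempt2[0] + attempt2[1])

# A reducer that read partition 0 from attempt 1 and another that reads
# partition 1 from attempt 2 together see an inconsistent data set.
mixed = sorted(attempt1[0] + attempt2[1])
```

Under these assumptions `mixed` contains "a" and "c" twice and misses "b" and "d", which is the kind of inconsistency a stage rollback would need to prevent.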
   
   A side question: can we skip the temp file when writing shuffle files? It 
was introduced in https://github.com/apache/spark/pull/9610 and seems 
unnecessary when the shuffle file names include the attempt number.
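
The two write strategies in the side question can be contrasted with a small sketch. This is plain Python rather than Spark's actual writer, and the file name pattern `shuffle_<shuffleId>_<mapId>_<attempt>.data` and both function names are assumptions made for illustration only.

```python
import os
import tempfile


def write_via_temp(final_path, data):
    """Write to a temp file in the same directory, then rename it over
    final_path, so readers never observe a partially written file."""
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp_path, final_path)  # atomic when on the same filesystem
    return final_path


def write_with_attempt(directory, shuffle_id, map_id, attempt, data):
    """Write directly to a file whose name includes the attempt number,
    so different attempts never share a path (hypothetical naming)."""
    name = f"shuffle_{shuffle_id}_{map_id}_{attempt}.data"
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With per-attempt names, two attempts cannot overwrite each other, which is the basis of the question. The sketch does not show whether a reader could still observe a half-written file from a single attempt, so that point would still need to be addressed in the PR.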

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
