[GitHub] [spark] hiboyang commented on pull request #31715: [SPARK-34601][SHUFFLE] Add spark.shuffle.markFileLostOnExecutorLost to not delete shuffle file on executor lost event

GitBox Wed, 03 Mar 2021 16:59:11 -0800


hiboyang commented on pull request #31715:
URL: https://github.com/apache/spark/pull/31715#issuecomment-790199017



   > Users may need to set up this application config differently across 
different solutions, e.g. external shuffle service, built-in shuffle service, 
remote shuffle service, mixed solution, etc. This is somewhat low-level Spark 
behavior, and I doubt it is good to expose it to end users and let them 
decide the config. It sounds easy to set an improper value.
   > 
   > Can Spark decide to unregister shuffle output automatically, for example 
based on which shuffle manager is used for the shuffle output? Or, like 
@attilapiros's idea, by having a property somewhere close to MapStatus?
   
   If Spark decides to unregister shuffle output based on which shuffle manager 
is used, Spark needs knowledge of the different shuffle manager 
implementations. That is hard to implement, because a user can set any shuffle 
manager implementation via spark.shuffle.manager.
   
   As for "Users may need to set up this application config differently 
across different solutions": yes, that is the purpose. There are many shuffle 
solutions, as you listed. The current Spark design is pretty good: it lets users 
set spark.shuffle.manager to a custom class to choose a different solution. 
However, it assumes shuffle files are always lost when an executor is lost. This 
assumption conflicts with the customizable shuffle manager design. The new 
config spark.shuffle.markFileLostOnExecutorLost keeps that assumption by 
default, but gives users the option to choose different behavior when needed.
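
   To illustrate the intended behavior, here is a toy model in plain Python. 
It is only a sketch and not Spark's scheduler code: ToyMapOutputTracker, its 
methods, and the class name com.example.RemoteShuffleManager are hypothetical, 
and the "true" default for the new config is assumed from the description 
above.

```python
from dataclasses import dataclass, field

# Config key proposed in this PR. The "true" default (assumed here) keeps the
# existing assumption that shuffle files are lost together with the executor.
MARK_FILE_LOST_KEY = "spark.shuffle.markFileLostOnExecutorLost"


@dataclass
class ToyMapOutputTracker:
    """Hypothetical stand-in for the driver-side record of map outputs."""

    # executor id -> set of map output ids registered on that executor
    outputs_by_executor: dict = field(default_factory=dict)

    def register(self, executor_id, map_id):
        self.outputs_by_executor.setdefault(executor_id, set()).add(map_id)

    def on_executor_lost(self, executor_id, conf):
        """Return the map outputs that need recomputing after an executor is lost."""
        mark_lost = conf.get(MARK_FILE_LOST_KEY, "true").lower() == "true"
        if not mark_lost:
            # The user states that shuffle files outlive the executor
            # (e.g. a remote shuffle service), so keep the registrations.
            return set()
        # Default: treat the files as gone and unregister them.
        return self.outputs_by_executor.pop(executor_id, set())


if __name__ == "__main__":
    tracker = ToyMapOutputTracker()
    tracker.register("exec-1", 0)
    tracker.register("exec-1", 1)

    # Custom shuffle manager whose files survive executor loss (hypothetical class).
    remote_conf = {
        "spark.shuffle.manager": "com.example.RemoteShuffleManager",
        MARK_FILE_LOST_KEY: "false",
    }
    print(tracker.on_executor_lost("exec-1", remote_conf))  # set()

    # No override: default behavior, outputs are unregistered.
    print(tracker.on_executor_lost("exec-1", {}))  # {0, 1}
```

   The point of the sketch is that the decision depends only on a 
user-supplied flag, so Spark does not need to know anything about the class 
behind spark.shuffle.manager.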
   


