hiboyang commented on pull request #31715:
URL: https://github.com/apache/spark/pull/31715#issuecomment-790331585
> Then how about a property somewhere close to `MapStatus`? I guess this
config will become a sort of mysterious config. Users might also not have the
knowledge to set it. Spark should know where the shuffle output is kept, so
ideally Spark should know whether the shuffle output should be unregistered
or not. I just don't know if Spark currently provides the necessary pieces
for that.
>
> Under a mixed solution, this config is also hard to set properly. Should it
be set to true or false if the shuffle output could be kept either in fallback
storage or on an executor?
Yeah, I did some further digging following your suggestion. It looks like
Spark already checks and matches the executor id when it tries to remove map
output, as in the following code. So it could already work well with a
customized shuffle manager or a third-party remote shuffle service.
```scala
def removeOutputsOnExecutor(execId: String): Unit = withWriteLock {
  logDebug(s"Removing outputs for execId ${execId}")
  removeOutputsByFilter(x => x.executorId == execId)
}
```
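To illustrate why this check is enough, here is a minimal, self-contained sketch (simplified stand-ins, not Spark's actual `MapStatus`/`BlockManagerId` classes; the names `ShuffleLocation`, `remote-shuffle-service`, and the host strings are hypothetical). It shows that cleanup keyed on the recorded executor id only drops outputs written by the lost executor, while outputs registered under a remote shuffle service's sentinel id survive:

```scala
// Simplified stand-in for the location a map output was registered with.
case class ShuffleLocation(executorId: String, host: String)

// Simplified stand-in for a map status tracked by the driver.
case class MapStatus(mapId: Int, location: ShuffleLocation)

// Mirrors the removeOutputsOnExecutor logic above: remove only statuses
// whose recorded executor id matches the lost executor.
def removeOutputsOnExecutor(statuses: Seq[MapStatus], execId: String): Seq[MapStatus] =
  statuses.filterNot(_.location.executorId == execId)

// "exec-1" wrote its shuffle data locally; a remote shuffle service
// registered its outputs under a sentinel id that never matches a real
// executor, so executor loss leaves them untouched.
val statuses = Seq(
  MapStatus(0, ShuffleLocation("exec-1", "host-a")),
  MapStatus(1, ShuffleLocation("remote-shuffle-service", "rss-host")))

val remaining = removeOutputsOnExecutor(statuses, "exec-1")
// remaining holds only mapId 1, the remote-service-backed output.
```

So a customized shuffle manager that reports a non-executor location in its map statuses already opts out of this cleanup, without any extra config.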
Turns out I do not need to add this `markFileLostOnExecutorLost` config after
all :) Thanks again for your comments!
I will close this PR (after checking other comments).