hiboyang commented on pull request #31715:
URL: https://github.com/apache/spark/pull/31715#issuecomment-790331585
> Then how about a property somewhere close to `MapStatus`? I guess this
config will become a sort of mysterious config. Users might also not have the
knowledge to set it. Spark should know where the shuffle output is kept, so
ideally Spark should know whether the shuffle output should be unregistered
or not. I just don't know if Spark currently provides the necessary pieces
for that.
>
> Under a mixed solution, this config is also hard to set properly. Should it
be set to true or false if the shuffle output could be kept either in fallback
storage or on an executor?
Yeah, I did some further digging following your suggestion. It looks like
Spark already checks and matches the executor id when it tries to remove map
output, as in the following code. So it could already work well with a
customized shuffle manager or a third-party remote shuffle service.
```scala
def removeOutputsOnExecutor(execId: String): Unit = withWriteLock {
  logDebug(s"Removing outputs for execId ${execId}")
  removeOutputsByFilter(x => x.executorId == execId)
}
```
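To illustrate why this check is enough, here is a minimal, self-contained sketch (simplified stand-ins, not Spark's actual `MapStatus`/`BlockManagerId` classes; the names `ShuffleLocation`, `remote-shuffle-service`, and the host strings are hypothetical). It shows that cleanup keyed on the recorded executor id only drops outputs written by the lost executor, while outputs registered under a remote shuffle service's sentinel id survive:

```scala
// Simplified stand-in for the location a map output was registered with.
case class ShuffleLocation(executorId: String, host: String)

// Simplified stand-in for a map status tracked by the driver.
case class MapStatus(mapId: Int, location: ShuffleLocation)

// Mirrors the removeOutputsOnExecutor logic above: remove only statuses
// whose recorded executor id matches the lost executor.
def removeOutputsOnExecutor(statuses: Seq[MapStatus], execId: String): Seq[MapStatus] =
  statuses.filterNot(_.location.executorId == execId)

// "exec-1" wrote its shuffle data locally; a remote shuffle service
// registered its outputs under a sentinel id that never matches a real
// executor, so executor loss leaves them untouched.
val statuses = Seq(
  MapStatus(0, ShuffleLocation("exec-1", "host-a")),
  MapStatus(1, ShuffleLocation("remote-shuffle-service", "rss-host")))

val remaining = removeOutputsOnExecutor(statuses, "exec-1")
// remaining holds only mapId 1, the remote-service-backed output.
```

So a customized shuffle manager that reports a non-executor location in its map statuses already opts out of this cleanup, without any extra config.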
Turns out I do not need to add this `markFileLostOnExecutorLost` config after
all :) Thanks again for your comments!
I will close this PR (after checking other comments).