attilapiros commented on a change in pull request #24499: [SPARK-25888][Core]
Serve local disk persisted blocks by the external service after releasing
executor by dynamic allocation
URL: https://github.com/apache/spark/pull/24499#discussion_r280651847
##########
File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
##########
@@ -104,6 +104,15 @@ private[deploy] class Worker(
   private val CLEANUP_NON_SHUFFLE_FILES_ENABLED =
     conf.get(config.STORAGE_CLEANUP_FILES_AFTER_EXECUTOR_EXIT)
+  val EXTERNAL_SHUFFLE_SERVICE_ENABLED =
+    conf.get(config.SHUFFLE_SERVICE_ENABLED)
+
+  if (CLEANUP_NON_SHUFFLE_FILES_ENABLED && EXTERNAL_SHUFFLE_SERVICE_ENABLED) {
Review comment:
Good point. I am sure there are other kinds of files, judging by the
different BlockIds we have:
https://github.com/apache/spark/blob/5933ef0723539807356153c767696a1d3cd2b144/core/src/main/scala/org/apache/spark/storage/BlockId.scala#L105-L114
for example, temporary blocks. I am also not certain that all of these files
have proper cleanup (such as the one provided by `RemoteBlockDownloadFileManager`).
So, to be on the safe side, I will create a new cleanup function that deletes
only files that are neither shuffle files nor cached block files, and use that
method in this case.
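A minimal sketch of what such a "delete only non-shuffle, non-cached files" pass could look like, assuming files are classified purely by name. The object `NonShuffleNonCachedCleanup` and its methods are hypothetical, not the code from this PR; the patterns assume the names produced by `ShuffleDataBlockId` / `ShuffleIndexBlockId` (`shuffle_<shuffleId>_<mapId>_<reduceId>.data` / `.index`) and `RDDBlockId` (`rdd_<rddId>_<splitIndex>`), and may not cover every file kind in other Spark versions.

```scala
import java.io.File

// Hypothetical helper: keeps shuffle data/index files and cached RDD block
// files, and deletes every other regular file under a local directory.
// Illustrative only; the file-name patterns are assumptions based on BlockId names.
object NonShuffleNonCachedCleanup {
  // shuffle_<shuffleId>_<mapId>_<reduceId>.data or .index
  private val ShuffleFile = """shuffle_\d+_\d+_\d+\.(data|index)""".r
  // rdd_<rddId>_<splitIndex> (a disk-persisted cached block)
  private val RddFile = """rdd_\d+_\d+""".r

  /** True if the file is neither a shuffle file nor a cached RDD block. */
  def shouldDelete(fileName: String): Boolean = fileName match {
    case ShuffleFile(_) => false // needed by the external shuffle service
    case RddFile()      => false // may still be served after executor release
    case _              => true  // e.g. temp_local_*, temp_shuffle_*
  }

  /** Recursively deletes qualifying files under `dir`; returns the deleted names. */
  def cleanup(dir: File): Seq[String] = {
    // listFiles() returns null if `dir` is not a directory or cannot be read.
    val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap { f =>
      if (f.isDirectory) cleanup(f) // directories themselves are kept
      else if (shouldDelete(f.getName) && f.delete()) Seq(f.getName)
      else Seq.empty
    }
  }
}
```

Matching on an explicit allow-list of kept file kinds means any unexpected file (such as a temporary block whose own cleanup may have been missed) is removed, while the files the external service needs to serve survive.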