bozhang2820 opened a new pull request, #45930:
URL: https://github.com/apache/spark/pull/45930

   ### What changes were proposed in this pull request?
   This change adds a new trait, `ShuffleCleanupMode`, under `QueryExecution`, 
and two new configs, `spark.sql.shuffleDependency.skipMigration.enabled` and 
`spark.sql.shuffleDependency.fileCleanup.enabled`.
   
   For Spark Connect query executions, `ShuffleCleanupMode` is controlled by 
the two new configs, and shuffle dependency cleanup is performed accordingly. 
   
   When `spark.sql.shuffleDependency.fileCleanup.enabled` is `true`, shuffle 
dependency files will be cleaned up at the end of query executions. 
   
   When `spark.sql.shuffleDependency.skipMigration.enabled` is `true`, shuffle 
dependencies will be skipped during shuffle data migration when a node is 
decommissioned.
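   
   The relationship between the two configs and the resulting mode can be 
sketched as below. This is only an illustrative model, not the code in this PR: 
the mode names, the `resolve_cleanup_mode` helper, the `false` defaults, and 
the choice to let file cleanup take precedence when both configs are `true` 
are all assumptions.

```python
from enum import Enum


class ShuffleCleanupMode(Enum):
    # Hypothetical mode names; the PR description does not list them.
    DO_NOT_CLEANUP = "do_not_cleanup"
    SKIP_MIGRATION = "skip_migration"
    REMOVE_SHUFFLE_FILES = "remove_shuffle_files"


# Config keys as named in the PR description.
FILE_CLEANUP_KEY = "spark.sql.shuffleDependency.fileCleanup.enabled"
SKIP_MIGRATION_KEY = "spark.sql.shuffleDependency.skipMigration.enabled"


def resolve_cleanup_mode(conf):
    """Map a dict of config values to a cleanup mode (assumed semantics)."""

    def enabled(key):
        # Assumption: an unset config behaves as "false".
        return str(conf.get(key, "false")).strip().lower() == "true"

    # Assumption: file cleanup wins when both configs are enabled.
    if enabled(FILE_CLEANUP_KEY):
        return ShuffleCleanupMode.REMOVE_SHUFFLE_FILES
    if enabled(SKIP_MIGRATION_KEY):
        return ShuffleCleanupMode.SKIP_MIGRATION
    return ShuffleCleanupMode.DO_NOT_CLEANUP
```

   For example, `resolve_cleanup_mode({SKIP_MIGRATION_KEY: "true"})` yields the 
skip-migration mode in this sketch, while an empty dict yields the no-cleanup 
mode.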
   
   ### Why are the changes needed?
   This change is intended to: 1. speed up shuffle data migration during node 
decommissioning, and 2. when file cleanup is enabled, potentially release disk 
space occupied by unused shuffle files.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This change adds two new configs, 
`spark.sql.shuffleDependency.skipMigration.enabled` and 
`spark.sql.shuffleDependency.fileCleanup.enabled`, to control the cleanup 
behavior.
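   
   One plausible way to pass these configs at launch is through `--conf` 
arguments to `spark-submit`. The helper below is a hypothetical sketch that 
only builds the argument list; the function name and its `false` defaults are 
assumptions, and whether the configs are recognized depends on running a Spark 
build that includes this change.

```python
from typing import List


def shuffle_cleanup_conf_args(file_cleanup: bool = False,
                              skip_migration: bool = False) -> List[str]:
    """Build `--conf key=value` arguments for the two configs (hypothetical helper)."""
    settings = {
        "spark.sql.shuffleDependency.fileCleanup.enabled": file_cleanup,
        "spark.sql.shuffleDependency.skipMigration.enabled": skip_migration,
    }
    args: List[str] = []
    for key, value in settings.items():
        # Render booleans as lowercase "true" / "false" strings.
        args += ["--conf", f"{key}={str(value).lower()}"]
    return args
```

   The returned list can be spliced into a command line, for instance 
`["spark-submit", *shuffle_cleanup_conf_args(file_cleanup=True), "app.py"]`, 
where `app.py` stands in for an application of your own.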
   
   ### How was this patch tested?
   Existing tests.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No

