Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/22952

    > Provide additional option: delete (two options - 'rename' / 'delete' - are mutually exclusive)
    >
    > Actually the actions end users are expected to take are 1. moving to archive directory (with compression or not) 2. delete periodically. If moving/renaming require non-trivial cost, end users may want to just delete files directly without backing up.

    +1 for this approach. The file listing cost is huge when the directory has a lot of files, and I think one of the goals of this feature is to reduce that cost. Hence either deleting the files or moving them to a different directory should be fine.

    Also, could you try to make one simple option for `rename/delete`, such as `cleanSource` -> (`none`, `rename` or `delete`)? When the user picks `rename`, they should be able to set the archive directory using another option.

    In addition, it would be great if we could document that whenever this option is used, the same directory should not be used by multiple queries.
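A minimal sketch of how the single-option design suggested above could be validated. It is written in Python for brevity and is not Spark code. The option name `cleanSource` and its values come from the comment; the archive-directory option name `archiveDir`, the `CleanSourceConfig` type, and the `parse_clean_source` helper are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Values proposed in the comment for the single `cleanSource` option.
VALID_MODES = ("none", "rename", "delete")


@dataclass(frozen=True)
class CleanSourceConfig:
    mode: str
    archive_dir: Optional[str] = None


def parse_clean_source(options: Mapping[str, str]) -> CleanSourceConfig:
    """Validate `cleanSource` plus a hypothetical `archiveDir` option."""
    mode = options.get("cleanSource", "none").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"cleanSource must be one of {VALID_MODES}, got {mode!r}")

    # `archiveDir` is an assumed name; the comment only says "another option".
    archive_dir = options.get("archiveDir") or None
    if mode == "rename" and archive_dir is None:
        raise ValueError("cleanSource=rename requires archiveDir to be set")
    if mode != "rename" and archive_dir is not None:
        raise ValueError("archiveDir only applies when cleanSource=rename")

    return CleanSourceConfig(mode, archive_dir)
```

Because `rename` and `delete` are values of one option, they cannot be selected together, so the mutual exclusion needs no separate check. Only the dependency between `rename` and the archive directory has to be validated.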