GitHub user zsxwing commented on the issue:
https://github.com/apache/spark/pull/22952
> Provide additional option: delete (two options - 'rename' / 'delete' -
> are mutually exclusive)
>
> Actually the actions end users are expected to take are 1. moving to
> archive directory (with compression or not) 2. delete periodically. If
> moving/renaming require non-trivial cost, end users may want to just delete
> files directly without backing up.
+1 for this approach. The file listing cost is huge when the directory has
a lot of files, and I think one of the goals of this feature is to reduce that
cost. Hence either deleting the files or moving them to a different directory
should be fine. Also, could you try to make this one simple option for
`rename/delete`, such as `cleanSource` -> (`none`, `rename` or `delete`)? When
the user picks `rename`, they should be able to set the archive directory
using another option.
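
To make the suggested shape concrete, here is a minimal sketch of how a single `cleanSource` option with a companion archive-directory option might be validated. It is plain Python rather than Spark code, and it is not what the PR implements: the `archiveDir` option name, the `CleanSourceConfig` class and the `parse_clean_source` helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The three values proposed in the comment above.
VALID_MODES = ("none", "rename", "delete")


@dataclass(frozen=True)
class CleanSourceConfig:
    """Hypothetical holder for the parsed clean-up settings."""
    mode: str
    archive_dir: Optional[str] = None


def parse_clean_source(options: Mapping[str, str]) -> CleanSourceConfig:
    """Validate a single `cleanSource` option plus an archive directory.

    `archiveDir` is an assumed option name; the comment only says the
    directory should be settable "using another option".
    """
    mode = options.get("cleanSource", "none").lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"cleanSource must be one of {VALID_MODES}, got {mode!r}"
        )

    archive_dir = options.get("archiveDir")
    if mode == "rename" and not archive_dir:
        # Renaming needs a destination, so fail fast when it is missing.
        raise ValueError("cleanSource=rename requires archiveDir to be set")

    # The archive directory is only meaningful in rename mode.
    return CleanSourceConfig(mode, archive_dir if mode == "rename" else None)
```

Because the mode is a single value, `rename` and `delete` cannot both be selected, which gives the mutual exclusivity asked for in the quoted proposal without any extra cross-option check.
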
In addition, it would be great if we could document that whenever this
option is used, the same directory should not be used by multiple queries.