GitHub user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/22952
  
    > Provide additional option: delete (two options - 'rename' / 'delete' - are mutually exclusive)
    > 
    > Actually the actions end users are expected to take are 1. moving to archive directory (with compression or not) 2. delete periodically. If moving/renaming requires non-trivial cost, end users may want to just delete files directly without backing up.
    
    +1 for this approach. The file listing cost is huge when the directory has a lot of files, and I think one of the goals of this feature is to reduce that cost. Hence either deleting the files or moving them to a different directory should be fine. Also, could you try to fold `rename/delete` into one simple option, such as `cleanSource` -> (`none`, `rename` or `delete`)? When the user picks `rename`, they should be able to set the archive directory using another option.
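    
    A minimal sketch of how such a single option could be validated, written in plain Python rather than against Spark's API. Only the `cleanSource` name and its three values come from the suggestion above; the `archiveDir` option name, the `CleanSourceConfig` class and the `parse_clean_source` helper are hypothetical and used for illustration only.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Values proposed in the comment above.
VALID_MODES = ("none", "rename", "delete")


@dataclass(frozen=True)
class CleanSourceConfig:
    """Hypothetical container for the parsed cleanup settings."""
    mode: str
    archive_dir: Optional[str] = None


def parse_clean_source(options: Mapping[str, str]) -> CleanSourceConfig:
    """Parse and validate the cleanup options of a file source.

    `archiveDir` is an assumed name for the "another option" that
    carries the archive directory; it is not taken from Spark.
    """
    mode = options.get("cleanSource", "none").lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"cleanSource must be one of {VALID_MODES}, got {mode!r}"
        )

    archive_dir = options.get("archiveDir")
    if mode == "rename" and not archive_dir:
        # 'rename' is meaningless without a destination directory.
        raise ValueError("archiveDir must be set when cleanSource is 'rename'")

    # The archive directory is only relevant for 'rename'.
    return CleanSourceConfig(mode, archive_dir if mode == "rename" else None)
```

    Because a single option carries the mode, `rename` and `delete` are mutually exclusive by construction, and the archive directory is only required when it is actually used.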
    
    In addition, it would be great if we could document that, whenever this option is used, the same source directory should not be used by multiple queries.

