GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/16554
[SPARK-19183] [SQL] Add deleteWithJob hook to internal commit protocol API

## What changes were proposed in this pull request?

Currently in SQL we implement overwrites by calling fs.delete() directly on the original data. This is not ideal, since the original files end up deleted even if the job aborts. We should extend the commit protocol to allow file overwrites to be managed as well.

## How was this patch tested?

Existing tests. I also fixed a bunch of tests that depended on the commit protocol implementation being set to the legacy mapreduce one.

cc @rxin @cloud-fan

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark add-delete-protocol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16554

----
commit 669d36bc71bcdc3e00bad9f416fcf3d9b9103136
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T19:52:54Z

    Wed Jan 11 15:05:16 PST 2017

commit d7168e6a98537477ba4c8053de088df17415ab5f
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T22:13:58Z

    Pull in changes to add delete api

commit c74aa88e815887947fc1406efbbe868902435228
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T23:58:02Z

    fix tests that depend on the protocol config

commit 4e2fd96f5282d2f44c543d58ddddb8aa6c728980
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-12T00:04:59Z

    Wed Jan 11 16:04:59 PST 2017

----
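To illustrate the motivation in the PR summary above, here is a minimal sketch of how a commit protocol could manage an overwrite's deletes so that the original files survive an aborted job. It is not Spark's implementation or API: the class `StagedDeleteCommitProtocol`, its methods `delete_with_job`, `commit_job`, and `abort_job`, and the `demo` helper are hypothetical names chosen for this example, and it uses the local filesystem in place of a Hadoop `FileSystem`. Deferring the delete until commit is only one possible strategy, assumed here for illustration.

```python
import os
import shutil
import tempfile


class StagedDeleteCommitProtocol:
    """Hypothetical commit protocol that defers deletes until job commit."""

    def __init__(self):
        # Paths that should be removed only if the job commits.
        self._pending_deletes = []

    def delete_with_job(self, path):
        # Record the path instead of deleting it right away.
        self._pending_deletes.append(path)
        return True

    def commit_job(self):
        # The job succeeded, so it is now safe to remove the old data.
        for path in self._pending_deletes:
            if os.path.isdir(path):
                shutil.rmtree(path)
            elif os.path.exists(path):
                os.remove(path)
        self._pending_deletes.clear()

    def abort_job(self):
        # The job failed: forget the staged deletes and keep the old data.
        self._pending_deletes.clear()


def demo():
    """Return (file survived an abort, file removed on commit)."""
    root = tempfile.mkdtemp()
    old_file = os.path.join(root, "part-00000")
    with open(old_file, "w") as f:
        f.write("original data")

    # An aborted overwrite leaves the original file in place.
    aborted = StagedDeleteCommitProtocol()
    aborted.delete_with_job(old_file)
    aborted.abort_job()
    survived_abort = os.path.exists(old_file)

    # A committed overwrite removes the original file.
    committed = StagedDeleteCommitProtocol()
    committed.delete_with_job(old_file)
    committed.commit_job()
    removed_on_commit = not os.path.exists(old_file)

    shutil.rmtree(root)
    return survived_abort, removed_on_commit
```

Calling `demo()` runs one aborted and one committed overwrite against a temporary directory. Routing the delete through the protocol, rather than calling the filesystem directly, is what lets an implementation choose when (or whether) the delete takes effect.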