GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/16554

    [SPARK-19183] [SQL] Add deleteWithJob hook to internal commit protocol API

    ## What changes were proposed in this pull request?
    
    Currently in SQL we implement overwrites by calling fs.delete() directly on
the original data. This is not ideal since the original files end up deleted
even if the job aborts. We should extend the commit protocol to allow file
overwrites to be managed as well.
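
The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this patch: `SketchCommitProtocol`, `DeferredDeleteProtocol`, and `DeleteWithJobDemo` are hypothetical names, and `java.nio.file` stands in for the Hadoop FileSystem API. It only shows how routing deletes through a hook lets a protocol implementation postpone them until the job commits.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a commit protocol (not Spark's API).
abstract class SketchCommitProtocol {
  def setupJob(): Unit
  def commitJob(): Unit
  def abortJob(): Unit

  // Hook for deletes requested as part of a job. The default deletes
  // immediately, which mirrors calling fs.delete() directly.
  def deleteWithJob(path: Path): Boolean = Files.deleteIfExists(path)
}

// Assumed implementation: remember the paths and delete them only on commit.
class DeferredDeleteProtocol extends SketchCommitProtocol {
  private val pendingDeletes = ArrayBuffer.empty[Path]

  override def setupJob(): Unit = pendingDeletes.clear()

  override def deleteWithJob(path: Path): Boolean = {
    pendingDeletes += path // nothing is removed yet
    true
  }

  override def commitJob(): Unit = {
    pendingDeletes.foreach(p => Files.deleteIfExists(p))
    pendingDeletes.clear()
  }

  // On abort the recorded paths are dropped, so the original files survive.
  override def abortJob(): Unit = pendingDeletes.clear()
}

object DeleteWithJobDemo {
  // Returns (file survived an aborted job, file gone after a committed job).
  def run(): (Boolean, Boolean) = {
    val protocol = new DeferredDeleteProtocol

    val keptFile = Files.createTempFile("old-data", ".tmp")
    protocol.setupJob()
    protocol.deleteWithJob(keptFile)
    protocol.abortJob()
    val survivedAbort = Files.exists(keptFile)

    val removedFile = Files.createTempFile("old-data", ".tmp")
    protocol.setupJob()
    protocol.deleteWithJob(removedFile)
    protocol.commitJob()
    val goneAfterCommit = !Files.exists(removedFile)

    Files.deleteIfExists(keptFile) // clean up the temp file left by the abort case
    (survivedAbort, goneAfterCommit)
  }
}
```

Calling `DeleteWithJobDemo.run()` exercises both paths: an aborted job leaves the original file in place, while a committed job removes it. A protocol that keeps the default `deleteWithJob` retains the previous delete-immediately behavior.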
    
    ## How was this patch tested?
    
    Existing tests. I also fixed several tests that depended on the
commit protocol implementation being set to the legacy mapreduce one.
    
    cc @rxin @cloud-fan 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark add-delete-protocol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16554
    
----
commit 669d36bc71bcdc3e00bad9f416fcf3d9b9103136
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T19:52:54Z

    Wed Jan 11 15:05:16 PST 2017

commit d7168e6a98537477ba4c8053de088df17415ab5f
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T22:13:58Z

    Pull in changes to add delete api

commit c74aa88e815887947fc1406efbbe868902435228
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-11T23:58:02Z

    fix tests that depend on the protocol config

commit 4e2fd96f5282d2f44c543d58ddddb8aa6c728980
Author: Eric Liang <e...@databricks.com>
Date:   2017-01-12T00:04:59Z

    Wed Jan 11 16:04:59 PST 2017

----


