[GitHub] spark pull request #21615: [SPARK-24552][core][sql] Use unique id instead of...

vanzin Fri, 22 Jun 2018 13:07:34 -0700

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/21615


    [SPARK-24552][core][sql] Use unique id instead of attempt number for writes [branch-2.3].

    This passes a unique attempt id instead of attempt number to v2
    data sources and hadoop APIs, because attempt number is reused
    when stages are retried. When attempt numbers are reused, sources
    that track data by partition id and attempt number may incorrectly
    clean up data because the same attempt number can be both committed
    and aborted.
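
The failure mode described above can be sketched with a toy model. This is an illustration only, not code from this pull request: `TaskAttempt`, `TrackingSink`, `run_scenario`, and the `unique_id` field are all hypothetical names, and this message does not say how the patch derives its unique id. The sketch assumes a sink that stores written rows under a key, marks keys as committed, and deletes a key's rows on abort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAttempt:
    """Hypothetical description of one attempt to write one partition."""
    stage_attempt: int
    partition_id: int
    attempt_number: int  # restarts from 0 when the stage is retried
    unique_id: int       # assumed to be distinct for every attempt


class TrackingSink:
    """Toy sink that tracks written rows by a caller-supplied key."""

    def __init__(self):
        self.data = {}          # key -> rows written under that key
        self.committed = set()  # keys whose output was committed

    def write(self, key, rows):
        self.data[key] = list(rows)

    def commit(self, key):
        self.committed.add(key)

    def abort(self, key):
        # Clean up whatever is stored under the aborted key.
        self.data.pop(key, None)

    def visible_rows(self):
        out = []
        for key in sorted(self.committed):
            out.extend(self.data.get(key, []))
        return out


def by_attempt_number(t):
    return (t.partition_id, t.attempt_number)


def by_unique_id(t):
    return (t.partition_id, t.unique_id)


def run_scenario(make_key):
    # The same partition is written by two stage attempts; the retried
    # stage starts its attempt numbers from 0 again.
    first = TaskAttempt(stage_attempt=0, partition_id=0, attempt_number=0, unique_id=10)
    retry = TaskAttempt(stage_attempt=1, partition_id=0, attempt_number=0, unique_id=11)

    sink = TrackingSink()
    sink.write(make_key(first), ["row-0"])
    sink.write(make_key(retry), ["row-0"])
    sink.commit(make_key(first))  # one attempt wins the commit
    sink.abort(make_key(retry))   # the other attempt is aborted
    return sink.visible_rows()


if __name__ == "__main__":
    print("keyed by attempt number:", run_scenario(by_attempt_number))
    print("keyed by unique id:     ", run_scenario(by_unique_id))
```

In this model, keying by partition id and attempt number makes the abort delete the rows that were just committed, because both attempts share one key. Keying by a per-attempt unique id keeps the committed rows intact.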


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-24552-2.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21615
    
----
commit a80b57b54e36677c188a333d23b86de349301001
Author: Marcelo Vanzin <vanzin@...>
Date:   2018-06-22T19:58:16Z

    [SPARK-24552][core][sql] Use unique id instead of attempt number for writes.
    
    This passes a unique attempt id instead of attempt number to v2
    data sources and hadoop APIs, because attempt number is reused
    when stages are retried. When attempt numbers are reused, sources
    that track data by partition id and attempt number may incorrectly
    clean up data because the same attempt number can be both committed
    and aborted.

----


