GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/21558
[SPARK-24552][SQL] Use task ID instead of attempt number for v2 writes.
## What changes were proposed in this pull request?
This passes the unique task attempt ID, instead of the attempt number, to v2 data
sources because attempt numbers are reused when stages are retried. When attempt
numbers are reused, sources that track data by partition ID and attempt number
may incorrectly clean up data because **the same attempt number can be both
committed and aborted**.
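
To illustrate the failure mode, here is a minimal, self-contained sketch in Python. It is not Spark code and not the change in this PR: `ToySink`, `run`, and the example IDs (`10`, `25`) are hypothetical, and the abort behavior is an assumption about how a source that tracks data by key might clean up. It shows a task for partition 0 that commits in the first stage attempt, followed by a task for the same partition in a retried stage that gets the same attempt number and is aborted.

```python
class ToySink:
    """Hypothetical sink that tracks written data by a caller-chosen key."""

    def __init__(self):
        self.staged = {}     # key -> rows written but not yet committed
        self.committed = {}  # key -> rows that have been committed

    def write(self, key, rows):
        self.staged[key] = list(rows)

    def commit(self, key):
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        # Assumed cleanup: drop everything tracked under this key.
        self.staged.pop(key, None)
        self.committed.pop(key, None)


def run(key_for):
    """Replay one commit and one abort, keying data with key_for(task)."""
    sink = ToySink()
    # Each task is (partition_id, attempt_number, task_attempt_id).
    first = (0, 0, 10)  # task in the first stage attempt
    retry = (0, 0, 25)  # task in the retried stage: attempt number is 0 again

    sink.write(key_for(first), ["row-a", "row-b"])
    sink.commit(key_for(first))

    sink.write(key_for(retry), ["row-a", "row-b"])
    sink.abort(key_for(retry))
    return sink.committed


# Keyed by (partition id, attempt number): the abort hits the committed data.
by_attempt_number = run(lambda task: (task[0], task[1]))

# Keyed by (partition id, task attempt id): the abort only touches its own data.
by_task_attempt_id = run(lambda task: (task[0], task[2]))

print(by_attempt_number)
print(by_task_attempt_id)
```

With the attempt number as the key, the aborted retry shares a key with the committed task, so the cleanup removes committed rows. With the task attempt ID as the key, the two tasks never collide.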
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rdblue/spark SPARK-24552-v2-source-work-around
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21558.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21558
----
commit e9e776a097f5dca1dccdd6e50b3790e6a91873d8
Author: Ryan Blue <blue@...>
Date: 2018-06-13T19:50:00Z
SPARK-24552: Use task ID instead of attempt number for v2 writes.
----