GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/21606
[SPARK-24552][core][SQL] Use task ID instead of attempt number for writes.
This passes the unique task attempt ID, instead of the attempt number, to v2
data sources, because attempt numbers are reused when stages are retried. When
attempt numbers are reused, sources that track data by partition ID and attempt
number may incorrectly clean up data, because the same attempt number can be
both committed and aborted.
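The collision described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Spark code: `ToySink`, `run`, the row values, and the task IDs 7 and 42 are all hypothetical, and the sink's cleanup-on-abort behavior is an assumed model of a source that tracks data by partition ID plus one extra key.

```python
class ToySink:
    """Hypothetical sink that tracks written data by (partition_id, writer_key)."""

    def __init__(self):
        self.data = {}          # (partition_id, writer_key) -> list of rows
        self.committed = set()  # keys whose data has been committed

    def write(self, partition_id, writer_key, rows):
        self.data[(partition_id, writer_key)] = list(rows)

    def commit(self, partition_id, writer_key):
        self.committed.add((partition_id, writer_key))

    def abort(self, partition_id, writer_key):
        # Assumed cleanup policy: drop everything recorded under this key.
        self.data.pop((partition_id, writer_key), None)
        self.committed.discard((partition_id, writer_key))

    def committed_rows(self, partition_id):
        rows = []
        for pid, key in sorted(self.committed):
            if pid == partition_id:
                rows.extend(self.data[(pid, key)])
        return rows


def run(use_task_id):
    """Replay one commit and one abort for partition 0 across a stage retry."""
    sink = ToySink()

    # First stage attempt: attempt number 0, hypothetical unique task ID 7.
    first_key = 7 if use_task_id else 0
    sink.write(0, first_key, ["a", "b"])
    sink.commit(0, first_key)

    # Retried stage: attempt number starts again at 0, but the task ID (42) is new.
    retry_key = 42 if use_task_id else 0
    sink.write(0, retry_key, ["a"])
    sink.abort(0, retry_key)

    return sink.committed_rows(0)


if __name__ == "__main__":
    print("keyed by attempt number:", run(use_task_id=False))
    print("keyed by task ID:       ", run(use_task_id=True))
```

When the key is the attempt number, the abort from the retried stage lands on the same key as the earlier commit and the committed rows are lost. When the key is a unique task ID, the abort only touches the retried task's own data.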
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-24552.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21606.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21606
----
commit 6c60d1462c34f01610ada50c989832775b6fd117
Author: Ryan Blue <blue@...>
Date: 2018-06-13T19:50:00Z
SPARK-24552: Use task ID instead of attempt number for v2 writes.
commit 2e6552460eed3013e649b06b16a1d14b1e542e2d
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-21T17:21:00Z
Rename attemptId -> taskId for clarity.
commit 3561723341c3062ba7d8682ea272c549b4bdc245
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-21T17:28:12Z
Use task ID instead of attempt for the Hadoop API too.
commit d5a079d439740f3067722d4e8c9e8e94f292017c
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-21T18:37:54Z
Merge branch 'master' into SPARK-24552.2
commit fdcd39c852e9a2d70da95c37da04190910e7b2f0
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-21T18:51:48Z
Log message update.
commit 7233a5fd7b154e2a1400c5fac11d0356a22f5f98
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-21T18:57:02Z
Javadoc updates.
----