GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/21616
[SPARK-24552][core] Use unique id instead of attempt number for writes
[branch-2.2].
This passes a unique attempt id to the Hadoop APIs, because the attempt
number is reused when a stage is retried. When attempt numbers are
reused, sources that track data by partition id and attempt number
may incorrectly clean up data, because the same (partition id, attempt
number) pair can be both committed and aborted.
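
The collision described above can be illustrated with a small toy model. This is only a sketch, not the code in this pull request: the AttemptTracker class, the run_scenario helper, and the id scheme that packs the stage attempt number into the upper bits are all hypothetical names and choices made for illustration, and no Spark or Hadoop API is called.

```python
class AttemptTracker:
    """Toy sink that tracks written data by (partition id, attempt id)."""

    def __init__(self):
        self.data = {}          # (partition, attempt_id) -> payload
        self.committed = set()  # keys that were committed

    def write(self, partition, attempt_id, payload):
        self.data[(partition, attempt_id)] = payload

    def commit(self, partition, attempt_id):
        self.committed.add((partition, attempt_id))

    def abort(self, partition, attempt_id):
        # Cleanup: drop whatever is stored under this key.
        self.data.pop((partition, attempt_id), None)

    def committed_output(self):
        # Payload still present for each committed key (None if it was lost).
        return {key: self.data.get(key) for key in self.committed}


def attempt_number_only(stage_attempt, attempt_number):
    # The per-stage attempt number restarts when the stage is retried.
    return attempt_number


def unique_attempt_id(stage_attempt, attempt_number):
    # Hypothetical scheme: combine both numbers so that ids differ
    # across stage retries.
    return (stage_attempt << 16) | attempt_number


def run_scenario(make_id):
    tracker = AttemptTracker()
    partition = 3
    old = make_id(stage_attempt=0, attempt_number=0)  # task from the failed stage attempt
    new = make_id(stage_attempt=1, attempt_number=0)  # same partition after the stage retry
    tracker.write(partition, old, "stale")
    tracker.write(partition, new, "good")
    tracker.commit(partition, new)
    tracker.abort(partition, old)  # late abort of the old task
    return tracker.committed_output()


reused = run_scenario(attempt_number_only)
unique = run_scenario(unique_attempt_id)
print(reused)
print(unique)
```

With the attempt number alone, the late abort removes the data that the retried task had already committed; with an id that differs across stage attempts, the abort only touches the stale entry.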
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-24552-2.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21616.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21616
----
commit 88679a0631bb3ddd6707c2f2b81f8886bf837fd8
Author: Marcelo Vanzin <vanzin@...>
Date: 2018-06-22T19:58:16Z
[SPARK-24552][core] Use unique id instead of attempt number for writes
[branch-2.2].
This passes a unique attempt id to the Hadoop APIs, because attempt
number is reused when stages are retried. When attempt numbers are
reused, sources that track data by partition id and attempt number
may incorrectly clean up data because the same attempt number can
be both committed and aborted.
----