GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/15707
[SPARK-18024][SQL] Introduce an internal commit protocol API - rebased
## What changes were proposed in this pull request?
This patch introduces an internal commit protocol API that is used by the
batch data source to do write commits. It currently has only one implementation
that uses Hadoop MapReduce's OutputCommitter API. In the future, this commit
API can be used to unify streaming and batch commits.
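
To make the idea concrete, here is a minimal, language-neutral sketch of what a commit protocol of this kind might look like. It is not the API from this patch: the class names, method names, and the staging-directory strategy are all assumptions chosen for illustration, and the real implementation delegates to Hadoop MapReduce's OutputCommitter rather than moving files itself. The sketch only shows the general shape: tasks write to a private location, report what they wrote, and the job publishes the output in one final step.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod


class CommitProtocol(ABC):
    """Hypothetical interface; method names are illustrative, not from the patch."""

    @abstractmethod
    def setup_job(self): ...

    @abstractmethod
    def new_task_file(self, task_id, name): ...

    @abstractmethod
    def commit_task(self, task_id): ...

    @abstractmethod
    def abort_task(self, task_id): ...

    @abstractmethod
    def commit_job(self, task_messages): ...

    @abstractmethod
    def abort_job(self): ...


class StagingDirCommitProtocol(CommitProtocol):
    """Assumed strategy: write under a staging directory, move files on job commit."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.staging_dir = os.path.join(output_dir, "_staging")

    def _task_dir(self, task_id):
        return os.path.join(self.staging_dir, f"task_{task_id}")

    def setup_job(self):
        os.makedirs(self.staging_dir, exist_ok=True)

    def new_task_file(self, task_id, name):
        # Each task gets a private directory, so a failed task cannot
        # leave partial files in the final output location.
        os.makedirs(self._task_dir(task_id), exist_ok=True)
        return os.path.join(self._task_dir(task_id), name)

    def commit_task(self, task_id):
        # The "commit message" sent back to the driver: the files this task wrote.
        task_dir = self._task_dir(task_id)
        return [os.path.join(task_dir, n) for n in sorted(os.listdir(task_dir))]

    def abort_task(self, task_id):
        shutil.rmtree(self._task_dir(task_id), ignore_errors=True)

    def commit_job(self, task_messages):
        # Publish only the files reported by committed tasks, then clean up.
        for paths in task_messages:
            for path in paths:
                os.replace(path, os.path.join(self.output_dir, os.path.basename(path)))
        shutil.rmtree(self.staging_dir, ignore_errors=True)

    def abort_job(self):
        shutil.rmtree(self.staging_dir, ignore_errors=True)


def run_demo():
    """Run two successful tasks and one aborted task; return the published files."""
    output_dir = tempfile.mkdtemp()
    protocol = StagingDirCommitProtocol(output_dir)
    protocol.setup_job()

    messages = []
    for task_id in (0, 1):
        path = protocol.new_task_file(task_id, f"part-{task_id}.txt")
        with open(path, "w") as f:
            f.write(f"rows from task {task_id}\n")
        messages.append(protocol.commit_task(task_id))

    # Task 2 fails midway; its partial output must never be published.
    failed_path = protocol.new_task_file(2, "part-2.txt")
    with open(failed_path, "w") as f:
        f.write("partial")
    protocol.abort_task(2)

    protocol.commit_job(messages)
    published = sorted(os.listdir(output_dir))
    shutil.rmtree(output_dir)
    return published


if __name__ == "__main__":
    print(run_demo())
```

Keeping the write path behind an interface like this is what would let a streaming sink supply a different implementation later without changing the code that produces the files.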
## How was this patch tested?
Should be covered by existing write tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rxin/spark SPARK-18024-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15707.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15707
----
commit 8c4ae5eb7441fd5bc0b06276d5d02a2ebc6de4a0
Author: Eric Liang <[email protected]>
Date: 2016-10-27T21:45:52Z
Thu Oct 27 14:45:52 PDT 2016
commit 2484809e1735a7c3fc875f09c68c12d2cd99dd62
Author: Eric Liang <[email protected]>
Date: 2016-10-28T00:53:13Z
Thu Oct 27 17:53:13 PDT 2016
commit 4d967251ce01794f7cdab9f84b70fa5393d1d1f2
Author: Eric Liang <[email protected]>
Date: 2016-10-28T00:53:30Z
Thu Oct 27 17:53:29 PDT 2016
commit 72c4294bb401ff3795363d3c0bb436bb56844630
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T17:56:49Z
WIP - commit API
commit 2a613516dd469bca5ed4d7b0f17f678e9e70e267
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T17:57:18Z
Add commit protocol itself
commit 6af14b56590a0882800f62a2a2b939ee3715edbb
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T20:46:35Z
Move output committer instantiation into MapReduceFileCommitterProtocol.
commit 6166093d511e833587d32e398338e2f47ccbcc8a
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T20:50:13Z
Specify that implementations must be serializable.
commit 040bbba0bdbd647f963b7a61e18b69fd62565201
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T22:16:05Z
Specify path
commit 51d0919577c71155adb7d4737e9441cede8fe97d
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T22:36:46Z
Add documentation.
commit 2d7d373fe48d18037653c10424c8b1c978160958
Author: Reynold Xin <[email protected]>
Date: 2016-10-31T22:43:54Z
Make MapReduceFileCommitterProtocol serializable.
commit cd23d2f7bdf7a3ef9b93e77a3ae540d553398267
Author: Reynold Xin <[email protected]>
Date: 2016-11-01T00:34:31Z
Make protocol configurable.
commit 0647959cbbbaaf5fb5cfe31515c2598f99ee180f
Author: Reynold Xin <[email protected]>
Date: 2016-11-01T00:58:23Z
Merge pull request #15633 from ericl/spark-18087
[SPARK-18087] [SQL] Optimize insert to not require REPAIR TABLE
----