GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/20386
[WIP][SPARK-23202][SQL] Break down DataSourceV2Writer.commit into two phase
## What changes were proposed in this pull request?
Currently, the API `DataSourceV2Writer#commit(WriterCommitMessage[])`
commits a writing job with a list of commit messages.
This makes sense in some scenarios, e.g. MicroBatchExecution.
However, on receiving a commit message, the driver could start processing
it (e.g. persisting messages into files) before all the messages are
collected.
The proposal is to break `DataSourceV2Writer.commit` down into two phases:
1. `add(WriterCommitMessage message)`: Handles a commit message produced by
`DataWriter#commit()`.
2. `commit()`: Commits the writing job.
This should make the API more flexible, and a more natural fit for
data sources that can handle commit messages one at a time.
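
The two-phase flow described above might look roughly like the following. This is a minimal, self-contained sketch and not the code in this patch: `TwoPhaseWriter`, `FileCommitMessage`, and `ManifestWriter` are hypothetical names introduced for illustration, and `WriterCommitMessage` is a local stand-in for Spark's interface of the same name so that the example compiles without Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCommitSketch {

    /** Local stand-in for Spark's WriterCommitMessage (assumption: a marker interface). */
    interface WriterCommitMessage {}

    /** Hypothetical shape of the proposed writer API: add(...) per message, then commit(). */
    interface TwoPhaseWriter {
        void add(WriterCommitMessage message);
        void commit();
    }

    /** Hypothetical message carrying the path of a file written by one task. */
    static final class FileCommitMessage implements WriterCommitMessage {
        final String path;
        FileCommitMessage(String path) { this.path = path; }
    }

    /** Hypothetical driver-side writer that handles each message as soon as it arrives. */
    static final class ManifestWriter implements TwoPhaseWriter {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();
        private boolean done = false;

        @Override
        public void add(WriterCommitMessage message) {
            if (done) {
                throw new IllegalStateException("job already committed");
            }
            // Phase 1: process the message immediately instead of waiting for all of them.
            pending.add(((FileCommitMessage) message).path);
        }

        @Override
        public void commit() {
            if (done) {
                throw new IllegalStateException("job already committed");
            }
            // Phase 2: publish everything collected so far as the job's output.
            committed.addAll(pending);
            pending.clear();
            done = true;
        }

        /** Returns a copy of the paths that became visible after commit(). */
        List<String> committedPaths() {
            return new ArrayList<>(committed);
        }
    }

    public static void main(String[] args) {
        ManifestWriter writer = new ManifestWriter();
        // The driver would call add(...) as each task's message arrives.
        writer.add(new FileCommitMessage("part-00000"));
        writer.add(new FileCommitMessage("part-00001"));
        writer.commit();
        System.out.println(writer.committedPaths());
    }
}
```

In this sketch nothing is visible through `committedPaths()` until `commit()` runs, which keeps the job-level commit a single step while still letting per-message work happen early.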
## How was this patch tested?
Unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark DSV2_Writer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20386.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20386
----
commit 11711a43eb4a327af30aa3354cf81366616739e4
Author: Wang Gengliang <ltnwgl@...>
Date: 2018-01-24T09:15:38Z
add api 'add' in DataSourceV2Writer
----