[GitHub] spark pull request #20386: [WIP][SPARK-23202][SQL] Break down DataSourceV2Wr...

gengliangwang Wed, 24 Jan 2018 09:27:37 -0800

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/20386


    [WIP][SPARK-23202][SQL] Break down DataSourceV2Writer.commit into two phase

    ## What changes were proposed in this pull request?
    
    Currently, the API `DataSourceV2Writer#commit(WriterCommitMessage[])` 
    commits a writing job with a list of commit messages.
    
    It makes sense in some scenarios, e.g. MicroBatchExecution.
    
    However, the driver could start processing each commit message as soon as 
    it is received (e.g. persisting messages to files), without waiting for all 
    the messages to be collected.
    
    The proposal is to break down `DataSourceV2Writer.commit` into two phases:
    
    1. `add(WriterCommitMessage message)`: Handles a commit message produced by 
       `DataWriter#commit()`.
    2. `commit()`: Commits the writing job.
    
    This should make the API more flexible and a more natural fit for 
    implementing some data sources.
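    
    A minimal sketch of how the two-phase shape might look, in plain Java with 
    no Spark dependency. `WriterCommitMessage` here is a local stand-in, and 
    `TwoPhaseWriter`, `FileCommitMessage` and `ManifestWriter` are hypothetical 
    names used only for illustration; only the `add` and `commit` method names 
    come from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Spark's WriterCommitMessage (assumption: marker interface only).
interface WriterCommitMessage {}

// Hypothetical interface showing the proposed split of commit into two phases.
interface TwoPhaseWriter {
    // Phase 1: called on the driver each time a task's commit message arrives.
    void add(WriterCommitMessage message);

    // Phase 2: called once, after all messages have been added.
    void commit();
}

// Hypothetical message carrying the path of a file written by one task.
final class FileCommitMessage implements WriterCommitMessage {
    final String path;

    FileCommitMessage(String path) {
        this.path = path;
    }
}

// Hypothetical writer that records paths incrementally and publishes them on commit.
final class ManifestWriter implements TwoPhaseWriter {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    @Override
    public void add(WriterCommitMessage message) {
        // Work can start here, before the remaining tasks have finished.
        pending.add(((FileCommitMessage) message).path);
    }

    @Override
    public void commit() {
        // Make everything collected so far visible as one job-level commit.
        committed.addAll(pending);
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    List<String> committedPaths() {
        return new ArrayList<>(committed);
    }
}
```

    With this shape the driver would call `add` once per finished task and 
    `commit` once at the end, so per-message work no longer has to wait for 
    the full array of messages.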
    
    ## How was this patch tested?
    
    Unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark DSV2_Writer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20386
    
----
commit 11711a43eb4a327af30aa3354cf81366616739e4
Author: Wang Gengliang <ltnwgl@...>
Date:   2018-01-24T09:15:38Z

    add api 'add' in DataSourceV2Writer

----


---

