[GitHub] spark pull request #20243: [SPARK-23052] Migrate ConsoleSink to data source ...

jose-torres Thu, 11 Jan 2018 19:33:06 -0800

GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/20243


    [SPARK-23052] Migrate ConsoleSink to data source V2 api.

    ## What changes were proposed in this pull request?
    
    Migrate ConsoleSink to the data source V2 API.
    
    Note that this includes a missing piece in DataStreamWriter required to
    specify a data source V2 writer.
    
    Note also that I've removed the "Rerun batch" part of the sink because, as
    far as I can tell, it could never actually be triggered. A
    MicroBatchExecution object commits each batch only once during its
    lifetime, and a new MicroBatchExecution object would have a new ConsoleSink
    object, which doesn't know it's retrying a batch. So I think the removed
    code was an anti-feature rather than evidence of a weakness in the V2 API.
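    The argument above can be illustrated with a toy model. This is a
    hypothetical sketch, not Spark code: `ToySink`, `ToyExecution` and
    `RerunDemo` are made-up names, and it assumes the old sink detected a
    retry only by comparing the incoming batch ID with a per-instance
    `lastBatchId` field that starts at -1.

    ```scala
    // Hypothetical toy model of the removed "Rerun batch" detection.
    // None of these classes are Spark APIs; the names are illustrative.
    final class ToySink {
      // Assumed per-instance state: lost whenever a new sink is constructed.
      private var lastBatchId = -1L

      def addBatch(batchId: Long): String =
        if (batchId <= lastBatchId) s"Rerun batch: $batchId"
        else {
          lastBatchId = batchId
          s"Batch: $batchId"
        }
    }

    // Models the invariant the PR relies on: one execution hands each
    // batch id to its sink at most once, and each execution owns a fresh sink.
    final class ToyExecution(startBatch: Long) {
      private val sink = new ToySink
      private var nextBatch = startBatch

      def runOneBatch(): String = {
        val label = sink.addBatch(nextBatch)
        nextBatch += 1
        label
      }
    }

    object RerunDemo {
      def run(): Seq[String] = {
        val first = new ToyExecution(startBatch = 0)
        val a = first.runOneBatch()
        val b = first.runOneBatch()
        // Simulated restart that retries batch 1: the new execution's sink
        // starts with lastBatchId = -1, so it cannot recognise the retry.
        val restarted = new ToyExecution(startBatch = 1)
        val c = restarted.runOneBatch()
        Seq(a, b, c)
      }

      def main(args: Array[String]): Unit = run().foreach(println)
    }
    ```

    Running `RerunDemo.main` prints `Batch: 0`, `Batch: 1`, `Batch: 1`: the
    retried batch is labelled as a normal batch, and the "Rerun batch" branch
    is never taken in this model.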
    
    ## How was this patch tested?
    
    New unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark console-sink

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20243
    
----
commit 3abe75c8db6c584e128dad7c45c10ac8a15af979
Author: Jose Torres <jose@...>
Date:   2018-01-10T23:19:28Z

    basic writer

commit 71cc6e41cc19af8e672c67624ca16f330804ccc8
Author: Jose Torres <jose@...>
Date:   2018-01-12T03:27:42Z

    add test

----

