GitHub user jose-torres opened a pull request:
https://github.com/apache/spark/pull/20243
[SPARK-23052] Migrate ConsoleSink to data source V2 api.
## What changes were proposed in this pull request?
Migrate ConsoleSink to the data source V2 API.
Note that this includes a missing piece in DataStreamWriter required to
specify a data source V2 writer.
Note also that I've removed the "Rerun batch" part of the sink, because as
far as I can tell that branch could never have been reached. A
MicroBatchExecution object commits each batch only once in its lifetime,
and a new MicroBatchExecution object gets a new ConsoleSink object, which
doesn't know it's retrying a batch. So I think the "Rerun batch" output was an
anti-feature, and its absence is not a weakness in the V2 API.
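
The reasoning above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark code: `ToyConsoleSink`, `ToyExecution`, and `simulate` are hypothetical names, and the sketch assumes the old sink chose its label by comparing the incoming batch id against the last id that the same sink instance had seen.

```python
class ToyConsoleSink:
    """Hypothetical stand-in for the pre-migration sink (not Spark code)."""

    def __init__(self):
        # Assumption: the sink only remembers ids it has seen itself.
        self.last_batch_id = -1

    def add_batch(self, batch_id):
        if batch_id <= self.last_batch_id:
            return f"Rerun batch: {batch_id}"
        self.last_batch_id = batch_id
        return f"Batch: {batch_id}"


class ToyExecution:
    """Hypothetical stand-in for the lifetime of one MicroBatchExecution."""

    def __init__(self, start_batch_id):
        # Each execution builds its own sink, so sink state never carries over.
        self.sink = ToyConsoleSink()
        self.next_batch_id = start_batch_id

    def run_batch(self):
        # Each batch id is handed to the sink exactly once per execution.
        label = self.sink.add_batch(self.next_batch_id)
        self.next_batch_id += 1
        return label


def simulate():
    first = ToyExecution(start_batch_id=0)
    labels = [first.run_batch() for _ in range(3)]  # batches 0, 1, 2

    # Simulated restart that retries batch 2 with a fresh execution and sink.
    second = ToyExecution(start_batch_id=2)
    labels += [second.run_batch() for _ in range(2)]  # batches 2, 3
    return labels


if __name__ == "__main__":
    for label in simulate():
        print(label)
```

In this model batch 2 is written twice, but the second write goes through a sink that has never seen batch 2, so it is labelled as an ordinary batch and the "Rerun batch" branch is never taken.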
## How was this patch tested?
New unit test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark console-sink
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20243.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20243
----
commit 3abe75c8db6c584e128dad7c45c10ac8a15af979
Author: Jose Torres <jose@...>
Date: 2018-01-10T23:19:28Z
basic writer
commit 71cc6e41cc19af8e672c67624ca16f330804ccc8
Author: Jose Torres <jose@...>
Date: 2018-01-12T03:27:42Z
add test
----