GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/13286
[SPARK-15517][SQL][STREAMING] Add support for complete output mode in
Structured Streaming
## What changes were proposed in this pull request?
Currently, Structured Streaming supports only the Append output mode. This PR
adds the following:
- Added support for Complete output mode in the internal state store,
  analyzer, and planner.
- Added a public API in Scala and Python for users to specify the output mode.
- Added checks for unsupported combinations of output mode and DataFrame
  operations:
  - Plans with no aggregation should support only Append mode.
  - Plans with aggregation should support only Update and Complete modes.
- The default output mode is Append (should we change this so that Complete
  mode is selected automatically when there is an aggregation?).
- Added support for Complete output mode in the Memory Sink, so the Memory
  Sink supports Append and Complete, but not Update.
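
The mode checks listed above can be pictured with a small standalone model. This is only a sketch and not the analyzer code from this PR: `OutputMode`, `supported_modes`, `check_plan` and `MEMORY_SINK_MODES` are hypothetical names, and a query plan is reduced to a single `has_aggregation` flag.

```python
from enum import Enum


class OutputMode(Enum):
    """Hypothetical stand-in for the three output modes named in this PR."""
    APPEND = "append"
    UPDATE = "update"
    COMPLETE = "complete"


# Modes the Memory Sink accepts, per the description above (assumed encoding).
MEMORY_SINK_MODES = {OutputMode.APPEND, OutputMode.COMPLETE}


def supported_modes(has_aggregation):
    """Return the output modes allowed for a plan (illustrative rule only)."""
    if has_aggregation:
        return {OutputMode.UPDATE, OutputMode.COMPLETE}
    return {OutputMode.APPEND}


def check_plan(has_aggregation, mode=OutputMode.APPEND, memory_sink=False):
    """Raise ValueError for a combination the description marks unsupported."""
    if mode not in supported_modes(has_aggregation):
        kind = "with" if has_aggregation else "without"
        raise ValueError(
            f"{mode.value} mode is not supported for plans {kind} aggregation"
        )
    if memory_sink and mode not in MEMORY_SINK_MODES:
        raise ValueError(f"{mode.value} mode is not supported by the memory sink")
    return mode
```

Under this model, `check_plan(True)` fails because an aggregating plan falls back to the default Append mode, which is the situation the open question about the default refers to.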
## How was this patch tested?
Unit tests in various test suites:
- StreamingAggregationSuite: tests for complete mode
- MemorySinkSuite: tests for checking behavior in Append and Complete
modes.
- UnsupportedOperationSuite: tests for checking unsupported combinations of
DataFrame operations and output modes
- DataFrameReaderWriterSuite: tests for checking that output mode cannot be
set on static DataFrames
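
The difference between Append and Complete behavior in a memory sink can be illustrated with a toy model. This is a sketch under stated assumptions, not the `MemorySink` implementation or its test suite: `ToyMemorySink` and `add_batch` are hypothetical names, Append is assumed to accumulate each batch's new rows, and Complete is assumed to replace the stored rows with the latest full result.

```python
class ToyMemorySink:
    """Hypothetical in-memory sink; not the MemorySink class from Spark."""

    def __init__(self, mode):
        # Update is rejected, mirroring "supports append and complete, not update".
        if mode not in ("append", "complete"):
            raise ValueError(f"unsupported output mode: {mode}")
        self.mode = mode
        self.rows = []

    def add_batch(self, batch):
        """Store one batch of result rows according to the output mode."""
        if self.mode == "complete":
            # Assumed: each batch is the full updated result, so replace.
            self.rows = list(batch)
        else:
            # Assumed: each batch holds only new rows, so accumulate.
            self.rows.extend(batch)


# Append keeps everything seen so far.
append_sink = ToyMemorySink("append")
append_sink.add_batch([1, 2])
append_sink.add_batch([3])

# Complete keeps only the most recent full result, e.g. running counts.
complete_sink = ToyMemorySink("complete")
complete_sink.add_batch([("a", 1)])
complete_sink.add_batch([("a", 2), ("b", 1)])
```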
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark complete-mode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13286.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13286
----
commit 469d69aefea17abbb889a8983a59d83988aaff45
Author: Tathagata Das
Date: 2016-05-23T23:40:13Z
First commit to support complete mode
commit 49746f4b5f8a5167fe033f858711fa5643031097
Author: Tathagata Das
Date: 2016-05-24T21:33:48Z
Add public API for output mode and upgraded memory sink to support complete
mode
commit 2786090bccd64945c273f9344c0493e4d93eec14
Author: Tathagata Das
Date: 2016-05-24T22:11:32Z
Added unit test for MemorySink
commit 02b10ac4419f657e3756a95f352a29e20d01ad7d
Author: Tathagata Das
Date: 2016-05-24T22:25:41Z
Added unit test to DataFrameReaderWriterSuite
commit 61af0573a112a54bab05c070de7a36b8c74703dc
Author: Tathagata Das
Date: 2016-05-24T22:53:12Z
Added python API for output mode
----