[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

tdas Tue, 24 May 2016 16:03:06 -0700

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/13286


    [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structured Streaming

    ## What changes were proposed in this pull request?
    Currently, Structured Streaming supports only the Append output mode. This PR adds the following:
    
    - Support for Complete output mode in the internal state store, analyzer, and planner.
    - A public API in Scala and Python for users to specify the output mode.
    - Checks for unsupported combinations of output mode and DataFrame operations:
      - Plans with no aggregation support only Append mode.
      - Plans with aggregation support only Update and Complete modes.
      - The default output mode is Append. (Open question: should this automatically switch to Complete mode when the plan contains an aggregation?)
    - Support for Complete output mode in Memory Sink, so Memory Sink supports Append and Complete, but not Update.
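
The combination rules listed above can be sketched as a small standalone check. This is only an illustration of the rules as stated in this description, not the code added by the PR: `OutputMode`, `check_output_mode`, and the `has_aggregation` flag are hypothetical names, and the sketch does not depend on Spark.

```python
from enum import Enum


class OutputMode(Enum):
    """Hypothetical stand-in for the three output modes named in the description."""
    APPEND = "append"
    UPDATE = "update"
    COMPLETE = "complete"


def check_output_mode(has_aggregation: bool,
                      mode: OutputMode = OutputMode.APPEND) -> None:
    """Raise ValueError for combinations the description lists as unsupported.

    Assumption: a plan is summarized by a single boolean saying whether it
    contains an aggregation. The default mode is Append, as in the description.
    """
    if has_aggregation:
        # Plans with aggregation: only Update and Complete.
        allowed = {OutputMode.UPDATE, OutputMode.COMPLETE}
    else:
        # Plans without aggregation: only Append.
        allowed = {OutputMode.APPEND}

    if mode not in allowed:
        kind = "with" if has_aggregation else "without"
        names = ", ".join(sorted(m.value for m in allowed))
        raise ValueError(
            f"{mode.value} output mode is not supported for plans {kind} "
            f"aggregation; supported: {names}"
        )


def is_supported(has_aggregation: bool,
                 mode: OutputMode = OutputMode.APPEND) -> bool:
    """Boolean convenience wrapper around check_output_mode."""
    try:
        check_output_mode(has_aggregation, mode)
        return True
    except ValueError:
        return False
```

Under these rules, an aggregating plan that keeps the default Append mode is rejected, which is the situation the open question about the default refers to.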
    
    ## How was this patch tested?
    Unit tests in various test suites:
    - StreamingAggregationSuite: tests for Complete mode.
    - MemorySinkSuite: tests for behavior in Append and Complete modes.
    - UnsupportedOperationSuite: tests for unsupported combinations of DataFrame operations and output modes.
    - DataFrameReaderWriterSuite: tests that output mode cannot be set on static DataFrames.
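
As a rough picture of the Append and Complete behavior that a memory sink has to distinguish, here is a toy sketch. It assumes that Append accumulates each batch's rows and that Complete replaces the stored rows with the latest full result. `ToyMemorySink` and `add_batch` are hypothetical names used for illustration and are not Spark's MemorySink API.

```python
class ToyMemorySink:
    """Toy in-memory sink supporting 'append' and 'complete', but not 'update'."""

    SUPPORTED_MODES = ("append", "complete")

    def __init__(self, mode="append"):
        if mode not in self.SUPPORTED_MODES:
            raise ValueError(f"unsupported output mode for this sink: {mode}")
        self.mode = mode
        self.rows = []

    def add_batch(self, batch):
        """Store one batch of rows according to the sink's output mode."""
        if self.mode == "complete":
            # Complete: each batch is the whole result, so replace what is stored.
            self.rows = list(batch)
        else:
            # Append: each batch holds only new rows, so accumulate them.
            self.rows.extend(batch)


# Usage: the same two batches produce different contents per mode.
append_sink = ToyMemorySink("append")
append_sink.add_batch([("a", 1)])
append_sink.add_batch([("b", 1)])

complete_sink = ToyMemorySink("complete")
complete_sink.add_batch([("a", 1)])
complete_sink.add_batch([("a", 2), ("b", 1)])
```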

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark complete-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13286
    
----
commit 469d69aefea17abbb889a8983a59d83988aaff45
Author: Tathagata Das <[email protected]>
Date:   2016-05-23T23:40:13Z

    First commit to support complete mode

commit 49746f4b5f8a5167fe033f858711fa5643031097
Author: Tathagata Das <[email protected]>
Date:   2016-05-24T21:33:48Z

    Add public API for output mode and upgraded memory sink to support complete mode

commit 2786090bccd64945c273f9344c0493e4d93eec14
Author: Tathagata Das <[email protected]>
Date:   2016-05-24T22:11:32Z

    Added unit test for MemorySink

commit 02b10ac4419f657e3756a95f352a29e20d01ad7d
Author: Tathagata Das <[email protected]>
Date:   2016-05-24T22:25:41Z

    Added unit test to DataFrameReaderWriterSuite

commit 61af0573a112a54bab05c070de7a36b8c74703dc
Author: Tathagata Das <[email protected]>
Date:   2016-05-24T22:53:12Z

    Added python API for output mode

----


