GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/11006

    [SPARK-10820][SQL] Support for the continuous execution of structured 
queries

    This is a follow up to 9aadcffabd226557174f3ff566927f873c71672e that 
extends Spark SQL to allow users to _repeatedly_ optimize and execute 
structured queries.  A `ContinuousQuery` can be expressed using SQL, DataFrames 
or Datasets.  The purpose of this PR is only to add some initial infrastructure 
which will be extended in subsequent PRs.
    
    ## User-facing API
    
    - `sqlContext.streamFrom` and `df.streamTo` return builder objects that are 
analogous to the `read/write` interfaces already available to executing queries 
in a batch-oriented fashion.
    - `ContinuousQuery` provides an interface for interacting with a query that 
is currently executing in the background.
    
    ## Internal Interfaces
     - `StreamExecution` - executes streaming queries in micro-batches
    
    The following are currently internal, but public APIs will be provided in a 
future release.
     - `Source` - an interface for providers of continually arriving data.  A 
source must have a notion of an `Offset` that monotonically tracks what data 
has arrived.  For fault tolerance, a source must be able to replay data given a 
start offset.
     - `Sink` - an interface that accepts the results of a continuously 
executing query.  Also responsible for tracking the offset that should be 
resumed from in the case of a failure.
    
    ## Testing
     - `MemoryStream` and `MemorySink` - simple implementations of source and 
sink that keep all data in memory and have methods for simulating durability 
failures
     - `StreamTest` - a framework for performing actions and checking 
invariants on a continuous query

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark structured-streaming

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11006
    
----
commit e238911f494c2db41e36c9cde9dd0b733630b36f
Author: Michael Armbrust <[email protected]>
Date:   2015-12-10T02:51:06Z

    first draft

commit d2706b511f254feff7d4558550e403215ffb795f
Author: Michael Armbrust <[email protected]>
Date:   2015-12-11T07:49:33Z

    working on state

commit 7a3590fe5dd659d1a47e645eec247e21280a3373
Author: Michael Armbrust <[email protected]>
Date:   2015-12-11T19:26:40Z

    working on stateful streaming

commit c8a923831bbfc24f71eb744e36a15e432d6ae067
Author: Michael Armbrust <[email protected]>
Date:   2015-12-12T00:34:19Z

    now with event time windows

commit dddd192b7fb9efc59f22009adf61a432b5bafc19
Author: Michael Armbrust <[email protected]>
Date:   2015-12-13T03:16:48Z

    some refactoring after talking to ali

commit a0a1e7bbd85d0de3f6d90793b894a8b00f86a7f0
Author: Michael Armbrust <[email protected]>
Date:   2015-12-15T00:47:55Z

    docs

commit 15bed31800319547eb616c0cc22c564bf967b7f1
Author: Michael Armbrust <[email protected]>
Date:   2015-12-15T18:04:42Z

    start kinesis

commit 89464a91e3954f30b68a1f633e2b9c062e08fa2c
Author: Michael Armbrust <[email protected]>
Date:   2015-12-17T02:18:35Z

    some renaming

commit d133dbbed3d3ff7146129cd6532b1f26f2201315
Author: Michael Armbrust <[email protected]>
Date:   2015-12-28T00:01:57Z

    WIP: file source

commit e3c4c8301fdcfaaa0bd56ed92a81e4e1d2db64a8
Author: Michael Armbrust <[email protected]>
Date:   2016-01-05T05:39:40Z

    Merge remote-tracking branch 'origin/master' into streaming-infra
    
    Conflicts:
        sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala

commit 90fa6d30f0bf70a7c0c4bfe0889a07cbdddbf203
Author: Michael Armbrust <[email protected]>
Date:   2016-01-05T05:52:43Z

    remove half-baked stateful implementation

commit b1c1dc6ead2c76589e48f4c28ef02e6c4361e549
Author: Michael Armbrust <[email protected]>
Date:   2016-01-05T06:36:46Z

    cleanup

commit 92050688699b4c509f5ec5c2b636797cb5ae3c89
Author: Michael Armbrust <[email protected]>
Date:   2016-01-05T23:32:34Z

    more docs

commit 7a59f00b4f0d238c33d3250fcfe162d7c2c27c18
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T00:00:48Z

    Add circle.yml file.

commit eab186d5780b2946a32041ca17d80323b052efc6
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T00:48:42Z

    rollback changes

commit 20750630bd50bf7d0abad4a23b2a717705115e46
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T00:56:16Z

    Try using cached resolution to speed up compilation

commit dabc102271e09bae2ddca1366deb13b20c10ded3
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T00:59:10Z

    Use assembly/assembly.

commit 4630a87751bad6660d12645a5a7e56ea39dad48b
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T01:10:02Z

    Add test:compile so that test dependencies are also cached.

commit cd575db5bb4ab1e954f80a3b0d044ab492394443
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T01:10:23Z

    some feedback

commit 03f7c5daeb1cc9ede78651a7722923dbfadc868c
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T01:34:14Z

    Disable cached resolution.

commit 0760d16cfde55382d412ad002f019ceb848a6188
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T02:20:48Z

    Run assembly and test in separate commands

commit 90125826d744a2fdff4743cd33e5d9ce531aab1d
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T03:44:13Z

    Merge pull request #22 from marmbrus/circle-ci
    
    Add circle.yml file for configuring CircleCI

commit d233eaed9b9ec98a8155039a46d6e8b6b8a4d6ea
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T04:41:13Z

    comments

commit 3423dce332989c7997b1137d2a708fc6d75ec244
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T05:13:06Z

    Merge remote-tracking branch 'marmbrus/streaming-df' into streaming-infra

commit b0b20e50d4f712763a7f6edea8af7a39edeffe33
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T06:16:16Z

    Update circle.yml

commit f5d9642859a2862abfff924e700fbacf52807004
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T06:17:53Z

    Update SparkBuild.scala

commit f8911e4d6a6d32044b55212ae187eb4251bdfe02
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T06:44:31Z

    Update circle.yml

commit 6387781c63597680e341244d6347e9760c56c808
Author: Josh Rosen <[email protected]>
Date:   2016-01-06T07:03:01Z

    Add newlines to satisfy Scalastyle

commit 6242bc2eaecae4645622604d6e0de9802b472715
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T20:01:45Z

    revert CI changes

commit 95fd97807968c2437f104dc3e4578abad14f9554
Author: Michael Armbrust <[email protected]>
Date:   2016-01-06T21:38:53Z

    update based on TD's comments

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to