GitHub user jose-torres opened a pull request:

    https://github.com/apache/spark/pull/21200

    [SPARK-24039][SS] Do continuous processing writes with multiple compute() calls

    ## What changes were proposed in this pull request?
    
    Do continuous processing writes with multiple compute() calls.
    
    The current strategy is hacky: we call next() on an iterator that has already returned hasNext = false, relying on every node we whitelist to handle this properly. This will not work in the long term.
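To make the problem concrete, here is a minimal toy model in Python of the pattern described above. It is not Spark code: `EpochIterator`, `BOUNDARY`, and `drain` are hypothetical names invented for this sketch, and the real iterator internals may differ. It only illustrates why an iterator that keeps producing rows after reporting `hasNext = false` is fragile once a conventional consumer sits in the pipeline.

```python
class EpochIterator:
    """Toy iterator: has_next() is False at an epoch boundary, but a
    later next() call crosses the boundary and keeps returning rows."""

    BOUNDARY = None  # hypothetical marker for an epoch boundary

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_next(self):
        # False both at the end of the data and at an epoch boundary.
        return (self._pos < len(self._items)
                and self._items[self._pos] is not self.BOUNDARY)

    def next(self):
        # Step over a boundary marker if we are sitting on one.
        if (self._pos < len(self._items)
                and self._items[self._pos] is self.BOUNDARY):
            self._pos += 1
        row = self._items[self._pos]
        self._pos += 1
        return row


def drain(it):
    """A conventional consumer: treats has_next() == False as final."""
    rows = []
    while it.has_next():
        rows.append(it.next())
    return rows
```

In this sketch, `drain(EpochIterator([1, 2, None, 3]))` returns `[1, 2]`. The row `3` is reachable only by a caller that knows to call `next()` again after `has_next()` returned False, which is the kind of special knowledge the description says the whitelisted nodes depend on.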
    
    Most of the changes here are just refactoring to accommodate the new model. The functional changes are:
    
    * The writer now calls prev.compute(split, context) once per epoch within the epoch loop.
    * ContinuousDataSourceRDD now spawns a ContinuousQueuedDataReader which is shared across multiple calls to compute() for the same partition.
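The two functional changes above can be sketched with a small toy model in Python. This is an illustration under stated assumptions, not the patch itself: `QueuedReader`, `ContinuousRDD`, `write_epochs`, and `EPOCH_MARKER` are hypothetical stand-ins, the real classes are Scala, and the real `compute` also takes a task context. The sketch shows one `compute()` call per epoch, with a single reader per partition reused across those calls so that its position carries over.

```python
from collections import deque

EPOCH_MARKER = object()  # hypothetical sentinel marking an epoch boundary


class QueuedReader:
    """Toy stand-in for a per-partition queued reader (not the Spark class)."""

    def __init__(self, items):
        self._queue = deque(items)

    def next_epoch(self):
        """Yield rows until the next epoch marker or until the queue is empty."""
        while self._queue:
            item = self._queue.popleft()
            if item is EPOCH_MARKER:
                return
            yield item


class ContinuousRDD:
    """Toy RDD: compute() reuses one reader per partition across calls."""

    def __init__(self, partitions):
        self._data = partitions  # maps partition id to a list of items
        self._readers = {}       # maps partition id to its shared reader

    def compute(self, split):
        reader = self._readers.get(split)
        if reader is None:
            # First call for this partition: create the shared reader.
            reader = QueuedReader(self._data[split])
            self._readers[split] = reader
        # Each call returns a normal, finite iterator over one epoch.
        return reader.next_epoch()


def write_epochs(rdd, split, num_epochs):
    """Toy writer loop: one compute() call per epoch."""
    committed = []
    for epoch in range(num_epochs):
        rows = list(rdd.compute(split))
        committed.append((epoch, rows))
    return committed
```

With one partition holding `[1, 2, EPOCH_MARKER, 3, EPOCH_MARKER, 4]`, `write_epochs(rdd, 0, 3)` returns `[(0, [1, 2]), (1, [3]), (2, [4])]`. Each epoch's iterator ends in the ordinary way, so no consumer has to call `next()` past the end of an iterator.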
    
    ## How was this patch tested?
    
    Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jose-torres/spark noAggr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21200
    
----
commit 743419298c0a4bf98b4f547c3a6b3c9c86fdfacf
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-03-16T21:00:40Z

    partial

commit 7c3c3e248c66506c66643e60c2b3e0f4f415e33c
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-03-26T17:44:19Z

    rm old path

commit 49cd89ebf68af1fc45d642a64a69090b32ee1b19
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-03-27T02:15:00Z

    format + docs

commit 23a436f911e7b99dfbb9c18794933cae8c1fe363
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-18T21:59:23Z

    use agg

commit c9a074fe16d97d23ad8d0e9c64b65ac3718174ec
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-03-27T02:16:28Z

    rename node

commit ec0e68df89cc183e661327c7894390be395b93ef
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-03-30T00:59:36Z

    remove inheritance altogether

commit 7a4f1e72a3a139fee7980c54f312f30d8f738c04
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T18:52:59Z

    rvrt stream writer

commit 3a4991aa3345d6c5b088586b388269878d7667d3
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T18:55:50Z

    partial no rdd

commit 59710f6961040381344a4a8b297e061d275c4a83
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T18:57:34Z

    include rdd

commit 6426185059b4c5ac526f2da5fc40a6b8433638ae
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T18:59:32Z

    without shared task

commit 0c061f3e41f751bf78af1501b1c2764460ee9d7d
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T19:47:09Z

    working without restart

commit 90049f962f60b5702afca31f723a1e0b2b06d094
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-19T19:47:27Z

    include new file

commit 7463ac32ffa030e68cd0e5bdcba16c9b90687822
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-20T18:04:41Z

    fix restarts

commit ccd2b380316b5bc6e073b448f49710d5bf2277ea
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T16:47:54Z

    remove aggregate changes

commit 38498e3c9473b7ec90ea10c5edada60ee2a69769
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T21:56:16Z

    cleanup naming and use map

commit ca545e490e11a69d8329f5f7605590339d419991
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:06:07Z

    add docs

commit aee0cda5a0554015a28d48c9e6db756d53b8aa5f
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:08:03Z

    remove unused class

commit d8f90b1b03cc9eed1ebcec992baaf0006e34ca94
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:11:56Z

    split out EpochPollRunnable

commit 4f9f16142afdf75edc2a8cbaebe36546305aa832
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:17:12Z

    split data reader thread and fix file name

commit 54c0bf1b65e22157166e2159cf886b912a07828e
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:30:44Z

    fix imports

commit 373826e129c522575df5d2c26c7ec56cca218c40
Author: Jose Torres <torres.joseph.f+github@...>
Date:   2018-04-30T22:35:21Z

    add ContinuousDataSourceRDD docs

----

