GitHub user jose-torres opened a pull request:
https://github.com/apache/spark/pull/21200
[SPARK-24039][SS] Do continuous processing writes with multiple compute() calls
## What changes were proposed in this pull request?
Do continuous processing writes with multiple compute() calls.
The current strategy is a hack: we call next() on an iterator that has
already returned hasNext = false, relying on all the nodes we whitelist to
handle this properly. This will not work in the long term.
Most of the changes here are just refactoring to accommodate the new model.
The functional changes are:
* The writer now calls prev.compute(split, context) once per epoch within
the epoch loop.
* ContinuousDataSourceRDD now spawns a ContinuousQueuedDataReader which is
shared across multiple calls to compute() for the same partition.
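To illustrate the two functional changes, here is a minimal sketch in plain Python rather than Spark code. All names in it (QueuedReader, SourceRDD, write_partition, EPOCH_MARKER) are hypothetical stand-ins, and the epoch-marker queue is an assumption made for illustration, not the actual ContinuousQueuedDataReader implementation.

```python
from collections import deque

# Hypothetical sentinel that separates epochs in the queue (an assumption).
EPOCH_MARKER = object()


class QueuedReader:
    """Simplified stand-in for a per-partition queued reader."""

    def __init__(self, items):
        self._queue = deque(items)

    def next_epoch_rows(self):
        # Yield rows until the next epoch marker, then end this iterator.
        while self._queue:
            item = self._queue.popleft()
            if item is EPOCH_MARKER:
                return
            yield item


class SourceRDD:
    """Simplified stand-in for the source RDD."""

    def __init__(self, partitions):
        self._partitions = partitions
        self._readers = {}  # one reader per partition, shared across calls

    def compute(self, split, context=None):
        # Create the reader on the first call for this partition and
        # reuse it on every later call. `context` is unused in this sketch.
        if split not in self._readers:
            self._readers[split] = QueuedReader(self._partitions[split])
        return self._readers[split].next_epoch_rows()


def write_partition(rdd, split, num_epochs):
    """Writer-side epoch loop: one compute() call per epoch."""
    committed = []
    for epoch in range(num_epochs):
        # Each epoch gets a fresh iterator instead of calling next()
        # on one that has already been exhausted.
        rows = list(rdd.compute(split))
        committed.append((epoch, rows))
    return committed


rdd = SourceRDD({0: [1, 2, EPOCH_MARKER, 3, EPOCH_MARKER, 4, 5, 6, EPOCH_MARKER]})
result = write_partition(rdd, 0, 3)
```

Because the reader outlives any single compute() call, each per-epoch iterator can terminate normally while the position in the stream carries over to the next epoch.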
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark noAggr
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21200.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21200
----
commit 743419298c0a4bf98b4f547c3a6b3c9c86fdfacf
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-03-16T21:00:40Z
partial
commit 7c3c3e248c66506c66643e60c2b3e0f4f415e33c
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-03-26T17:44:19Z
rm old path
commit 49cd89ebf68af1fc45d642a64a69090b32ee1b19
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-03-27T02:15:00Z
format + docs
commit 23a436f911e7b99dfbb9c18794933cae8c1fe363
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-18T21:59:23Z
use agg
commit c9a074fe16d97d23ad8d0e9c64b65ac3718174ec
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-03-27T02:16:28Z
rename node
commit ec0e68df89cc183e661327c7894390be395b93ef
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-03-30T00:59:36Z
remove inheritance altogether
commit 7a4f1e72a3a139fee7980c54f312f30d8f738c04
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T18:52:59Z
rvrt stream writer
commit 3a4991aa3345d6c5b088586b388269878d7667d3
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T18:55:50Z
partial no rdd
commit 59710f6961040381344a4a8b297e061d275c4a83
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T18:57:34Z
include rdd
commit 6426185059b4c5ac526f2da5fc40a6b8433638ae
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T18:59:32Z
without shared task
commit 0c061f3e41f751bf78af1501b1c2764460ee9d7d
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T19:47:09Z
working without restart
commit 90049f962f60b5702afca31f723a1e0b2b06d094
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-19T19:47:27Z
include new file
commit 7463ac32ffa030e68cd0e5bdcba16c9b90687822
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-20T18:04:41Z
fix restarts
commit ccd2b380316b5bc6e073b448f49710d5bf2277ea
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T16:47:54Z
remove aggregate changes
commit 38498e3c9473b7ec90ea10c5edada60ee2a69769
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T21:56:16Z
cleanup naming and use map
commit ca545e490e11a69d8329f5f7605590339d419991
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:06:07Z
add docs
commit aee0cda5a0554015a28d48c9e6db756d53b8aa5f
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:08:03Z
remove unused class
commit d8f90b1b03cc9eed1ebcec992baaf0006e34ca94
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:11:56Z
split out EpochPollRunnable
commit 4f9f16142afdf75edc2a8cbaebe36546305aa832
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:17:12Z
split data reader thread and fix file name
commit 54c0bf1b65e22157166e2159cf886b912a07828e
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:30:44Z
fix imports
commit 373826e129c522575df5d2c26c7ec56cca218c40
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-04-30T22:35:21Z
add ContinuousDataSourceRDD docs
----