GitHub user jose-torres opened a pull request:
https://github.com/apache/spark/pull/20096
[SPARK-22908] Add kafka source and sink for continuous processing.
## What changes were proposed in this pull request?
Add a Kafka source and sink for continuous processing. This involves two
small changes to the execution engine:
* Move the data reader's close() call onto the data reader thread itself,
to avoid thread-safety issues when the reader is not thread safe.
* Fix up the semantics of the RECONFIGURING StreamExecution state. State
updates are now atomic, and we no longer have to swallow an exception.
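The two engine changes above can be sketched in isolation. This is a minimal, hypothetical Java illustration (not Spark's actual classes — `SingleThreadReader`, `tryReconfigure`, and the `State` enum are made up for this sketch): the reader is closed on the same thread that reads from it, and the RECONFIGURING transition is a single atomic compare-and-set, so there is no intermediate state and no exception to swallow.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ContinuousSketch {
    enum State { ACTIVE, RECONFIGURING }

    static final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

    /** Atomic transition: exactly one caller wins the ACTIVE -> RECONFIGURING swap. */
    static boolean tryReconfigure() {
        return state.compareAndSet(State.ACTIVE, State.RECONFIGURING);
    }

    /** A reader that is not thread safe: it checks which thread touches it. */
    static class SingleThreadReader implements AutoCloseable {
        private Thread owner;

        int next() {
            owner = Thread.currentThread();
            return 42;
        }

        @Override
        public void close() {
            // Enforce the invariant the PR's first change establishes:
            // close() must run on the same thread that was reading.
            if (owner != null && owner != Thread.currentThread()) {
                throw new IllegalStateException("close() called off the reader thread");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Read AND close on the same (reader) thread, never from the driver thread.
        Thread readerThread = new Thread(() -> {
            try (SingleThreadReader r = new SingleThreadReader()) {
                r.next();
            } // close() runs here, still on readerThread
        });
        readerThread.start();
        readerThread.join();

        // Only the first transition succeeds; the state flip is all-or-nothing.
        System.out.println(tryReconfigure());
        System.out.println(tryReconfigure());
    }
}
```

With compareAndSet, a concurrent epoch-advance and reconfiguration cannot interleave partway through a state change, which is what makes the exception-swallowing workaround unnecessary.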
## How was this patch tested?
New unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark continuous-kafka
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20096.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20096
----
commit eec38d374a4f3db46a26b8926192ae44da90b6ae
Author: Jose Torres <jose@...>
Date: 2017-12-21T21:07:35Z
basic kafka
commit f91cb0190d5a5d7942cd3d53bc571140a03965c6
Author: Jose Torres <jose@...>
Date: 2017-12-24T21:40:20Z
move reader close to data reader thread in case reader isn't thread safe
commit 7c180db439c9d3c6389c5ff6033a61341e7f1bbf
Author: Jose Torres <jose@...>
Date: 2017-12-27T20:43:16Z
test + small fixes
commit 7596e34f1e8a047263da7bf8522a14869f289125
Author: Jose Torres <jose@...>
Date: 2017-12-27T21:21:56Z
fixes lost in cherrypick
----
---