GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/10082
[SPARK-11713] [PYSPARK] [STREAMING] Initial RDD updateStateByKey for PySpark
Adds the ability to define an initial state RDD for use with updateStateByKey in
PySpark. Added a unit test and changed the stateful_network_wordcount example to
use an initial RDD.
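To make the described behavior concrete, below is a plain-Python model of how per-key state seeded with an initial state might evolve across micro-batches. It is a sketch, not Spark code, and it does not use PySpark. `run_stateful` and `update_func` are hypothetical names, and the merge rules (every key that has new values or existing state is updated; returning None drops the key) are an assumption modeled on the general updateStateByKey contract. In PySpark itself the seed pairs would be supplied through the `initialRDD` parameter this PR adds.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
S = TypeVar("S")


def run_stateful(
    batches: Iterable[List[Tuple[K, V]]],
    update_func: Callable[[List[V], Optional[S]], Optional[S]],
    initial_state: Optional[List[Tuple[K, S]]] = None,
) -> List[Dict[K, S]]:
    """Hypothetical plain-Python model of updateStateByKey with an initial state.

    Returns a snapshot of the per-key state after each batch.
    """
    # Seed the state from (key, state) pairs; this plays the role of initialRDD.
    state: Dict[K, S] = dict(initial_state or [])
    snapshots: List[Dict[K, S]] = []
    for batch in batches:
        # Group this batch's values by key.
        grouped: Dict[K, List[V]] = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        new_state: Dict[K, S] = {}
        # Assumed rule: update every key that has new values or existing state.
        for key in set(state) | set(grouped):
            result = update_func(grouped.get(key, []), state.get(key))
            # Assumed rule: returning None removes the key from the state.
            if result is not None:
                new_state[key] = result
        state = new_state
        snapshots.append(dict(state))
    return snapshots


def update_func(new_values: List[int], last_sum: Optional[int]) -> int:
    # Running word count: add this batch's counts to the previous total.
    return sum(new_values) + (last_sum or 0)


if __name__ == "__main__":
    batches = [[("hello", 1), ("spark", 1)], [("hello", 1)]]
    snaps = run_stateful(batches, update_func, initial_state=[("hello", 1), ("world", 1)])
    print(sorted(snaps[-1].items()))  # [('hello', 3), ('spark', 1), ('world', 1)]
```

Without a seed, "hello" would end at 2 rather than 3; the initial state simply acts as the value of `last_sum` the first time each seeded key is updated.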
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark initial-rdd-updateStateByKey-SPARK-11713
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10082.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10082
----
commit a124810b13bf072bd0b3177b79f8fe04847ac22a
Author: Bryan Cutler <[email protected]>
Date: 2015-11-21T00:10:03Z
[SPARK-11713] Added initialRDD parameter to PySpark updateStateByKey
commit f7316f17dc3d1e357b2f73e96613f74b79cf3dcc
Author: Bryan Cutler <[email protected]>
Date: 2015-11-21T00:10:32Z
[SPARK-11713] Added unit test for PySpark updateStateByKey with an initial
state
commit f2e484b7af102a0056edc29213bfaffdaa782432
Author: Bryan Cutler <[email protected]>
Date: 2015-11-21T00:33:14Z
[SPARK-11713] Updated example stateful_network_wordcount.py to use initial
RDD state
commit a14e55b581129f0e090cdf50bf63b41da1800f20
Author: Bryan Cutler <[email protected]>
Date: 2015-12-01T23:34:12Z
Merge remote-tracking branch 'upstream/master' into
initial-rdd-updateStateByKey-SPARK-11713
----