GitHub user dtolpin opened a pull request:
https://github.com/apache/spark/pull/9887
[SPARK-11904] [PySpark] reduceByKeyAndWindow does not require checkpointing
when invFunc is None
When invFunc is None, `reduceByKeyAndWindow(func, None, winsize,
slidesize)` is equivalent to
`reduceByKey(func).window(winsize, slidesize).reduceByKey(func)`
and no checkpoint is necessary. The corresponding Scala code does exactly
that, but Python code always creates a windowed stream with obligatory
checkpointing. The patch fixes this.
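The equivalence can be illustrated with a plain-Python sketch. It is not PySpark code: the helper names below are hypothetical, batches are modelled as lists of (key, value) pairs, window and slide lengths are counted in batches, and func is assumed to be associative and commutative.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Combine values that share a key, in the spirit of reduceByKey."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


def windowed_direct(batches, func, win, slide):
    """Reduce every raw record that falls inside each window."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        window = batches[max(0, end - win):end]
        flat = [pair for batch in window for pair in batch]
        results.append(reduce_by_key(flat, func))
    return results


def windowed_two_stage(batches, func, win, slide):
    """Model reduceByKey(func).window(win, slide).reduceByKey(func)."""
    # Stage 1: reduce each batch on its own.
    pre = [list(reduce_by_key(batch, func).items()) for batch in batches]
    # Stage 2: window the pre-reduced batches and reduce again.
    return windowed_direct(pre, func, win, slide)


# Hypothetical input: four batches, window of 2 batches, slide of 1 batch.
batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4)],
    [("b", 5), ("c", 6)],
    [("a", 7), ("c", 8)],
]
direct = windowed_direct(batches, add, 2, 1)
two_stage = windowed_two_stage(batches, add, 2, 1)
```

Both paths recompute each window from the batches it currently covers, so neither carries reduced state forward from one window to the next. That is the property the description relies on when it says no checkpoint is necessary.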
I do not know how to unit-test this.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dtolpin/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9887.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9887
----
commit 3f777e10abc68fc0c389c8bf55ad56c7c33ea095
Author: David Tolpin <[email protected]>
Date: 2015-11-17T20:37:21Z
invFunc=none work properly with python's reduceByKeyAndWindow
commit e76be0123aca50f055595fb6acb64b7defa981cf
Author: David Tolpin <[email protected]>
Date: 2015-11-19T11:44:12Z
added unit test for reduceByKeyAndWindow with invFunc=None
commit ba8baa9814fd104b107341037a2c1f6b33df0c16
Author: David Tolpin <[email protected]>
Date: 2015-11-21T22:49:21Z
reduceByKeyAndWindow with invFunc=None does require checkpointing
commit d54b880cc96462d418b2155db733881750d31365
Author: David Tolpin <[email protected]>
Date: 2015-11-21T22:50:50Z
merged with upstream
----