GitHub user dtolpin opened a pull request:

    https://github.com/apache/spark/pull/9888

    [SPARK-11904] [PySpark] reduceByKeyAndWindow does not require checkpointing when invFunc is None

    When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to
    
         reduceByKey(func).window(winsize, slidesize).reduceByKey(func)
    
    and no checkpoint is necessary. The corresponding Scala code does exactly that, but the Python code always creates a windowed stream that requires checkpointing. The patch fixes this.
    
    I do not know how to unit-test this.
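
The equivalence described above can be illustrated with a small pure-Python model. This is only a sketch, not Spark code: each batch is modelled as a plain list of (key, value) pairs, sizes are counted in batches rather than durations, and the helper names `reduce_by_key` and `windowed_reduce` are hypothetical. It shows that each window can be recomputed from the per-batch reductions alone, so no state from earlier windows (and hence no checkpoint) is needed.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Combine values sharing a key (models reduceByKey on one batch)."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


def windowed_reduce(batches, func, window_size, slide_size):
    """Model reduceByKey(func).window(...).reduceByKey(func).

    window_size and slide_size are counted in batches (an assumption of
    this sketch; Spark Streaming uses durations).
    """
    # First reduceByKey: shrink each batch independently.
    pre_reduced = [reduce_by_key(batch, func) for batch in batches]
    results = []
    for end in range(slide_size, len(batches) + 1, slide_size):
        start = max(0, end - window_size)
        # window(): union of the pre-reduced batches inside the window.
        in_window = [kv for b in pre_reduced[start:end] for kv in b.items()]
        # Second reduceByKey: merge per-batch results across the window.
        results.append(reduce_by_key(in_window, func))
    return results


batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 10)],
    [("b", 5), ("c", 7)],
    [("a", 1), ("c", 1)],
]
windows = windowed_reduce(batches, add, window_size=2, slide_size=1)
```

Every window here is built from scratch out of the batches it covers. With an inverse function, by contrast, each window would be derived from the previous window's result, which is the dependency chain that makes checkpointing necessary.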

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dtolpin/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9888
    
----
commit 6730f72d2d9aa2c535abc9719e589369cc7b4cdb
Author: David Tolpin <[email protected]>
Date:   2015-11-21T23:22:31Z

    invFunc=None does not require checkpointing
    
    reduceByKeyAndWindow(func, None, window_size, slide_size) is equivalent to reduceByKey(func).window(window_size, slide_size).reduceByKey(func) and should not require checkpointing.

----


