[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...

MLnick Wed, 10 Aug 2016 01:49:44 -0700

GitHub user MLnick opened a pull request:

    https://github.com/apache/spark/pull/14579


    [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() should return Python context managers

    JIRA: https://issues.apache.org/jira/browse/SPARK-16921
    
    Context managers are a natural way to capture closely related setup and
    teardown code in Python. It can be useful to apply this pattern to
    persisting/unpersisting RDDs and DataFrames.
    
    This PR makes RDDs and DataFrames implement the context manager
    `__enter__` and `__exit__` methods, allowing code such as:
    
    ```python
    with labeled_data.persist():
        model = pipeline.fit(labeled_data)
    ```
    
    ## How was this patch tested?
    
    New doc tests.
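
For readers unfamiliar with the protocol, the behaviour described above can be sketched without Spark. The `FakeDataset` class below is a hypothetical stand-in, not the PySpark implementation; it assumes that `persist()` returns the object itself and that leaving the `with` block calls `unpersist()`.

```python
class FakeDataset:
    """Hypothetical stand-in for an RDD/DataFrame (not PySpark code)."""

    def __init__(self, name):
        self.name = name
        self.is_cached = False

    def persist(self):
        # Mark the data as cached and return self, so the result of
        # persist() can be used directly in a `with` statement.
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # persist() has already run by the time `with` calls __enter__,
        # so there is nothing left to set up here.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Teardown: release the cache even if the block raised.
        self.unpersist()
        # Returning False lets any exception from the block propagate.
        return False


labeled_data = FakeDataset("labeled_data")
with labeled_data.persist() as cached:
    cached_inside_block = cached.is_cached
cached_after_block = labeled_data.is_cached
```

In this sketch the data is cached only for the duration of the block, and the teardown runs whether or not the block raises.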

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MLnick/spark SPARK-16921-rdd-df-ctxmgr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14579
    
----
commit 2b4e56e72bf3cd291349baf6feb197666d368b67
Author: Nick Pentreath <[email protected]>
Date:   2016-08-10T08:39:18Z

    Make RDD and DataFrame a context manager

----



