GitHub user MLnick opened a pull request:
https://github.com/apache/spark/pull/14579
[SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() should return Python
context managers
JIRA: https://issues.apache.org/jira/browse/SPARK-16921
Context managers are a natural way to capture closely related setup and
teardown code in Python. It can be useful to apply this pattern to
persisting/unpersisting RDDs and DataFrames.
This PR makes RDDs and DataFrames implement the context manager protocol
(the `__enter__` and `__exit__` methods), allowing code such as:
```python
with labeled_data.persist():
    model = pipeline.fit(labeled_data)
```
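As a rough illustration of the pattern (not the code in this patch), the sketch below uses a hypothetical stand-in class, `PersistableSketch`, in place of a real RDD or DataFrame. It assumes that `persist()` returns `self` and that `__exit__` calls `unpersist()`; the `is_cached` flag here is only a plain attribute used to make the behaviour observable.

```python
class PersistableSketch:
    """Hypothetical stand-in for an RDD/DataFrame; not the PySpark class."""

    def __init__(self):
        self.is_cached = False

    def persist(self):
        # Mark the data as cached and return self so that
        # `with obj.persist():` receives a context manager.
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self

    def __enter__(self):
        # Nothing to set up here: persist() has already been called.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the cache, even if the body raised.
        self.unpersist()
        return False  # do not suppress exceptions from the body


data = PersistableSketch()
with data.persist() as cached:
    cached_inside_block = cached.is_cached
cached_after_block = data.is_cached
```

Under these assumptions the data is cached for the duration of the `with` block and released on exit, including when the body raises an exception.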
## How was this patch tested?
New doc tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MLnick/spark SPARK-16921-rdd-df-ctxmgr
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14579.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14579
----
commit 2b4e56e72bf3cd291349baf6feb197666d368b67
Author: Nick Pentreath <[email protected]>
Date: 2016-08-10T08:39:18Z
Make RDD and DataFrame a context manager
----