Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/14579
cc @nchammas @holdenk @rxin
**Note:**
This is implemented by adding the `__enter__` and `__exit__` methods to
RDD/DataFrame directly. This allows some potentially weird usage: any
instance of RDD/DF, including the result of any method that returns `self`,
can be used in a `with` statement, e.g. this works:
```python
with rdd.map(lambda x: x) as x:
    ...
```
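For context, here is a minimal sketch of what the class-level approach looks like, using a stand-in class rather than the real PySpark RDD (the `FakeRDD` name and its `persisted` flag are illustrative assumptions, not Spark API):

```python
class FakeRDD:
    """Stand-in for an RDD/DataFrame with __enter__/__exit__ on the class."""
    def __init__(self):
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self  # returning self preserves chaining, e.g. rdd.persist().count()

    def unpersist(self):
        self.persisted = False
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Unpersist on exit, whether or not the block raised.
        self.unpersist()
        return False  # do not suppress exceptions

rdd = FakeRDD()
with rdd.persist() as r:
    assert r.persisted       # cached inside the block
assert not rdd.persisted     # automatically unpersisted on exit
```

Because the methods live on the class, *every* instance supports `with`, which is exactly the "weirdness" noted above.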
Clearly this doesn't make a lot of sense. However, I looked at the two
alternative options: (a) a separate context-manager wrapper class returned by
`persist`; and (b) trying to dynamically add the methods (or at least
`__enter__`) in `persist`.
The problem with (a) is that `persist` needs to return an RDD/DF instance,
so returning a wrapper instead breaks chaining behavior such as `rdd.cache().count()`.
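To illustrate the problem with (a), here is a hedged sketch using a hypothetical wrapper class and a stand-in RDD (both `PersistedContext` and `FakeRDD` are assumptions for demonstration, not real PySpark API):

```python
class PersistedContext:
    """Hypothetical wrapper that option (a) would have persist() return."""
    def __init__(self, rdd):
        self._rdd = rdd

    def __enter__(self):
        return self._rdd

    def __exit__(self, exc_type, exc, tb):
        self._rdd.unpersist()
        return False

class FakeRDD:
    def persist(self):
        return PersistedContext(self)  # no longer returns the RDD itself

    def unpersist(self):
        return self

    def count(self):
        return 1

rdd = FakeRDD()
try:
    rdd.persist().count()  # chaining fails: the wrapper has no count()
except AttributeError:
    print("chaining broken")  # prints "chaining broken"
```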
The problem with (b) is that the special method `__enter__` is called in
the context of `with` as `type(rdd).__enter__(rdd)` (see [PEP
343](https://www.python.org/dev/peps/pep-0343/)). So it does not help to add the
method dynamically to an instance; it must be added to the class. But then,
after the first use in a `with` statement, *all* existing and future
instances of RDD/DF have the `__enter__` method, putting us in the same
situation as this PR's approach of defining the methods on the class
(with the associated allowed "weirdness").
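A short demonstration of the PEP 343 point, using a plain stand-in class (not the real RDD): `with` looks up `__enter__`/`__exit__` on the *type*, so attaching them to an instance has no effect.

```python
class Thing:
    pass

t = Thing()
# Attach the special methods to the instance only:
t.__enter__ = lambda: t
t.__exit__ = lambda *args: False

try:
    with t:
        pass
except (AttributeError, TypeError):
    # CPython ignores instance-level special methods here and raises
    # (AttributeError on older versions, TypeError on newer ones).
    print("instance-level __enter__ is ignored")

# Patching the class does work -- but now *every* Thing instance,
# existing or future, supports `with`, which is the same situation as
# defining the methods on the class up front.
Thing.__enter__ = lambda self: self
Thing.__exit__ = lambda self, exc_type, exc, tb: False
with t:
    pass  # now succeeds
```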
So, if we want to avoid that, the only option I see is a variant of (a)
above - adding a `cached`/`persisted` method that returns a context manager, so
it would look like this:
```python
with cached(rdd) as x:
    x.count()
```
This is less "elegant" but more explicit.
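A hedged sketch of how such a `cached` helper could look, built on `contextlib.contextmanager`. The `FakeRDD` stand-in below is only for demonstration; the real helper would take a PySpark RDD/DataFrame:

```python
from contextlib import contextmanager

@contextmanager
def cached(rdd):
    """Persist `rdd` for the duration of the block, then unpersist it,
    even if the block raises."""
    rdd.persist()
    try:
        yield rdd
    finally:
        rdd.unpersist()

# Demo with a duck-typed stand-in (no Spark needed):
class FakeRDD:
    def __init__(self):
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self

    def unpersist(self):
        self.persisted = False
        return self

    def count(self):
        return 0

rdd = FakeRDD()
with cached(rdd) as x:
    x.count()
    assert x.persisted       # cached inside the block
assert not rdd.persisted     # unpersisted on exit
```

This keeps `__enter__`/`__exit__` off the RDD/DF classes entirely, at the cost of the extra wrapper call at the use site.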
If anyone has other smart ideas for handling option (b) above, please do shout!