GitHub user srowen commented on the issue:
https://github.com/apache/spark/pull/23016
I'm not sure about that, because the returned PrefixSpanModel has an RDD
that depends on that RDD. We could cache the final RDD instead and materialize
it; that could make more sense.
In other places we have done this only when the input is cached, to
follow the caller's lead, but there isn't a consistent standard for this.
I'd be OK with improving this to persist only the final RDD and then unpersist
the intermediate one. That at least makes more sense. You can cache at the same
storage level as the input (which might be NONE).
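
To make the suggestion concrete, here is a rough sketch of the pattern, not the actual change in this PR. `PersistFinalOnly`, `run`, `prepare`, and `finish` are hypothetical names used only for illustration. It assumes `prepare` returns a new RDD that has no storage level assigned yet, and that `MEMORY_AND_DISK` is an acceptable level for the intermediate data.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not part of Spark: illustrates "persist the final RDD,
// then unpersist the intermediate one" from the comment above.
object PersistFinalOnly {
  def run[T, U, V](input: RDD[T])(prepare: RDD[T] => RDD[U])(finish: RDD[U] => RDD[V]): RDD[V] = {
    // Follow the caller's lead: reuse whatever storage level the input has.
    val inputLevel = input.getStorageLevel

    // Cache the intermediate RDD while the final result is being computed.
    // Assumption: prepare() yields a fresh RDD with no storage level set.
    val intermediate = prepare(input).persist(StorageLevel.MEMORY_AND_DISK)

    val result = finish(intermediate)

    if (inputLevel != StorageLevel.NONE) {
      // Persist the final RDD and run an action to materialize it, so it no
      // longer needs the cached intermediate data.
      result.persist(inputLevel)
      result.count()
    }

    // Release the intermediate cache. If the input level was NONE, the result
    // is not cached and will recompute its lineage when it is used.
    intermediate.unpersist()
    result
  }
}
```

When the input is uncached, this sketch leaves the result uncached as well, so later actions on the returned RDD recompute the intermediate steps. Whether that trade-off is acceptable is the open question raised above.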