[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...

srowen Wed, 14 Nov 2018 15:58:26 -0800

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/23016
  
    I'm not sure about that, because the returned PrefixSpanModel has an RDD 
that depends on that RDD (dataInternalRepr). We could cache the final RDD 
instead and materialize it; that could make more sense. 
    
    In other places we have done such a thing only when the input is cached, in 
order to follow the caller's lead, but there isn't a consistent standard for 
this.
    
    I'd be OK improving this to persist the final RDD only, and then unpersist 
the intermediate one. That at least makes more sense. You can cache at the same 
storage level as the input (which might be NONE).
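As a toy illustration of the trade-off above, here is a Python sketch of a lazy dataset that counts how often each node is recomputed. It is not Spark's API and not the actual patch. `ToyRDD`, `hand_off`, `run`, and the string storage levels are all hypothetical. `internal` and `freq` loosely stand in for dataInternalRepr and the RDD held by the returned PrefixSpanModel.

```python
# Toy storage levels (hypothetical stand-ins for Spark's StorageLevel values).
NONE = "NONE"
MEMORY_AND_DISK = "MEMORY_AND_DISK"


class ToyRDD:
    """Toy model of a lazy, optionally cached dataset (NOT Spark's API).

    It tracks only lineage, caching, and how often each node is recomputed.
    """

    def __init__(self, compute, parent=None):
        self._compute = compute      # function(parent_rows) -> list of rows
        self.parent = parent
        self.storage_level = NONE
        self._cached = None
        self.compute_count = 0       # times this node's rows were (re)built

    def map(self, f):
        return ToyRDD(lambda rows: [f(r) for r in rows], parent=self)

    def persist(self, level):
        self.storage_level = level
        return self

    def unpersist(self):
        self.storage_level = NONE
        self._cached = None
        return self

    def collect(self):
        if self._cached is not None:
            return list(self._cached)
        parent_rows = self.parent.collect() if self.parent is not None else None
        rows = self._compute(parent_rows)
        self.compute_count += 1
        if self.storage_level != NONE:
            self._cached = rows      # cache on first computation
        return list(rows)

    def count(self):
        return len(self.collect())


def hand_off(input_rdd, intermediate, result):
    """Hypothetical helper: cache `result` at the input's storage level,
    materialize it, then release the intermediate it was derived from."""
    level = input_rdd.storage_level
    if level != NONE:
        result.persist(level)
        result.count()               # compute while `intermediate` is still cached
    intermediate.unpersist()
    return result


def run(input_level):
    """Hypothetical pipeline loosely mirroring the PrefixSpan flow."""
    data = ToyRDD(lambda _: [1, 2, 3]).persist(input_level)
    internal = data.map(lambda x: x * 10).persist(MEMORY_AND_DISK)
    internal.count()                 # the algorithm's own passes over the intermediate
    freq = internal.map(lambda x: x + 1)  # the RDD the returned model would hold
    return data, internal, hand_off(data, internal, freq)
```

When the input is cached, later actions on the model's RDD read its cached rows and never rebuild `internal`. When the input is uncached (NONE), nothing is cached and every action recomputes the full lineage, which follows the caller's own choice. In real Spark code the corresponding RDD methods would be `getStorageLevel`, `persist(level)`, `count()` and `unpersist()`.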


