GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/23016#discussion_r234395721
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala
---
@@ -174,6 +174,10 @@ class PrefixSpan private (
val freqSequences = results.map { case (seq: Array[Int], count: Long) =>
new FreqSequence(toPublicRepr(seq), count)
}
+ // Cache the final RDD to the same storage level as input
+ freqSequences.persist(data.getStorageLevel)
--- End diff --
The problem here is that freqSequences won't actually be persisted until
something materializes it, and by that point the RDD it depends on,
dataInternalRepr, has already been unpersisted.
I'd say that _if_ the input's storage level isn't NONE, then persist
freqSequences at the same level and call .count() on it to materialize
it. Then unpersist dataInternalRepr in either case.
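Roughly what I have in mind, as a sketch only and not the actual patch.
The object name PersistSketch and the helper persistLikeInput are made up
for illustration; in PrefixSpan.run the three arguments would correspond
to data, dataInternalRepr and freqSequences:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper, not part of PrefixSpan.
  // Persists `result` at the input's storage level (if the input is
  // persisted at all), forces it to be computed while `intermediate`
  // is still cached, and then releases `intermediate` either way.
  def persistLikeInput[T, U, V](
      input: RDD[T],
      intermediate: RDD[U],
      result: RDD[V]): RDD[V] = {
    val level = input.getStorageLevel
    if (level != StorageLevel.NONE) {
      result.persist(level)
      // persist() is lazy; an action is needed to materialize the RDD.
      result.count()
    }
    // Safe to drop the intermediate now: if `result` was persisted, it
    // has already been computed and stored.
    intermediate.unpersist()
    result
  }
}
```

The count() does cost an extra job, but only when the caller persisted
the input, and it guarantees that the cached copy of the result is built
before its cached parent goes away.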