Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20956#discussion_r180064831

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
    @@ -166,9 +166,13 @@ private[spark] class NodeIdCache(
               }
             }
           }
    +      if (nodeIdsForInstances != null) {
    +        // Unpersist current one if one exists.
    +        nodeIdsForInstances.unpersist(false)
    +      }
           if (prevNodeIdsForInstances != null) {
             // Unpersist the previous one if one exists.
    -        prevNodeIdsForInstances.unpersist()
    +        prevNodeIdsForInstances.unpersist(false)
    --- End diff --

    For now, `deleteAllCheckpoints` is called only once in the whole of MLlib, and the existing `unpersist` of `prevNodeIdsForInstances` already lives inside it. So I don't think we need to implement a separate method to unpersist datasets (like `PeriodicCheckpointer`).