zhengruifeng commented on issue #27261: [SPARK-30503][ML] OnlineLDAOptimizer does not handle persistance correctly
URL: https://github.com/apache/spark/pull/27261#issuecomment-575567216

The previous graph is handled like `Pregel`:
```scala
var prevG: Graph[VD, ED] = null
var i = 0
while (activeMessages > 0 && i < maxIterations) {
  // Receive the messages and update the vertices.
  prevG = g
  g = g.joinVertices(messages)(vprog)
  graphCheckpointer.update(g)
  ...
  // Unpersist the RDDs hidden by newly-materialized RDDs
  oldMessages.unpersist()
  prevG.unpersistVertices()
  prevG.edges.unpersist()
  // count the iteration
  i += 1
}
```

It is somewhat strange that `prevG` needs to be unpersisted both inside and outside of the `graphCheckpointer`, which is quite different from how `PeriodicRDDCheckpointer` is used. Moreover, the `persist` and `unpersist` methods in `PeriodicGraphCheckpointer` are not consistent with each other:
```scala
override protected def persist(data: Graph[VD, ED]): Unit = {
  if (data.vertices.getStorageLevel == StorageLevel.NONE) {
    /* We need to use cache because persist does not honor the default storage level requested
     * when constructing the graph. Only cache does that. */
    data.vertices.cache()
  }
  if (data.edges.getStorageLevel == StorageLevel.NONE) {
    data.edges.cache()
  }
}

override protected def unpersist(data: Graph[VD, ED]): Unit = data.unpersist()
```
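
For comparison, here is a rough sketch of the corresponding pair in `PeriodicRDDCheckpointer` (paraphrased from memory, not quoted from the Spark source), plus a hypothetical symmetric `unpersist` for the graph checkpointer that mirrors its `persist`. This is only an illustration of the asymmetry, not necessarily what this PR implements:
```scala
// Rough sketch of PeriodicRDDCheckpointer's pair (paraphrased, not an exact quote):
// both methods operate on the same RDD object, so persist/unpersist stay symmetric.
override protected def persist(data: RDD[T]): Unit = {
  if (data.getStorageLevel == StorageLevel.NONE) {
    data.persist()
  }
}
override protected def unpersist(data: RDD[T]): Unit = data.unpersist()

// Hypothetical symmetric counterpart for PeriodicGraphCheckpointer (illustration only):
// unpersist the vertices and edges explicitly, mirroring how persist caches them.
override protected def unpersist(data: Graph[VD, ED]): Unit = {
  data.vertices.unpersist()
  data.edges.unpersist()
}
```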
