zhengruifeng commented on issue #27261: [SPARK-30503][ML] OnlineLDAOptimizer does not handle persistance correctly
URL: https://github.com/apache/spark/pull/27261#issuecomment-575567216

The previous graph is handled like `Pregel`:
```scala
var prevG: Graph[VD, ED] = null
var i = 0
while (activeMessages > 0 && i < maxIterations) {
  // Receive the messages and update the vertices.
  prevG = g
  g = g.joinVertices(messages)(vprog)
  graphCheckpointer.update(g)
  ...
  // Unpersist the RDDs hidden by newly-materialized RDDs
  oldMessages.unpersist()
  prevG.unpersistVertices()
  prevG.edges.unpersist()
  // count the iteration
  i += 1
}
```

It is somewhat strange that `prevG` needs to be unpersisted both inside and outside of the `graphCheckpointer`, which is quite different from how `PeriodicRDDCheckpointer` is used. Moreover, the `persist` and `unpersist` methods in `PeriodicGraphCheckpointer` are not consistent with each other:
```scala
override protected def persist(data: Graph[VD, ED]): Unit = {
  if (data.vertices.getStorageLevel == StorageLevel.NONE) {
    /* We need to use cache because persist does not honor the default storage level requested
     * when constructing the graph. Only cache does that. */
    data.vertices.cache()
  }
  if (data.edges.getStorageLevel == StorageLevel.NONE) {
    data.edges.cache()
  }
}

override protected def unpersist(data: Graph[VD, ED]): Unit = data.unpersist()
```
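
For comparison, here is a rough sketch of the corresponding pair in `PeriodicRDDCheckpointer` (paraphrased from memory, not quoted from the Spark source), plus a hypothetical symmetric `unpersist` for the graph checkpointer that mirrors its `persist`. This is only an illustration of the asymmetry, not necessarily what this PR implements:
```scala
// Rough sketch of PeriodicRDDCheckpointer's pair (paraphrased, not an exact quote):
// both methods operate on the same RDD object, so persist/unpersist stay symmetric.
override protected def persist(data: RDD[T]): Unit = {
  if (data.getStorageLevel == StorageLevel.NONE) {
    data.persist()
  }
}
override protected def unpersist(data: RDD[T]): Unit = data.unpersist()

// Hypothetical symmetric counterpart for PeriodicGraphCheckpointer (illustration only):
// unpersist the vertices and edges explicitly, mirroring how persist caches them.
override protected def unpersist(data: Graph[VD, ED]): Unit = {
  data.vertices.unpersist()
  data.edges.unpersist()
}
```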
