viirya commented on issue #25576: [SPARK-28866][ML] Persist item factors RDD when checkpointing in ALS
URL: https://github.com/apache/spark/pull/25576#issuecomment-526378257

In the implicit case, we don't call `.count()` after `.checkpoint()`, because the later `computeFactors` call materializes the checkpointed RDD. That is why there is a comment saying `itemFactors gets materialized in computeFactors`:

```
if (shouldCheckpoint(iter)) {
  itemFactors.checkpoint() // itemFactors gets materialized in computeFactors
}
```

In the non-implicit case, `computeFactors` doesn't materialize it, so `.count()` is needed.

In the non-implicit case, we also don't need to persist user factors, because the factor RDDs are referenced only once in each iteration and nothing is materialized (except for the checkpoint + `.count()` on item factors).
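
To illustrate the checkpoint + `.count()` pattern described above, here is a minimal standalone sketch. It is not the ALS code: the `CheckpointSketch` object, the toy `itemFactors` RDD, and the temporary checkpoint directory are assumptions made up for this example. It relies only on the public RDD calls `persist`, `checkpoint`, `count`, and `isCheckpointed`, and assumes Spark is on the classpath and running in local mode.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object, not part of Spark or of this PR.
object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("checkpoint-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumption: a throwaway temp directory serves as the checkpoint dir.
    sc.setCheckpointDir(Files.createTempDirectory("als-ckpt").toString)

    // Toy stand-in for itemFactors: (id, factor vector) pairs.
    val itemFactors = sc.parallelize(0 until 100)
      .map(id => (id, Array.fill(4)(id.toFloat)))

    // Persist first, so that writing the checkpoint can read the cached
    // partitions instead of recomputing the lineage.
    itemFactors.persist(StorageLevel.MEMORY_AND_DISK)

    // checkpoint() only marks the RDD; no data has been written yet.
    itemFactors.checkpoint()
    assert(!itemFactors.isCheckpointed)

    // An action materializes the RDD and triggers the checkpoint write.
    itemFactors.count()
    assert(itemFactors.isCheckpointed)

    itemFactors.unpersist()
    spark.stop()
  }
}
```

If a later step already runs an action on the RDD (as the implicit path is described as doing in `computeFactors`), the explicit `count()` would be redundant; without such an action, the checkpoint stays pending.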
