GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/21183#discussion_r184753197
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -473,7 +475,8 @@ final class OnlineLDAOptimizer extends LDAOptimizer with Logging {
None
}
- val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitions { docs =>
+ val stats: RDD[(BDM[Double], Option[BDV[Double]], Long)] = batch.mapPartitionsWithIndexInternal
--- End diff --
Let's not use mapPartitionsWithIndexInternal; I don't think closure
cleaning is expensive enough for us to worry about here. Use
mapPartitionsWithIndex instead.
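For illustration, here is a minimal sketch of the suggested pattern on a toy RDD, not the PR's actual LDA code. It assumes a local `SparkContext`, and the helper `nonEmptyPerPartition`, the object name, and the sample data are all hypothetical. `mapPartitionsWithIndex` is the public `RDD` method: it hands the closure the partition index and runs the closure through `SparkContext.clean` before shipping it, whereas the `private[spark]` `mapPartitionsWithIndexInternal` skips that cleaning step.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionIndexSketch {
  // Hypothetical helper: count the non-empty documents in each partition,
  // tagged with the partition index that mapPartitionsWithIndex supplies.
  def nonEmptyPerPartition(batch: RDD[String]): RDD[(Int, Long)] =
    batch.mapPartitionsWithIndex { (index, docs) =>
      // Public RDD API: Spark cleans this closure (sc.clean) before shipping it,
      // unlike the private[spark] mapPartitionsWithIndexInternal.
      val nonEmpty = docs.count(_.nonEmpty).toLong
      Iterator.single((index, nonEmpty))
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("partition-index-sketch"))
    try {
      // Two slices: ("a b", "") in partition 0 and ("c", "d e f") in partition 1
      val batch = sc.parallelize(Seq("a b", "", "c", "d e f"), numSlices = 2)
      nonEmptyPerPartition(batch).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Run locally, this prints one `(partitionIndex, nonEmptyCount)` pair per partition. The extra cost of the public method is a one-time closure-cleaning pass on the driver per call, which is the overhead the review comment judges too small to matter here.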