srowen commented on a change in pull request #24963: [SPARK-28159][ML] Make the transform natively in ml framework to avoid extra conversion
URL: https://github.com/apache/spark/pull/24963#discussion_r299526026
 
 

 ##########
 File path: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
 ##########
 @@ -642,6 +639,20 @@ private[clustering] object OnlineLDAOptimizer {
     }
 
    val sstatsd = expElogthetad.asDenseMatrix.t * (ctsVector /:/ phiNorm).asDenseMatrix
-    (gammad, sstatsd, ids)
+    (gammad, sstatsd, indices)
+  }
+
+  private[clustering] def variationalTopicInference(
+      termCounts: Vector,
+      expElogbeta: BDM[Double],
+      alpha: breeze.linalg.Vector[Double],
+      gammaShape: Double,
+      k: Int,
+      seed: Long): (BDV[Double], BDM[Double], List[Int]) = {
+    val (ids: List[Int], cts: Array[Double]) = termCounts match {
+      case v: DenseVector => ((0 until v.size).toList, v.values)
 
 Review comment:
   Here and elsewhere, as an optimization, can we avoid `(0 until v.size).toList`? Pass an empty list in this case, say, and have callers deduce that the indices simply run from 0 up to the length of the values?
   
   You're generally solving this with separate sparse/dense methods, which could be fine too, provided it doesn't result in too much code duplication and it does improve performance in the dense case.
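   One way to read the suggestion is the following standalone Scala sketch. It is illustrative only, not the actual MLlib code: `Dense`, `Sparse`, and `idsAndCounts` are hypothetical stand-ins for MLlib's `DenseVector`/`SparseVector` and the pattern match in `variationalTopicInference`, and the empty-list convention for the dense case is exactly the optimization proposed above.

```scala
// Hypothetical sketch of the reviewer's suggestion; names are made up,
// not Spark's real API.

// Minimal stand-ins for MLlib's DenseVector and SparseVector.
sealed trait Vec
final case class Dense(values: Array[Double]) extends Vec
final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

object TermCounts {
  // Returns (ids, cts). By convention, an empty ids list means the dense
  // case: callers treat the indices as 0 until cts.length, so we never
  // materialize (0 until v.size).toList.
  def idsAndCounts(v: Vec): (List[Int], Array[Double]) = v match {
    case Dense(values)              => (Nil, values)            // indices implicit
    case Sparse(_, indices, values) => (indices.toList, values) // indices explicit
  }

  // A caller recovers the effective index of the i-th count without
  // ever allocating a full index list in the dense case.
  def indexAt(ids: List[Int], i: Int): Int =
    if (ids.isEmpty) i else ids(i)
}
```

   The alternative the comment mentions, separate dense and sparse code paths, avoids the sentinel convention entirely at the cost of some duplication; either way the dense case stops paying for an index list it does not need.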

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
