Github user MechCoder commented on a diff in the pull request:
https://github.com/apache/spark/pull/6948#discussion_r34386073
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
@@ -184,6 +199,82 @@ class LocalLDAModel private[clustering] (
}
+@Experimental
+object LocalLDAModel extends Loader[LocalLDAModel] {
+
+ private object SaveLoadV1_0 {
+
+ val formatVersionV1_0 = "1.0"
+
+ val classNameV1_0 = "org.apache.spark.mllib.clustering.LocalLDAModel"
+
+ // Store the distribution of terms of each topic as a Row in data.
+ case class Data(termDistributions: Vector)
+
+ def save(sc: SparkContext, path: String, topicsMatrix: Matrix): Unit = {
+
+ val sqlContext = new SQLContext(sc)
+ import sqlContext.implicits._
+
+ val k = topicsMatrix.numCols
+ val metadata = compact(render
+ (("class" -> classNameV1_0) ~ ("version" -> formatVersionV1_0) ~
+ ("k" -> k) ~ ("vocabSize" -> topicsMatrix.numRows)))
+ sc.parallelize(Seq(metadata), 1).saveAsTextFile(Loader.metadataPath(path))
+
+ val topicsDenseMatrix = topicsMatrix.toBreeze.toDenseMatrix
--- End diff --
Are you sure this conversion should be done? If so, the slicing below (and
elsewhere) would become expensive. I suggest we keep it as it is for now, and
rewrite this as well when we make topicsMatrix sparse.
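To illustrate the trade-off being discussed, here is a rough sketch (hypothetical standalone Breeze code, not the actual MLlib internals; it assumes Breeze is on the classpath and ignores the private `toBreeze` helper):

```scala
import breeze.linalg.{DenseMatrix => BDM, CSCMatrix => BSM}

// Sketch of the cost concern: materializing a (potentially sparse)
// topics matrix as a Breeze DenseMatrix allocates O(vocabSize * k)
// memory up front. Once that dense copy exists, per-row slicing is
// cheap; but if topicsMatrix is later made sparse, forcing it dense
// here defeats the purpose, and the slicing code below would have to
// be rewritten for the sparse representation as well.
object DenseConversionSketch {

  // Cheap after the dense copy exists: each row slice is O(k).
  def termRows(topicsDense: BDM[Double]): Seq[Array[Double]] =
    (0 until topicsDense.rows).map(i => topicsDense(i, ::).t.toArray)

  // The expensive step under discussion: a sparse-to-dense conversion
  // that allocates the full vocabSize x k array.
  def toDense(topicsSparse: BSM[Double]): BDM[Double] =
    topicsSparse.toDense
}
```

This is only a sketch of the concern, not a proposed change to the PR.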
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]