zhengruifeng commented on issue #27261: [SPARK-30503][ML] OnlineLDAOptimizer does not handle persistance correctly
URL: https://github.com/apache/spark/pull/27261#issuecomment-575565245

Test code:

```scala
import org.apache.spark.ml.clustering.LDA

val dataset = spark.read.format("libsvm").load("data/mllib/sample_lda_libsvm_data.txt")
val lda = new LDA().setK(10).setMaxIter(100).setOptimizer("em")

// check which RDDs are persistent before and after fitting
sc.getPersistentRDDs

val start = System.currentTimeMillis; val model = lda.fit(dataset); val end = System.currentTimeMillis; end - start

sc.getPersistentRDDs
sc.getPersistentRDDs.size
sc.getPersistentRDDs.foreach(println)
```

With this PR:

```scala
start: Long = 1579250257523
model: org.apache.spark.ml.clustering.LDAModel = DistributedLDAModel: uid=lda_2a48ae87b788, k=10, numFeatures=11
end: Long = 1579250268529
res1: Long = 11006

scala> sc.getPersistentRDDs.foreach(println)
(2441,EdgeRDD MapPartitionsRDD[2441] at mapPartitions at EdgeRDDImpl.scala:119)
(2438,VertexRDD, VertexRDD ZippedPartitionsRDD2[2438] at zipPartitions at VertexRDD.scala:322)
(29,VertexRDD, VertexRDD ZippedPartitionsRDD2[29] at zipPartitions at VertexRDD.scala:322)
(32,EdgeRDD MapPartitionsRDD[32] at mapPartitions at EdgeRDDImpl.scala:119)
```

On master:

```scala
scala> val start = System.currentTimeMillis; val model = lda.fit(dataset); val end = System.currentTimeMillis; end - start
start: Long = 1579255989886
model: org.apache.spark.ml.clustering.LDAModel = DistributedLDAModel: uid=lda_f600c29d8e0a, k=10, numFeatures=11
end: Long = 1579256001181
res1: Long = 11295

scala> sc.getPersistentRDDs.size
res2: Int = 106
```

With this PR, only four RDDs (two VertexRDD/EdgeRDD pairs, presumably the graph backing the returned DistributedLDAModel) remain persistent after fitting, versus 106 on master. The elapsed times (11006 ms vs 11295 ms) are comparable, so there seems to be no performance regression.
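For illustration, here is a minimal sketch of how the same before/after check could be wrapped into a reusable helper; the name `withLeakCheck` is hypothetical and not part of Spark or this PR:

```scala
import org.apache.spark.SparkContext

// Hypothetical helper, not part of Spark or this PR: runs `block` and prints
// any RDDs that became persistent during it and were not unpersisted after.
def withLeakCheck[T](sc: SparkContext)(block: => T): T = {
  val before = sc.getPersistentRDDs.keySet   // RDD ids cached before the block
  val result = block
  sc.getPersistentRDDs.foreach { case (id, rdd) =>
    if (!before.contains(id)) println(s"still persistent after block: ($id, $rdd)")
  }
  result
}

// Usage, e.g. in spark-shell:
// val model = withLeakCheck(sc) { lda.fit(dataset) }
```

Note that a few remaining RDDs are expected here, since a DistributedLDAModel keeps its graph cached on purpose; the helper only makes such leftovers visible, it does not decide which of them are leaks.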
