[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-660387140 Thanks all! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-659171262 @viirya This PR is only on cpu.
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-659165588 Friendly ping @huaxingao @srowen @viirya. Unlike the other attempt to save RAM, this should be a clear optimization. I found that those methods cannot be marked `@tailrec`, so I use a while loop instead.
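To illustrate the recursion-versus-loop point, here is a minimal sketch, not the code from this PR. The `Node`, `Leaf`, and `Internal` classes and the `predictRecursive` / `predictLoop` names are hypothetical simplifications, and features are assumed to be a plain `Array[Double]`. In this shape the recursive call is dispatched on a child node rather than being a direct self-call, which is presumably the kind of call `@tailrec` cannot rewrite; the loop form walks the same path without nested calls.

```scala
// Hypothetical, simplified tree types for illustration; not Spark's Node classes.
sealed abstract class Node {
  // Recursive form: each internal node delegates to one of its children.
  def predictRecursive(features: Array[Double]): Double

  // Iterative form: walk down from this node until a leaf is reached.
  final def predictLoop(features: Array[Double]): Double = {
    var node: Node = this
    while (node.isInstanceOf[Internal]) {
      val n = node.asInstanceOf[Internal]
      node = if (features(n.featureIndex) <= n.threshold) n.left else n.right
    }
    node.asInstanceOf[Leaf].prediction
  }
}

final class Leaf(val prediction: Double) extends Node {
  override def predictRecursive(features: Array[Double]): Double = prediction
}

final class Internal(
    val featureIndex: Int,
    val threshold: Double,
    val left: Node,
    val right: Node) extends Node {
  // The call is dispatched on a child of static type Node, so it is not a
  // direct self-call of this method.
  override def predictRecursive(features: Array[Double]): Double =
    if (features(featureIndex) <= threshold) left.predictRecursive(features)
    else right.predictRecursive(features)
}
```

Both methods should return the same leaf value for any input; the loop version only changes how the descent is executed, not which branch is taken.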
[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization
zhengruifeng commented on pull request #29095: URL: https://github.com/apache/spark/pull/29095#issuecomment-657966575 test:
```
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.classification._
import org.apache.spark.storage.StorageLevel

// Load the dataset and map the label column to (label + 1) / 2.
val df = spark.read.option("numFeatures", "2000").format("libsvm").load("/data1/Datasets/epsilon/epsilon_normalized.t").withColumn("label", (col("label")+1)/2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

// Train and save two forests: maxDepth 10 and maxDepth 20.
val rf = new RandomForestClassifier().setMaxDepth(10).setNumTrees(100)
val model = rf.fit(df)
model.save("/tmp/rf-model")

val rf2 = new RandomForestClassifier().setMaxDepth(20).setNumTrees(100)
val model2 = rf2.fit(df)
model2.save("/tmp/rf-model-d20")

// Reload the saved models and collect the feature vectors locally.
val model = RandomForestClassificationModel.load("/tmp/rf-model")
val model2 = RandomForestClassificationModel.load("/tmp/rf-model-d20")
val vecs = df.select("features").rdd.map(row => row.getAs[Vector](0)).collect

// Time 20 passes of single-vector prediction for each model.
val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model.predict)}; val end = System.currentTimeMillis; end - start
val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model2.predict)}; val end = System.currentTimeMillis; end - start
```
Results (durations in ms):
this PR: 167640, 404901
Master: 187645, 416243
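To put the reported durations in relative terms, here is a small sketch of the arithmetic. `SpeedupSketch` and `reductionPercent` are hypothetical names, and it is an assumption (based on the order of the two timing statements) that the first number in each pair belongs to the depth-10 model and the second to the depth-20 model.

```scala
object SpeedupSketch {
  // Relative reduction in wall-clock time, as a percentage of the baseline.
  def reductionPercent(baselineMs: Long, candidateMs: Long): Double =
    100.0 * (baselineMs - candidateMs) / baselineMs
}
```

With the numbers above, `SpeedupSketch.reductionPercent(187645L, 167640L)` comes to roughly 10.7% and `SpeedupSketch.reductionPercent(416243L, 404901L)` to roughly 2.7%.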