[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-17 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-660387140


   Thanks all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659171262


   @viirya This PR is CPU-only.






[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659165588


   friendly ping @huaxingao @srowen @viirya 
   
   Unlike the other attempt to save RAM, this should be a clear optimization.
I found that those methods cannot be marked `@tailrec`, so I use a while loop
instead.
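
To illustrate the recursion-to-loop rewrite described above, here is a minimal sketch. It is not the PR's code: `Node`, `Leaf`, `Internal`, and `TreePredict` are hypothetical, simplified types invented for this example. It assumes the recursive call goes through a method on a child node (virtual dispatch) rather than being a direct self-call, which is the kind of call `@tailrec` does not accept; whether that is the exact reason in the PR is an assumption.

```scala
// Hypothetical, simplified node types for illustration; not Spark's actual classes.
sealed trait Node {
  // Recursive form: each internal node delegates to a child via virtual dispatch.
  def predictRecursive(features: Array[Double]): Double
}

final class Leaf(val prediction: Double) extends Node {
  def predictRecursive(features: Array[Double]): Double = prediction
}

final class Internal(
    val featureIndex: Int,
    val threshold: Double,
    val left: Node,
    val right: Node) extends Node {
  def predictRecursive(features: Array[Double]): Double =
    if (features(featureIndex) <= threshold) left.predictRecursive(features)
    else right.predictRecursive(features)
}

object TreePredict {
  // Iterative form: walk from the root down to a leaf in a while loop,
  // so no stack frame is added per tree level.
  def predictIterative(root: Node, features: Array[Double]): Double = {
    var node = root
    while (node.isInstanceOf[Internal]) {
      val n = node.asInstanceOf[Internal]
      node = if (features(n.featureIndex) <= n.threshold) n.left else n.right
    }
    node.asInstanceOf[Leaf].prediction
  }
}
```

Both forms visit the same nodes and return the same leaf value; the loop only changes how the descent is driven.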






[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-13 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-657966575


   test:
   ```
   // Run in spark-shell, where `spark` is already defined.
   import org.apache.spark.ml.linalg._
   import org.apache.spark.ml.classification._
   import org.apache.spark.sql.functions.col
   import org.apache.spark.storage.StorageLevel


   // Load the dataset and rescale the label column via (label + 1) / 2.
   val df = spark.read.option("numFeatures", "2000").format("libsvm").load("/data1/Datasets/epsilon/epsilon_normalized.t").withColumn("label", (col("label") + 1) / 2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count


   // Train and save a depth-10 forest.
   val rf = new RandomForestClassifier().setMaxDepth(10).setNumTrees(100)
   val model = rf.fit(df)
   model.save("/tmp/rf-model")


   // Train and save a depth-20 forest.
   val rf2 = new RandomForestClassifier().setMaxDepth(20).setNumTrees(100)
   val model2 = rf2.fit(df)
   model2.save("/tmp/rf-model-d20")


   // Reload both models (the REPL allows redefining `model` and `model2`).
   val model = RandomForestClassificationModel.load("/tmp/rf-model")
   val model2 = RandomForestClassificationModel.load("/tmp/rf-model-d20")

   // Collect all feature vectors to the driver so only predict() is timed.
   val vecs = df.select("features").rdd.map(row => row.getAs[Vector](0)).collect

   // Time 20 passes over all vectors with the depth-10 model (milliseconds).
   val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model.predict)}; val end = System.currentTimeMillis; end - start


   // Time 20 passes over all vectors with the depth-20 model (milliseconds).
   val start = System.currentTimeMillis; Seq.range(0, 20).foreach{_ => vecs.foreach(model2.predict)}; val end = System.currentTimeMillis; end - start
   ```
   
   
   Results (durations in ms; first value is the depth-10 model, second is the depth-20 model):
   this PR: 167640, 404901
   Master: 187645, 416243
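
As a rough reading of these numbers, the relative reduction in duration can be computed as (master - PR) / master. The sketch below uses only the four durations reported above; `Speedup` and `reductionPercent` are hypothetical names for this example, and pairing the first value with the depth-10 model and the second with the depth-20 model is an assumption based on the order of the timing statements. It works out to roughly 10.7% and 2.7%.

```scala
object Speedup {
  // Relative reduction in duration, as a percentage of the baseline.
  def reductionPercent(baseline: Long, candidate: Long): Double =
    (baseline - candidate).toDouble / baseline * 100.0

  def main(args: Array[String]): Unit = {
    // Durations copied from the results above (milliseconds, from System.currentTimeMillis).
    val master = Seq(187645L, 416243L)
    val pr = Seq(167640L, 404901L)
    master.zip(pr).foreach { case (m, p) =>
      println(f"master=$m%d pr=$p%d reduction=${reductionPercent(m, p)}%.2f%%")
    }
  }
}
```

These are single measurements from one run each, so the percentages are indicative only.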


