zhengruifeng commented on pull request #30034:
URL: https://github.com/apache/spark/pull/30034#issuecomment-712663674


   test code (note: `col` requires `import org.apache.spark.sql.functions.col`, which was missing):
   ```scala
   import org.apache.spark.ml.linalg._
   import org.apache.spark.ml.regression._
   import org.apache.spark.sql.functions.col
   import org.apache.spark.storage.StorageLevel

   val df = spark.read.option("numFeatures", "2000")
     .format("libsvm")
     .load("/data1/Datasets/epsilon/epsilon_normalized.t")
     .withColumn("censor", (col("label") + 1) / 2)
     .withColumn("label", (col("label") + 2) / 2)
   df.persist(StorageLevel.MEMORY_AND_DISK)
   df.count

   val aft = new AFTSurvivalRegression().setMaxIter(2)
   val aftm = aft.fit(df)

   val vectors = df.select("features").rdd.map(_.getAs[Vector](0)).collect

   // warm-up pass over all vectors
   val start = System.currentTimeMillis; vectors.foreach(aftm.predictQuantiles); val end = System.currentTimeMillis; end - start

   // timed pass: 100 iterations over all vectors
   val start = System.currentTimeMillis; (0 until 100).foreach(_ => vectors.foreach(aftm.predictQuantiles)); val end = System.currentTimeMillis; end - start
   ```
   
   result:
   1. existing impl:
      start: Long = 1603178348115
      end: Long = 1603178389375
      res3: Long = 41260

   2. commit https://github.com/apache/spark/pull/30034/commits/720af450275823349ceea3ab903b91e1734fa05c
      `Vectors.dense(_quantiles(0).map(_ * lambda))`:
      start: Long = 1603178027072
      end: Long = 1603178055599
      res3: Long = 28527

   3. commit https://github.com/apache/spark/pull/30034/commits/b0d3da781eb85706e0f29ab4c6ad47e2b6ad468f
      `while (i < quantiles.length) { quantiles(i) *= lambda; i += 1 }`:
      start: Long = 1603179708452
      end: Long = 1603179734777
      res5: Long = 26325
   
   I found that the SIMD-friendly in-place `while` loop gives another stable 6%~8% performance improvement over the allocating `map` version.
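
   To make the comparison concrete, here is a minimal standalone sketch (not the actual Spark code; object and method names are invented for illustration) of the two scaling styles benchmarked above. The `map` version allocates a fresh array on every call, while the `while` version mutates in place, a tight loop the JVM JIT can auto-vectorize:

   ```scala
   // Sketch of the two quantile-scaling strategies from commits 720af45 and b0d3da7.
   object ScaleSketch {
     // allocating style: builds a new array per call
     def scaleMap(quantiles: Array[Double], lambda: Double): Array[Double] =
       quantiles.map(_ * lambda)

     // in-place style: mutates the input array, no allocation
     def scaleInPlace(quantiles: Array[Double], lambda: Double): Array[Double] = {
       var i = 0
       while (i < quantiles.length) { quantiles(i) *= lambda; i += 1 }
       quantiles
     }

     def main(args: Array[String]): Unit = {
       val q = Array(1.0, 2.0, 3.0)
       println(scaleMap(q, 2.0).mkString(","))     // 2.0,4.0,6.0 (q unchanged)
       println(scaleInPlace(q, 2.0).mkString(",")) // 2.0,4.0,6.0 (q mutated)
     }
   }
   ```

   On a hot per-row path like `predictQuantiles`, avoiding the per-call allocation is where most of the win comes from; the branch-free loop body is what lets the JIT emit SIMD instructions.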
   

