zhengruifeng commented on pull request #30034:
URL: https://github.com/apache/spark/pull/30034#issuecomment-712663674
test code:
```
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.regression._
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// Load the epsilon dataset and derive censor/label columns for AFT survival regression.
val df = spark.read.option("numFeatures", "2000").format("libsvm")
  .load("/data1/Datasets/epsilon/epsilon_normalized.t")
  .withColumn("censor", (col("label") + 1) / 2)
  .withColumn("label", (col("label") + 2) / 2)
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

// Fit a small AFT model, then benchmark predictQuantiles on the driver.
val aft = new AFTSurvivalRegression().setMaxIter(2)
val aftm = aft.fit(df)
val vectors = df.select("features").rdd.map(_.getAs[Vector](0)).collect

// Warm-up pass.
val start = System.currentTimeMillis; vectors.foreach(aftm.predictQuantiles); val end = System.currentTimeMillis; end - start

// Timed pass: 100 rounds of predictQuantiles over all collected rows.
val start = System.currentTimeMillis; (0 until 100).foreach(_ => vectors.foreach(aftm.predictQuantiles)); val end = System.currentTimeMillis; end - start
```
result:

1. existing impl:
start: Long = 1603178348115
end: Long = 1603178389375
res3: Long = 41260

2. commit https://github.com/apache/spark/pull/30034/commits/720af450275823349ceea3ab903b91e1734fa05c, `Vectors.dense(_quantiles(0).map(_ * lambda))`:
start: Long = 1603178027072
end: Long = 1603178055599
res3: Long = 28527

3. commit https://github.com/apache/spark/pull/30034/commits/b0d3da781eb85706e0f29ab4c6ad47e2b6ad468f, `while (i < quantiles.length) { quantiles(i) *= lambda; i += 1 }`:
start: Long = 1603179708452
end: Long = 1603179734777
res5: Long = 26325
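For context, the two commits above differ only in how the quantile array is scaled by `lambda`. A minimal standalone sketch of the two approaches (method names are hypothetical, not the actual PR code):
```
// Variant 2: map allocates a fresh array on every call.
def scaleByMap(quantiles: Array[Double], lambda: Double): Array[Double] =
  quantiles.map(_ * lambda)

// Variant 3: the in-place while loop reuses the input array and skips
// the per-call allocation, which accounts for the ~2.2s gap measured above.
def scaleInPlace(quantiles: Array[Double], lambda: Double): Array[Double] = {
  var i = 0
  while (i < quantiles.length) {
    quantiles(i) *= lambda
    i += 1
  }
  quantiles
}
```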
I found that using SIMD gives another stable 6%~8% performance improvement.
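The comment does not say which SIMD mechanism was measured. As one hedged sketch, assuming the JDK incubator Vector API (JDK 16+, run with `--add-modules jdk.incubator.vector`), an explicitly vectorized scale could look like this; it is not the code benchmarked above:
```
import jdk.incubator.vector.{DoubleVector, VectorSpecies}

// Hypothetical SIMD variant of the in-place scaling loop.
val species: VectorSpecies[java.lang.Double] = DoubleVector.SPECIES_PREFERRED

def scaleSimd(quantiles: Array[Double], lambda: Double): Array[Double] = {
  var i = 0
  val upper = species.loopBound(quantiles.length)
  // Multiply species.length() doubles per iteration using vector lanes.
  while (i < upper) {
    DoubleVector.fromArray(species, quantiles, i)
      .mul(lambda)
      .intoArray(quantiles, i)
    i += species.length()
  }
  // Scalar tail for the leftover elements.
  while (i < quantiles.length) {
    quantiles(i) *= lambda
    i += 1
  }
  quantiles
}
```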