zhengruifeng commented on pull request #31313:
URL: https://github.com/apache/spark/pull/31313#issuecomment-766530187
test code:
```
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.feature._
import org.apache.spark.storage.StorageLevel
val df = spark.read.format("libsvm").load("/d1/Datasets/epsilon/epsilon_normalized.t")
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count // persist is lazy; this action materializes the cache
val brp = new BucketedRandomProjectionLSH().setNumHashTables(100).setInputCol("features").setOutputCol("projected").setBucketLength(1.0).setSeed(12345)
val brpModel = brp.fit(df)
Seq.range(0, 1000).foreach { i => brpModel.transform(df).count } // warm up
// timed run, kept on one line so the shell evaluates it as a single unit
val start = System.currentTimeMillis; Seq.range(0, 1000).foreach { i => brpModel.transform(df).count }; val end = System.currentTimeMillis; val duration = end - start
```
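The warm-up and timed loops above could also be factored into a small helper. This is only a sketch: `timeMs` is a hypothetical name and is not part of the PR or of Spark, and it uses `System.nanoTime` instead of `System.currentTimeMillis`, which is a substitution on my side.

```scala
// Hypothetical helper (not from the PR): run `body` `warmup` times untimed,
// then `iters` times timed, and return the elapsed wall-clock milliseconds.
def timeMs(warmup: Int, iters: Int)(body: => Unit): Long = {
  var i = 0
  while (i < warmup) { body; i += 1 }
  val start = System.nanoTime()
  i = 0
  while (i < iters) { body; i += 1 }
  (System.nanoTime() - start) / 1000000L
}

// Usage in spark-shell, assuming brpModel and df are defined as in the test code:
// val duration = timeMs(warmup = 1000, iters = 1000) { brpModel.transform(df).count }
```

Because the timed loop lives inside one method call, the measurement does not depend on how the shell splits the input into lines.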
master:
start: Long = 1611547911926
end: Long = 1611547940855
duration: Long = 28929
this PR:
start: Long = 1611548225365
end: Long = 1611548249266
duration: Long = 23901
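For reference, the two durations can be compared as below. The values are copied from the output above (milliseconds for 1000 `transform` + `count` iterations); the variable names are mine, and since each side is a single run this says nothing about variance.

```scala
// Durations reported above, in milliseconds for 1000 iterations each.
val masterMs = 28929L
val prMs = 23901L
val iters = 1000

val savedMs = masterMs - prMs                    // 5028 ms saved in total
val reductionPct = 100.0 * savedMs / masterMs    // about 17.4 percent less time
val speedup = masterMs.toDouble / prMs           // about 1.21x
val perIterMasterMs = masterMs.toDouble / iters  // about 28.9 ms per iteration
val perIterPrMs = prMs.toDouble / iters          // about 23.9 ms per iteration
```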