GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/23048#discussion_r235031986
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -370,14 +370,19 @@ object Vectors {
case (v1: DenseVector, v2: SparseVector) =>
squaredDistance = sqdist(v2, v1)
- case (DenseVector(vv1), DenseVector(vv2)) =>
- var kv = 0
+ case (DenseVector(vv1), DenseVector(vv2)) => {
val sz = vv1.length
- while (kv < sz) {
- val score = vv1(kv) - vv2(kv)
- squaredDistance += score * score
- kv += 1
+ @annotation.tailrec
--- End diff --
The StackOverflow post points to a reason why the two implementations
compared there could perform differently. Here, I don't see that any
branching is saved: either form still has to test the index against the
length on every iteration.
I can't explain your microbenchmark result, which still surprises me. If I
had to guess, it's an artifact of JIT compilation or something similar. I do
wonder whether the microbenchmark shows an improvement that won't actually
materialize in this code. That is, I think you'd have to show that this
improves the performance of Spark DenseVectors when used as-is in the
Vectors object here.
I'm pretty skeptical but stranger things have happened.
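For reference, here is a minimal sketch of the two loop shapes under discussion, reconstructed from the visible part of the diff only. The tail-recursive body is cut off in the excerpt above, so the helper `go`, the object name `SqdistSketch`, and both method names are assumptions for illustration, not the PR's actual code. Each version does one index comparison per element, which is why I would not expect a difference in branching.

```scala
import scala.annotation.tailrec

// Hypothetical stand-alone sketch; not the code from Vectors.scala.
object SqdistSketch {

  // Shape of the existing code: an explicit while loop over the indices.
  def sqdistWhile(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "arrays must have the same length")
    val sz = a.length
    var sum = 0.0
    var kv = 0
    while (kv < sz) {          // one comparison per iteration
      val score = a(kv) - b(kv)
      sum += score * score
      kv += 1
    }
    sum
  }

  // Assumed shape of the proposed change: a local tail-recursive helper.
  // scalac rewrites a @tailrec method into a loop, so the termination
  // test below still runs once per iteration.
  def sqdistTailrec(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "arrays must have the same length")
    val sz = a.length
    @tailrec
    def go(kv: Int, acc: Double): Double =
      if (kv >= sz) acc        // one comparison per iteration
      else {
        val score = a(kv) - b(kv)
        go(kv + 1, acc + score * score)
      }
    go(0, 0.0)
  }
}
```

Both methods accumulate in the same order, so they should return identical results for the same inputs. Any timing difference between them would have to come from how the JIT treats the two bytecode shapes, which is what a benchmark against the real `Vectors` code path would need to confirm.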
---