GitHub user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23048#discussion_r235031986
  
    --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -370,14 +370,19 @@ object Vectors {
           case (v1: DenseVector, v2: SparseVector) =>
             squaredDistance = sqdist(v2, v1)
     
    -      case (DenseVector(vv1), DenseVector(vv2)) =>
    -        var kv = 0
    +      case (DenseVector(vv1), DenseVector(vv2)) => {
             val sz = vv1.length
    -        while (kv < sz) {
    -          val score = vv1(kv) - vv2(kv)
    -          squaredDistance += score * score
    -          kv += 1
    +        @annotation.tailrec
    --- End diff --
    
    The StackOverflow post points to a reason why the two implementations compared there could perform differently. Here, I don't see that any branching is saved.
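    To make the "no branching is saved" point concrete, here is a minimal sketch of both styles side by side. The diff above is truncated, so the tail-recursive version is a hypothetical reconstruction, not the PR's actual code, and the object and method names are illustrative only. scalac compiles a self-recursive tail call in a `@tailrec` method into a jump back to the start of the method, so both versions should end up with the same loop-exit branch and the same array accesses per element.

```scala
import scala.annotation.tailrec

// Hypothetical names; standalone arrays stand in for DenseVector values.
object SqDistVariants {
  // While-loop style, as in the code the PR replaces.
  def sqdistWhile(vv1: Array[Double], vv2: Array[Double]): Double = {
    require(vv1.length == vv2.length, "vectors must have the same length")
    var squaredDistance = 0.0
    var kv = 0
    val sz = vv1.length
    while (kv < sz) {
      val score = vv1(kv) - vv2(kv)
      squaredDistance += score * score
      kv += 1
    }
    squaredDistance
  }

  // Tail-recursive style (assumed shape of the PR's change).
  // scalac rewrites the self-call into a jump, so this is also a loop.
  def sqdistTailrec(vv1: Array[Double], vv2: Array[Double]): Double = {
    require(vv1.length == vv2.length, "vectors must have the same length")
    val sz = vv1.length
    @tailrec
    def loop(kv: Int, acc: Double): Double =
      if (kv >= sz) acc // same exit test as `while (kv < sz)`
      else {
        val score = vv1(kv) - vv2(kv)
        loop(kv + 1, acc + score * score)
      }
    loop(0, 0.0)
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0, 3.0)
    val b = Array(4.0, 6.0, 3.0)
    println(sqdistWhile(a, b))   // 25.0
    println(sqdistTailrec(a, b)) // 25.0
  }
}
```

    Compiling this and inspecting the class with `javap -c` should show a backward `goto` in both the `while` method and the lifted `loop` method, rather than a recursive method call.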
    
    I can't explain your microbenchmark result, which still surprises me. If I had to guess, it's due to JIT compilation or something like that. I do wonder whether the microbenchmark shows an improvement that won't actually materialize in this code. That is, I think you'd have to show that this change improves the performance of Spark DenseVectors when used as-is in the Vector class here.
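    One way to reduce JIT artifacts in a quick comparison is to warm both variants up before timing, alternate the measured rounds, and keep the results live so the JIT cannot discard the work. The sketch below does this with `System.nanoTime`; all names are hypothetical, the numbers it prints are only indicative, and a harness such as JMH would be more reliable. To answer the question raised here, the timed calls would instead go through Spark's own `Vectors.sqdist` on `Vectors.dense(...)` inputs, built with and without the patch.

```scala
import scala.annotation.tailrec
import scala.util.Random

// Hypothetical, rough timing harness; not a substitute for JMH.
object SqDistBench {
  def sqdistWhile(vv1: Array[Double], vv2: Array[Double]): Double = {
    var sum = 0.0
    var kv = 0
    val sz = vv1.length
    while (kv < sz) {
      val score = vv1(kv) - vv2(kv)
      sum += score * score
      kv += 1
    }
    sum
  }

  def sqdistTailrec(vv1: Array[Double], vv2: Array[Double]): Double = {
    val sz = vv1.length
    @tailrec
    def loop(kv: Int, acc: Double): Double =
      if (kv >= sz) acc
      else {
        val score = vv1(kv) - vv2(kv)
        loop(kv + 1, acc + score * score)
      }
    loop(0, 0.0)
  }

  // Runs f `reps` times; returns (elapsed nanoseconds, checksum).
  // The checksum is printed so the JIT cannot treat the work as dead code.
  def time(reps: Int)(f: => Double): (Long, Double) = {
    var checksum = 0.0
    val start = System.nanoTime()
    var i = 0
    while (i < reps) { checksum += f; i += 1 }
    (System.nanoTime() - start, checksum)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val a = Array.fill(1000)(rng.nextDouble())
    val b = Array.fill(1000)(rng.nextDouble())
    // Warm-up rounds let the JIT compile both methods before measuring.
    for (_ <- 1 to 5) {
      time(10000)(sqdistWhile(a, b))
      time(10000)(sqdistTailrec(a, b))
    }
    // Measured rounds, interleaved so neither variant always benefits from order.
    for (round <- 1 to 3) {
      val (tw, cw) = time(10000)(sqdistWhile(a, b))
      val (tt, ct) = time(10000)(sqdistTailrec(a, b))
      println(f"round $round: while ${tw / 1e6}%.1f ms, tailrec ${tt / 1e6}%.1f ms (checksums $cw%.3f / $ct%.3f)")
    }
  }
}
```

    If the two timings converge after warm-up, that would be consistent with the guess that the original gap came from JIT behavior rather than from the code shape itself.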
    
    I'm pretty skeptical but stranger things have happened.

