Github user beckgael commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23048#discussion_r234937901
  
    --- Diff: 
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
    @@ -370,14 +370,19 @@ object Vectors {
           case (v1: DenseVector, v2: SparseVector) =>
             squaredDistance = sqdist(v2, v1)
     
    -      case (DenseVector(vv1), DenseVector(vv2)) =>
    -        var kv = 0
    +      case (DenseVector(vv1), DenseVector(vv2)) => {
             val sz = vv1.length
    -        while (kv < sz) {
    -          val score = vv1(kv) - vv2(kv)
    -          squaredDistance += score * score
    -          kv += 1
    +        @annotation.tailrec
    --- End diff --
    
    Hi, I put my experimental process [here](https://github.com/beckgael/functional_sqdist_spark), both as a SparkNotebook and as an sbt project, so that others can run the tests.
    I took the `sqdist` function as it is, made a functional version of it, and compared the two as best I could.
    Running the notebook on 100k pairs (imperative/functional) of `sqdist` calls on medium-size vectors took less than a minute, and the results differ enough to be worth considering, I think, even if it's not the best way to benchmark and I'm sorry for that.
    After browsing StackOverflow I came across [this question](https://stackoverflow.com/questions/9168624/why-is-my-scala-tail-recursion-faster-than-the-while-loop), which discusses a difference between JVM versions in how tail recursion performs.
    
    I hope we'll find something interesting, and that I'm not disturbing your fantastic work!
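    For readers following along, here is a minimal sketch of the two shapes being compared: the existing imperative while loop in `Vectors.sqdist` versus a tail-recursive rewrite. Object and method names here are illustrative, not Spark's, and this is a standalone approximation of the dense-dense branch only.

    ```scala
    object SqDistSketch {
      // Imperative version, mirroring the existing while loop in Vectors.sqdist.
      def sqdistWhile(vv1: Array[Double], vv2: Array[Double]): Double = {
        var squaredDistance = 0.0
        var kv = 0
        val sz = vv1.length
        while (kv < sz) {
          val score = vv1(kv) - vv2(kv)
          squaredDistance += score * score
          kv += 1
        }
        squaredDistance
      }

      // Tail-recursive version: @annotation.tailrec makes the compiler verify
      // the recursive call is in tail position, so it compiles to a loop.
      def sqdistRec(vv1: Array[Double], vv2: Array[Double]): Double = {
        val sz = vv1.length
        @annotation.tailrec
        def loop(kv: Int, acc: Double): Double =
          if (kv >= sz) acc
          else {
            val score = vv1(kv) - vv2(kv)
            loop(kv + 1, acc + score * score)
          }
        loop(0, 0.0)
      }
    }
    ```

    Both versions should produce identical results; any difference would be purely in runtime, which is what the linked benchmark tries to measure.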



---
