Github user beckgael commented on a diff in the pull request:
https://github.com/apache/spark/pull/23048#discussion_r234937901
--- Diff:
mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -370,14 +370,19 @@ object Vectors {
case (v1: DenseVector, v2: SparseVector) =>
squaredDistance = sqdist(v2, v1)
- case (DenseVector(vv1), DenseVector(vv2)) =>
- var kv = 0
+ case (DenseVector(vv1), DenseVector(vv2)) => {
val sz = vv1.length
- while (kv < sz) {
- val score = vv1(kv) - vv2(kv)
- squaredDistance += score * score
- kv += 1
+ @annotation.tailrec
--- End diff --
Hi, I put my experimental process [here](https://github.com/beckgael/functional_sqdist_spark), both as a Spark Notebook and as an sbt project, so that others can test it.
I took the `sqdist` function as is, made a functional version of it, and
compared the two as best I could.
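For concreteness, here is a minimal standalone sketch of the kind of rewrite being compared, for the dense/dense case only. The object and method names are hypothetical, not the actual `Vectors.scala` patch:

```scala
import scala.annotation.tailrec

object SqDistSketch {
  // Imperative version: a while loop accumulating the squared distance,
  // in the style of the existing dense/dense branch of Vectors.sqdist.
  def sqdistWhile(v1: Array[Double], v2: Array[Double]): Double = {
    var squaredDistance = 0.0
    var kv = 0
    val sz = v1.length
    while (kv < sz) {
      val score = v1(kv) - v2(kv)
      squaredDistance += score * score
      kv += 1
    }
    squaredDistance
  }

  // Functional version: a tail-recursive inner loop. With @tailrec the
  // Scala compiler rewrites the recursion into a plain loop, so it
  // allocates nothing per iteration and should perform comparably.
  def sqdistRec(v1: Array[Double], v2: Array[Double]): Double = {
    val sz = v1.length
    @tailrec
    def loop(kv: Int, acc: Double): Double =
      if (kv >= sz) acc
      else {
        val score = v1(kv) - v2(kv)
        loop(kv + 1, acc + score * score)
      }
    loop(0, 0.0)
  }
}
```

Both versions compute the same sum of squared differences; the question benchmarked here is only whether the compiled tail-recursive form runs as fast as the hand-written while loop.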
Running the notebook on 100k pairs of `sqdist` calls (imperative/functional)
with medium-sized vectors took less than a minute, and the results differ
enough to be worth considering, I presume, even if this is not the best way
to test it; sorry about that.
While browsing Stack Overflow I came across [this
question](https://stackoverflow.com/questions/9168624/why-is-my-scala-tail-recursion-faster-than-the-while-loop),
which discusses how tail-recursion performance differs across JVM versions.
I hope we'll find something interesting, and that I'm not disturbing you in
your fantastic work!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]