zhengruifeng commented on issue #26982: [SPARK-30329][ML] add iterator/foreach methods for Vectors
URL: https://github.com/apache/spark/pull/26982#issuecomment-568413966

Performance test for `SparseVector.iterator`:

```scala
import org.apache.spark.ml.linalg._
import scala.util.Random

val rand = new Random(123)
val size = 100000000
val nnz = 1000000
// Draw 2 * nnz candidate indices, deduplicate, keep nnz of them, and sort.
val indices = Array.fill(nnz * 2)(rand.nextInt(size).abs).distinct.take(nnz).sorted
val values = Array.fill(nnz)(rand.nextDouble)
val sv = Vectors.sparse(size, indices, values)

// Baseline: look up every index through `apply`, 10 passes.
scala> val start = System.currentTimeMillis; Seq.range(0, 10).foreach{_ => Iterator.range(0, size).map{sv.apply}.sum }; val end = System.currentTimeMillis; end - start
start: Long = 1577092459234
end: Long = 1577092537941
res5: Long = 78707

// New: walk the vector through `iterator`, 10 passes.
scala> val start = System.currentTimeMillis; Seq.range(0, 10).foreach{_ => sv.iterator.map(_._2).sum }; val end = System.currentTimeMillis; end - start
start: Long = 1577092546174
end: Long = 1577092579521
res6: Long = 33347
```

In this case, the implementation based on internal cursors is about 2.4x faster than calling the `apply` method directly (33,347 ms vs. 78,707 ms over 10 passes).
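
To illustrate where a gap like this can come from, here is a minimal, Spark-free sketch. `SparseSketch` is a hypothetical stand-in, not the code in this PR. It assumes that `apply` locates each index with a binary search over the sorted `indices`, and that `iterator` yields every `(index, value)` pair, implicit zeros included, as the benchmark's comparison suggests.

```scala
// Hypothetical stand-in for a sparse vector; NOT Spark's actual implementation.
final class SparseSketch(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have equal length")

  // Random access: one binary search over the sorted indices per call.
  def apply(i: Int): Double = {
    val j = java.util.Arrays.binarySearch(indices, i)
    if (j >= 0) values(j) else 0.0
  }

  // Sequential access: a single cursor `k` advances through the stored entries,
  // so each element costs O(1) instead of O(log nnz).
  def iterator: Iterator[(Int, Double)] = new Iterator[(Int, Double)] {
    private var i = 0 // current logical index
    private var k = 0 // cursor into indices/values

    override def hasNext: Boolean = i < size

    override def next(): (Int, Double) = {
      val v =
        if (k < indices.length && indices(k) == i) {
          val x = values(k)
          k += 1
          x
        } else 0.0
      val out = (i, v)
      i += 1
      out
    }
  }
}

// Both access paths visit the same values; only the per-element cost differs.
val sketch = new SparseSketch(6, Array(1, 4), Array(2.0, 3.0))
val viaApply = Iterator.range(0, sketch.size).map(sketch.apply).sum
val viaIterator = sketch.iterator.map(_._2).sum
```

Under these assumptions, a full pass through `apply` performs `size` binary searches, while the cursor version does a single linear walk, which is consistent with the direction of the measured speedup.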
