[ https://issues.apache.org/jira/browse/SPARK-30329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-30329.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26982
[https://github.com/apache/spark/pull/26982]

> add iterator/foreach methods for Vectors
> ----------------------------------------
>
>                 Key: SPARK-30329
>                 URL: https://issues.apache.org/jira/browse/SPARK-30329
>             Project: Spark
>          Issue Type: Wish
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Major
>             Fix For: 3.0.0
>
>
> 1. foreach: there are many places where we need to traverse all the
> elements of a vector. Currently this is implemented like this:
> {code:java}
> var i = 0
> while (i < vec.size) {
>   val v = vec(i)
>   // process v
>   i += 1
> }
> {code}
> The new method is for both convenience and performance:
> for a SparseVector, the loop above costs O(size * log(nnz)) in total, since
> each apply call is O(log(nnz)) due to its binary search over the stored
> indices. Advancing a cursor over the sorted indices instead reduces the
> full traversal to O(size).
>
> 2. foreachNonZero: foreachActive is mostly used together with a
> value != 0 filter, like
> {code:java}
> vec.foreachActive { case (i, v) =>
>   if (v != 0.0) {
>     // process the non-zero value v at index i
>   }
> }
> {code}
> Adding this method is a convenience for that pattern.
>
> 3. iterator/activeIterator/nonZeroIterator: add these three iterators so
> that we can later add or change implementations based on them on both
> the ml and mllib sides, avoiding vector conversions.
> For example, I want to optimize PCA by using ml.stat.Summarizer instead of
> Statistics.colStats/mllib.MultivariateStatisticalSummary, to avoid
> computing unused statistics.
> With these iterators, I can do that without vector conversions.
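The three proposals above can be sketched in Scala as follows. This is a minimal illustration, not Spark's actual implementation: `SimpleSparseVector` is a hypothetical class (not part of Spark) that assumes `indices` is strictly increasing, lies in `[0, size)` and is aligned with `values`. The method names mirror those in the issue, but their signatures here are assumptions. `foreach` keeps a cursor `k` into the sorted indices instead of calling `apply` (one binary search) per position, so a full pass is O(size + nnz) rather than O(size * log(nnz)).

```scala
// Hypothetical minimal sparse vector, NOT Spark's SparseVector.
// Assumes `indices` is strictly increasing, within [0, size), and aligned with `values`.
final class SimpleSparseVector(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")

  // Random access: binary search over the sorted indices, O(log(nnz)) per call.
  def apply(i: Int): Double = {
    val k = java.util.Arrays.binarySearch(indices, i)
    if (k >= 0) values(k) else 0.0
  }

  // Visits every position 0 until size, including implicit zeros.
  // The cursor `k` only moves forward, so the whole pass is O(size + nnz).
  def foreach(f: (Int, Double) => Unit): Unit = {
    var k = 0
    var i = 0
    while (i < size) {
      if (k < indices.length && indices(k) == i) {
        f(i, values(k))
        k += 1
      } else {
        f(i, 0.0)
      }
      i += 1
    }
  }

  // Visits only stored entries (these may include explicitly stored zeros).
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }

  // Visits only entries whose value is non-zero: foreachActive plus the v != 0 filter.
  def foreachNonZero(f: (Int, Double) => Unit): Unit =
    foreachActive { (i, v) => if (v != 0.0) f(i, v) }

  // Lazy (index, value) iterator over all positions, using the same cursor idea.
  def iterator: Iterator[(Int, Double)] = new Iterator[(Int, Double)] {
    private var i = 0
    private var k = 0
    def hasNext: Boolean = i < size
    def next(): (Int, Double) = {
      if (!hasNext) throw new NoSuchElementException("next on exhausted iterator")
      val v =
        if (k < indices.length && indices(k) == i) { val x = values(k); k += 1; x }
        else 0.0
      val res = (i, v)
      i += 1
      res
    }
  }

  // Lazy iterator over stored entries only.
  def activeIterator: Iterator[(Int, Double)] = indices.iterator.zip(values.iterator)

  // Lazy iterator over non-zero entries only.
  def nonZeroIterator: Iterator[(Int, Double)] = activeIterator.filter(_._2 != 0.0)
}
```

For example, with `new SimpleSparseVector(5, Array(1, 3, 4), Array(2.0, 0.0, 7.0))`, `iterator` yields all five positions, `activeIterator` yields the three stored entries, and `nonZeroIterator` drops the explicitly stored zero at index 3. Building `foreachNonZero` and `nonZeroIterator` on top of the active traversal keeps the `v != 0.0` filter in one place instead of repeating it at each call site.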