zhengruifeng commented on a change in pull request #26982: [SPARK-30329][ML]
add iterator/foreach methods for Vectors
URL: https://github.com/apache/spark/pull/26982#discussion_r361244663
##########
File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
##########
@@ -753,6 +782,39 @@ class SparseVector @Since("2.0.0") (
}.unzip
new SparseVector(selectedIndices.length, sliceInds.toArray,
sliceVals.toArray)
}
+
+ @Since("3.0.0")
+ override def iterator: Iterator[(Int, Double)] = {
Review comment:
OK, I will make them private.
The other methods will make the existing implementation more concise in many
places. For example, in `NaiveBayes`:
```scala
private[ml] def requireZeroOneBernoulliValues(v: Vector): Unit = {
  val values = v match {
    case sv: SparseVector => sv.values
    case dv: DenseVector => dv.values
  }
  require(values.forall(v => v == 0.0 || v == 1.0),
    s"Bernoulli naive Bayes requires 0 or 1 feature values but found $v.")
}
```
will be changed to
```scala
private[ml] def requireZeroOneBernoulliValues(v: Vector): Unit = {
  // nonZeroIterator yields (index, value) pairs, so test the value component.
  require(v.nonZeroIterator.forall(_._2 == 1.0),
    s"Bernoulli naive Bayes requires 0 or 1 feature values but found $v.")
}
```
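To show why the two checks agree, here is a minimal standalone sketch that does not depend on Spark. `SimpleSparse`, `isZeroOneOld`, and `isZeroOneNew` are hypothetical names made up for illustration, and the `nonZeroIterator` here is only an assumed approximation of the method proposed in this PR (active entries with explicitly stored zeros skipped), not its actual implementation.

```scala
// Hypothetical stand-in for a sparse vector: parallel arrays of active
// indices and values. Not Spark's SparseVector.
final case class SimpleSparse(size: Int, indices: Array[Int], values: Array[Double]) {
  // Assumed semantics: iterate over (index, value) pairs of active entries,
  // skipping any explicitly stored zeros.
  def nonZeroIterator: Iterator[(Int, Double)] =
    indices.iterator.zip(values.iterator).filter(_._2 != 0.0)
}

object BernoulliCheckSketch {
  // Old style: every stored value must be 0 or 1.
  def isZeroOneOld(v: SimpleSparse): Boolean =
    v.values.forall(x => x == 0.0 || x == 1.0)

  // New style: every non-zero value must be exactly 1.
  def isZeroOneNew(v: SimpleSparse): Boolean =
    v.nonZeroIterator.forall(_._2 == 1.0)
}
```

Both predicates accept a vector whose stored values are only 0.0 and 1.0, and both reject one that stores any other value, since zeros are filtered out before the `== 1.0` test.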
in `VectorAssembler`:
```scala
vec.foreachActive { case (i, v) =>
  if (v != 0.0) {
    indices += featureIndex + i
    values += v
  }
}
```
will be changed to
```scala
vec.foreachNonZero { case (i, v) =>
  indices += featureIndex + i
  values += v
}
```
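As a rough illustration of the relationship between the two loops, `foreachNonZero` can be thought of as `foreachActive` with the zero check folded in. The sketch below is standalone and assumes that behaviour; `ActiveEntries` and `AssembleSketch` are hypothetical names, not Spark classes, and the actual method bodies in the PR may differ.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a vector that exposes its active entries.
final case class ActiveEntries(indices: Array[Int], values: Array[Double]) {
  // Visit every stored (index, value) pair, including explicit zeros.
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }

  // Assumed semantics: same traversal, but entries equal to 0.0 are skipped.
  def foreachNonZero(f: (Int, Double) => Unit): Unit =
    foreachActive { (i, v) => if (v != 0.0) f(i, v) }
}

object AssembleSketch {
  // Collect non-zero entries, shifting each index by featureIndex.
  def assemble(vec: ActiveEntries, featureIndex: Int): (Array[Int], Array[Double]) = {
    val indices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    vec.foreachNonZero { (i, v) =>
      indices += featureIndex + i
      values += v
    }
    (indices.toArray, values.toArray)
  }
}
```

With this shape the caller no longer repeats the `v != 0.0` guard at each call site.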
So I am inclined to include the other methods as well.