[GitHub] [spark] zhengruifeng commented on a change in pull request #26982: [SPARK-30329][ML] add iterator/foreach methods for Vectors

GitBox Tue, 24 Dec 2019 18:45:19 -0800

zhengruifeng commented on a change in pull request #26982: [SPARK-30329][ML] add iterator/foreach methods for Vectors
URL: https://github.com/apache/spark/pull/26982#discussion_r361244663


 ##########
 File path: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala
 ##########
 @@ -753,6 +782,39 @@ class SparseVector @Since("2.0.0") (
     }.unzip
     new SparseVector(selectedIndices.length, sliceInds.toArray, sliceVals.toArray)
   }
+
+  @Since("3.0.0")
+  override def iterator: Iterator[(Int, Double)] = {
 
 Review comment:
   OK, I will make them private.
   The other methods would make the existing implementation more concise in many places, for example:
   
   in `NaiveBayes`:
   ```scala
   private[ml] def requireZeroOneBernoulliValues(v: Vector): Unit = {
     val values = v match {
       case sv: SparseVector => sv.values
       case dv: DenseVector => dv.values
     }

     require(values.forall(v => v == 0.0 || v == 1.0),
       s"Bernoulli naive Bayes requires 0 or 1 feature values but found $v.")
   }
   ```
   
   will be changed to
   
   ```scala
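   private[ml] def requireZeroOneBernoulliValues(v: Vector): Unit = {
     // Assuming nonZeroIterator yields (index, value) pairs like iterator does,
     // compare the value component; zeros are already skipped.
     require(v.nonZeroIterator.forall(_._2 == 1.0),
       s"Bernoulli naive Bayes requires 0 or 1 feature values but found $v.")
   }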
   ```
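
As a rough standalone sketch of why the two versions should agree: if `nonZeroIterator` skips every entry equal to 0.0, then "all values are 0 or 1" is the same as "all non-zero values are 1". The `MiniVector` types and the `BernoulliCheck` object below are hypothetical stand-ins for illustration, not the Spark classes, and the assumed semantics of `nonZeroIterator` are noted in the comments.

```scala
// Hypothetical stand-ins for Spark's Vector types, for illustration only.
sealed trait MiniVector {
  def values: Array[Double]
  // Assumed semantics: (index, value) pairs for entries that are not 0.0.
  def nonZeroIterator: Iterator[(Int, Double)]
}

final case class MiniDense(values: Array[Double]) extends MiniVector {
  def nonZeroIterator: Iterator[(Int, Double)] =
    values.iterator.zipWithIndex.collect { case (x, i) if x != 0.0 => (i, x) }
}

final case class MiniSparse(size: Int, indices: Array[Int], values: Array[Double])
    extends MiniVector {
  // A sparse vector may still store explicit zeros, so filter them out too.
  def nonZeroIterator: Iterator[(Int, Double)] =
    indices.iterator.zip(values.iterator).filter(_._2 != 0.0)
}

object BernoulliCheck {
  // Old style: inspect the stored values directly.
  def oldCheck(v: MiniVector): Boolean =
    v.values.forall(x => x == 0.0 || x == 1.0)

  // New style: zeros are skipped, so every remaining value must be 1.0.
  def newCheck(v: MiniVector): Boolean =
    v.nonZeroIterator.forall(_._2 == 1.0)
}
```

Under these assumptions, `oldCheck` and `newCheck` return the same result for any input, including sparse vectors that store explicit zeros.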
   
   
   in `VectorAssembler`:
   ```scala
   vec.foreachActive { case (i, v) =>
     if (v != 0.0) {
       indices += featureIndex + i
       values += v
     }
   }
   ```
   will be changed to
   ```scala
   vec.foreachNonZero { case (i, v) =>
     indices += featureIndex + i
     values += v
   }
   ```
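
For illustration, `foreachNonZero` can be thought of as `foreachActive` with the zero check folded in. The sketch below works on a plain `Array[Double]` rather than a Spark `Vector`; `ForeachSketch`, its helper signatures, and `assemble` are hypothetical names used only to show the shape of the change.

```scala
import scala.collection.mutable.ArrayBuffer

object ForeachSketch {
  // Hypothetical helper: visit every stored entry of a dense array.
  def foreachActive(values: Array[Double])(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(i, values(i))
      i += 1
    }
  }

  // Hypothetical helper: same traversal, but skip entries equal to 0.0.
  def foreachNonZero(values: Array[Double])(f: (Int, Double) => Unit): Unit =
    foreachActive(values) { (i, v) => if (v != 0.0) f(i, v) }

  // Mirrors the assembler loop: collect the shifted indices and the values
  // of all non-zero entries.
  def assemble(vec: Array[Double], featureIndex: Int): (Array[Int], Array[Double]) = {
    val indices = ArrayBuffer.empty[Int]
    val values = ArrayBuffer.empty[Double]
    foreachNonZero(vec) { (i, v) =>
      indices += featureIndex + i
      values += v
    }
    (indices.toArray, values.toArray)
  }
}
```

The caller no longer needs its own `if (v != 0.0)` guard, which is the simplification the `VectorAssembler` change above relies on.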
   
   So I am inclined to include the other methods as well.

