Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3997#discussion_r23027089
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
    @@ -449,6 +449,31 @@ class SparseVector(
       override def toString: String =
         "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), 
values.mkString("[", ",", "]"))
     
    +  override def equals(other: Any): Boolean = {
    +    other match {
    +      case v: SparseVector => {
    +        if (this.size != v.size) { return false }
    +        var k1 = 0
    +        var k2 = 0
    +        while (true) {
    +          while (k1 < this.values.size && this.values(k1) == 0) k1 += 1
    +          while (k2 < v.values.size && v.values(k2) == 0) k2 += 1
    +
    +          if (k1 == this.values.size || k2 == v.values.size) {
    +            return (k1 == this.values.size && k2 == v.values.size) // 
check end alignment
    +          }
    +          if (this.indices(k1) != v.indices(k2) || this.values(k1) != 
v.values(k2)) {
    +            return false
    +          }
    +          k1 += 1
    +          k2 += 1
    +        }
    +        throw new Exception("unreachable")
    --- End diff --
    
    I wondered to myself whether this could be simplified to not have `while 
(true)`, the dummy `Exception`, etc. The best I could do was with a helper 
function:
    
    ```
    ...
    var k1 = nextNonzero(this.values, 0)
    var k2 = nextNonzero(v.values, 0)
    
    while (k1 < this.values.size && k2 < v.values.size) {
      if (this.indices(k1) != v.indices(k2) || this.values(k1) != v.values(k2)) 
{
        return false
      }
      k1 = nextNonzero(this.values, k1 + 1)
      k2 = nextNonzero(v.values, k2 + 1)
    }
    
    return (k1 == this.values.size && k2 == v.values.size) 
    ...
    
    def nextNonzero(values: Array[Double], from: Int): Int = {
      var index = from
      while (index < this.values.size && this.values(index) == 0.0) index += 1
      index
    }
    ```
    
    I'm not sure it's better, just food for thought.
    
    So the idea would be to specialize `hashCode` as well, and also handle 
`DenseVector` right? and even remove the implementations in the parent?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to