Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3997#discussion_r22799128
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -449,6 +449,16 @@ class SparseVector(
       override def toString: String =
         "(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"), values.mkString("[", ",", "]"))
     
    --- End diff --
    
    Thanks @srowen for the comment. Glad to discuss it with someone.
    
    _Vector: override def hashCode(): Int = util.Arrays.hashCode(this.toArray)_
     
    I understand the general guideline is to override `hashCode` whenever
    `equals` is overridden.
    Yet, intentionally or not, the original code promises that a `DenseVector`
    and a `SparseVector` with the same array content give the same results for
    `equals` and `hashCode`. And that makes some sense.
    
    As noted in the PR description, I don't want to introduce breaking
    changes. If we want to keep the original design, the current implementation
    of `hashCode` in `Vector` is one of the best choices. That's why `hashCode`
    was intentionally left out of the PR. (Maybe I should add a comment
    explaining this.)
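
    To illustrate the contract described above, here is a minimal sketch. It
    assumes simplified stand-in classes (`SimpleVector`, `SimpleDense`,
    `SimpleSparse` are hypothetical names, not the actual code in
    `Vectors.scala`). It only shows how a content-based `equals`, paired with a
    `hashCode` computed from `toArray`, keeps a dense and a sparse vector
    consistent:

    ```scala
    import java.util.Arrays

    // Hypothetical stand-ins for illustration only; not Spark's actual classes.
    trait SimpleVector {
      def size: Int
      def toArray: Array[Double]

      // Equality is defined on the dense array content, so both
      // representations of the same data compare equal.
      override def equals(other: Any): Boolean = other match {
        case v: SimpleVector => Arrays.equals(this.toArray, v.toArray)
        case _ => false
      }

      // Hashing the same dense content keeps hashCode consistent with equals.
      override def hashCode(): Int = Arrays.hashCode(this.toArray)
    }

    class SimpleDense(val values: Array[Double]) extends SimpleVector {
      def size: Int = values.length
      def toArray: Array[Double] = values
    }

    class SimpleSparse(val size: Int, val indices: Array[Int], val values: Array[Double])
        extends SimpleVector {
      // Expand the (index, value) pairs into a dense array of length `size`.
      def toArray: Array[Double] = {
        val data = new Array[Double](size)
        var i = 0
        while (i < indices.length) {
          data(indices(i)) = values(i)
          i += 1
        }
        data
      }
    }

    object HashCodeDemo {
      def main(args: Array[String]): Unit = {
        val dense = new SimpleDense(Array(1.0, 0.0, 3.0))
        val sparse = new SimpleSparse(3, Array(0, 2), Array(1.0, 3.0))
        println(dense == sparse)                   // equal by content
        println(dense.hashCode == sparse.hashCode) // hash codes agree as well
      }
    }
    ```

    In this sketch, a `hashCode` based on `toArray` costs time linear in the
    vector size for sparse vectors, but it never disagrees with a content-based
    `equals`.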



