Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/3997#discussion_r22799128
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -449,6 +449,16 @@ class SparseVector(
override def toString: String =
"(%s,%s,%s)".format(size, indices.mkString("[", ",", "]"),
values.mkString("[", ",", "]"))
--- End diff --
Thanks @srowen for the comment. Glad to discuss it with someone.
`Vector`: `override def hashCode(): Int = util.Arrays.hashCode(this.toArray)`
I understand the general guideline is to override `hashCode` whenever
`equals` is overridden.
Yet, intentionally or not, the original code guarantees that a `DenseVector`
and a `SparseVector` with the same array content compare equal under `equals`
and return the same `hashCode`. And that makes sense.
As noted in the PR description, I don't want to introduce breaking
changes. If we want to keep the original design, the current implementation
of `hashCode` in `Vector` is one of the best choices. That's why `hashCode`
was intentionally left out of the PR. (Maybe I should add a comment.)
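
To make the point concrete, here is a minimal sketch of the contract I mean. `SimpleVector`, `SimpleDense` and `SimpleSparse` are hypothetical, simplified stand-ins rather than the real MLlib classes, and I am assuming `equals` compares the dense array content in the same way the quoted `hashCode` hashes it:

```scala
import java.util.Arrays

// Hypothetical stand-ins for Vector / DenseVector / SparseVector (not the MLlib API).
sealed trait SimpleVector {
  def size: Int
  def toArray: Array[Double]

  // Assumption: equality is defined on the dense content, whatever the storage.
  override def equals(other: Any): Boolean = other match {
    case v: SimpleVector => Arrays.equals(this.toArray, v.toArray)
    case _ => false
  }

  // Same shape as the quoted line: hash the dense content, so vectors that
  // are equal under the definition above always hash alike.
  override def hashCode(): Int = Arrays.hashCode(this.toArray)
}

final class SimpleDense(val values: Array[Double]) extends SimpleVector {
  def size: Int = values.length
  def toArray: Array[Double] = values
}

final class SimpleSparse(
    val size: Int,
    val indices: Array[Int],
    val values: Array[Double]) extends SimpleVector {
  // Expand to a dense array; positions not listed in `indices` stay 0.0.
  def toArray: Array[Double] = {
    val data = new Array[Double](size)
    var i = 0
    while (i < indices.length) {
      data(indices(i)) = values(i)
      i += 1
    }
    data
  }
}

object HashContractDemo {
  def main(args: Array[String]): Unit = {
    val dense: SimpleVector = new SimpleDense(Array(1.0, 0.0, 3.0))
    val sparse: SimpleVector = new SimpleSparse(3, Array(0, 2), Array(1.0, 3.0))
    println(dense == sparse)                   // true
    println(dense.hashCode == sparse.hashCode) // true
  }
}
```

The trade-off in this sketch is that hashing a sparse vector allocates a full dense array, but it keeps the two representations interchangeable as keys in hash-based collections.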