GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/6112#discussion_r30863343
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -717,6 +719,23 @@ class SparseVector(
new SparseVector(size, ii, vv)
}
}
+
+ override def argmax: Int = {
+ if (size == 0) {
+ -1
+ } else {
+ var maxIdx = indices(0)
+ var maxValue = values(0)
+
+ foreachActive { (i, v) =>
--- End diff --
Sorry, I didn't explain this clearly. All inactive values in a sparse
vector are zeros. The edge case here is that zero could be the max value of the
entries. For example, `[-1.0, 0.0, -3.0].argmax == 1` but `0.0` doesn't appear
in `foreachActive` if the sparse vector is `SparseVector(3, Array(0, 2),
Array(-1.0, -3.0))`. If we only look at the active values, the argmax would be
0 as `-1.0` is the max among active values. So we need to cover the following
cases if all active values are negative:
1. if the number of active entries is the same as the vector size (i.e., no
inactive entries), use the current max and its index,
2. if there are inactive entries, find one and output its index.
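
A minimal standalone sketch of the two cases above, not the code from this pull request. `SparseArgmax` and its `argmax(size, indices, values)` signature are hypothetical names for illustration, and the sketch assumes `indices` is sorted in ascending order. Ties at zero between an explicitly stored `0.0` and an inactive entry are not resolved to the smallest index here.

```scala
// Hypothetical helper for illustration; it does not depend on Spark.
object SparseArgmax {
  /** Returns an index of a maximal entry, or -1 for an empty vector. */
  def argmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) {
      -1
    } else if (indices.isEmpty) {
      // Every entry is an implicit zero, so any index is a valid argmax.
      0
    } else {
      // Find the max over the active (explicitly stored) entries only.
      var maxIdx = indices(0)
      var maxValue = values(0)
      var k = 1
      while (k < indices.length) {
        if (values(k) > maxValue) {
          maxValue = values(k)
          maxIdx = indices(k)
        }
        k += 1
      }

      if (maxValue < 0.0 && indices.length < size) {
        // Case 2: all active values are negative and an inactive zero exists.
        // With sorted indices, the first position j where indices(j) != j
        // (or j == indices.length) is the first inactive index.
        var j = 0
        while (j < indices.length && indices(j) == j) j += 1
        j
      } else {
        // Case 1 (no inactive entries), or the active max is already >= 0.
        maxIdx
      }
    }
  }
}
```

With the example from the comment, `SparseArgmax.argmax(3, Array(0, 2), Array(-1.0, -3.0))` returns `1`, the index of the inactive zero.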