Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6112#discussion_r30863343
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -717,6 +719,23 @@ class SparseVector(
           new SparseVector(size, ii, vv)
         }
       }
    +
    +  override def argmax: Int = {
    +    if (size == 0) {
    +      -1
    +    } else {
    +      var maxIdx = indices(0)
    +      var maxValue = values(0)
    +
    +      foreachActive { (i, v) =>
    --- End diff --
    
    Sorry, I didn't explain this clearly. All inactive values in a sparse
    vector are zeros. The edge case here is that zero could be the max value
    of the entries. For example, `[-1.0, 0.0, -3.0].argmax == 1`, but `0.0`
    is not visited by `foreachActive` if the sparse vector is
    `SparseVector(3, Array(0, 2), Array(-1.0, -3.0))`. If we only look at the
    active values, the argmax would be 0, as `-1.0` is the max among active
    values. So we need to cover the following cases if all active values are
    negative:

    1. if the number of active entries is the same as the vector size (i.e.,
       no inactive entries), use the current max and its index,
    2. if there are inactive entries, find one and output its index.
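
    The two cases above could be handled roughly as in the sketch below. It
    is a standalone illustration, not the code from the pull request: the
    object `SparseArgmaxSketch` and the helper `sparseArgmax` are
    hypothetical names, and the sketch assumes `indices` is strictly
    increasing with `values(k)` stored at position `indices(k)`.

```scala
object SparseArgmaxSketch {
  // Hypothetical helper, not part of Spark. Assumes `indices` is strictly
  // increasing and `values(k)` is the entry stored at position `indices(k)`.
  def sparseArgmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) {
      -1
    } else if (indices.isEmpty) {
      // No active entries: every entry is an implicit zero.
      0
    } else {
      // Max over the active (explicitly stored) entries only.
      var maxIdx = indices(0)
      var maxValue = values(0)
      var k = 1
      while (k < indices.length) {
        if (values(k) > maxValue) {
          maxValue = values(k)
          maxIdx = indices(k)
        }
        k += 1
      }

      if (maxValue < 0.0 && indices.length < size) {
        // Case 2: all active values are negative and at least one entry is
        // inactive, so an implicit zero is the max. Because `indices` is
        // sorted, the first j with indices(j) != j is an inactive position.
        var j = 0
        while (j < indices.length && indices(j) == j) j += 1
        j
      } else {
        // Case 1 (no inactive entries), or the active max is non-negative.
        // Ties between an explicit zero and an implicit zero are not
        // resolved to the smallest index in this sketch.
        maxIdx
      }
    }
  }
}
```

    With the vector from the example, `SparseArgmaxSketch.sparseArgmax(3,
    Array(0, 2), Array(-1.0, -3.0))` returns 1, the index of the inactive
    zero.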


