Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14643
  
    ... for example,
    
    ```
      /**
       * Given a vector of class conditional probabilities, select the predicted label.
       * This returns the class, if any, whose probability is equal to or greater than its
       * threshold (if specified), and whose probability is highest. If several classes meet
       * their thresholds and are equally probable, the one with lower threshold is selected.
       * If several have equal thresholds, the one with lower class index is selected.
       * @return  predicted label
       */
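      protected def probability2prediction(probability: Vector): Double = {
        if (isDefined(thresholds)) {
          // Keep only the classes whose probability meets their threshold
          val candidates = probability.toArray.zip(getThresholds).zipWithIndex.
            filter { case ((p, t), _) => p >= t }
          if (candidates.isEmpty) {
            // maxBy throws on an empty collection, so return NaN explicitly
            Double.NaN
          } else {
            // Highest probability wins; ties go to lower threshold, then lower index
            candidates.maxBy { case ((p, t), i) => (p, -t, -i) }._2
          }
        } else {
          probability.toArray.zipWithIndex.maxBy { case (p, i) => (p, -i) }._2
        }
      }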
    ```
    
    @jkbradley what do you think? This is now entirely deterministic and interprets thresholds in the more usual way, but still uses them to break ties.
    
    The only catch is that this now tries to explicitly handle the case where no class meets its threshold. It returns NaN rather than a class whose probability doesn't meet its threshold.
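
    To make the tie-breaking and the NaN case concrete, here is a minimal standalone sketch. It is not the PR's code: `ThresholdSketch` and `predict` are hypothetical names, a plain `Array[Double]` stands in for Spark's `Vector`, and an `Option` stands in for `isDefined(thresholds)`. It assumes that an empty set of candidates should map to NaN.

    ```scala
    // Hypothetical standalone sketch; names and signature are illustrative only.
    object ThresholdSketch {
      def predict(probability: Array[Double], thresholds: Option[Array[Double]]): Double =
        thresholds match {
          case Some(ts) =>
            // Keep only classes whose probability is at least their threshold
            val candidates = probability.zip(ts).zipWithIndex
              .filter { case ((p, t), _) => p >= t }
            if (candidates.isEmpty) {
              // Assumption: no class meeting its threshold yields NaN
              Double.NaN
            } else {
              // Highest probability first, then lower threshold, then lower index
              candidates.maxBy { case ((p, t), i) => (p, -t, -i) }._2.toDouble
            }
          case None =>
            // No thresholds: highest probability, ties go to the lower index
            probability.zipWithIndex.maxBy { case (p, i) => (p, -i) }._2.toDouble
        }
    }

    // Equal probabilities, both meet their thresholds: lower threshold (class 1) wins
    val byThreshold = ThresholdSketch.predict(Array(0.5, 0.5), Some(Array(0.4, 0.3)))
    // Equal probabilities and equal thresholds: lower index (class 0) wins
    val byIndex = ThresholdSketch.predict(Array(0.5, 0.5), Some(Array(0.3, 0.3)))
    // Class 0 is more probable but misses its threshold, so class 1 is chosen
    val filtered = ThresholdSketch.predict(Array(0.7, 0.3), Some(Array(0.8, 0.2)))
    // No class meets its threshold: NaN
    val none = ThresholdSketch.predict(Array(0.2, 0.3), Some(Array(0.9, 0.9)))
    ```

    Encoding the tie-break as the tuple `(p, -t, -i)` keeps the selection in a single `maxBy`, since tuples compare element by element and negating `t` and `i` turns "lower wins" into "higher wins".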

