Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15149#discussion_r79583415
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
    @@ -176,10 +176,26 @@ private[ml] trait HasThreshold extends Params {
     private[ml] trait HasThresholds extends Params {
     
       /**
    -   * Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.
    +   * Defines "thresholds" for each class. These do not, actually, define the minimum
    +   * probability per class for that class to be chosen. They act like 'cutoff' values in
    +   * R's [[https://cran.r-project.org/web/packages/randomForest/randomForest.pdf randomForest]]
    +   * package, which ironically are also not cutoffs. They define a sort of inverse weight, and if
    +   * defined, the class that is chosen is the one whose ratio of class probability to threshold
    --- End diff --
    
    This doc is good, but we essentially say "the class with highest p/t is chosen" twice: once here and again in the paragraph below. Perhaps we can consolidate?
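
    For reference, a minimal sketch of the selection rule this doc describes (my illustration, not the PR's or Spark's actual implementation; the probability and threshold values are made up):

    ```scala
    // Pick the class whose ratio of probability to threshold is largest.
    val probabilities = Array(0.2, 0.5, 0.3) // hypothetical per-class probabilities
    val thresholds    = Array(0.5, 1.0, 0.2) // hypothetical per-class thresholds
    val predicted = probabilities.zip(thresholds)
      .map { case (p, t) => p / t }          // p / t for each class
      .zipWithIndex
      .maxBy(_._1)                           // largest ratio wins
      ._2
    // predicted == 2, since 0.3 / 0.2 = 1.5 beats 0.2 / 0.5 = 0.4 and 0.5 / 1.0 = 0.5
    ```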

