Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15149#discussion_r79583415

--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -176,10 +176,26 @@ private[ml] trait HasThreshold extends Params {

 private[ml] trait HasThresholds extends Params {

   /**
-   * Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.
+   * Defines "thresholds" for each class. These do not, actually, define the minimum
+   * probability per class for that class to be chosen. They act like 'cutoff' values in
+   * R's [[https://cran.r-project.org/web/packages/randomForest/randomForest.pdf randomForest]]
+   * package, which ironically are also not cutoffs. They define a sort of inverse weight, and if
+   * defined, the class that is chosen is the one whose ratio of class probability to threshold
--- End diff --

This doc is good, but we essentially say "the class with the highest p/t is chosen" twice: once here and again in the paragraph below. Perhaps we can consolidate?
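For readers following along: the semantics being documented (predict the class with the largest ratio p/t of probability to threshold) can be sketched in a few lines. This is an illustrative standalone example, not Spark's actual implementation; the object and method names here are made up.

```scala
// Hypothetical sketch of how per-class "thresholds" select a class:
// the predicted class is the index maximizing p(i) / t(i), where p(i) is
// the class probability and t(i) is that class's threshold.
object ThresholdExample {
  def predict(probabilities: Array[Double], thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length,
      "thresholds array must have one entry per class")
    // Scale each probability by the inverse of its threshold, then take argmax.
    val ratios = probabilities.zip(thresholds).map { case (p, t) => p / t }
    ratios.indexOf(ratios.max)
  }

  def main(args: Array[String]): Unit = {
    val probs = Array(0.5, 0.3, 0.2)
    // Equal thresholds reduce to a plain argmax over probabilities: class 0.
    println(ThresholdExample.predict(probs, Array(1.0, 1.0, 1.0)))  // 0
    // A small threshold for class 2 acts as an inverse weight favoring it:
    // ratios are 0.5, 0.3, and 0.2 / 0.1 = 2.0, so class 2 is predicted
    // even though its raw probability is the lowest.
    println(ThresholdExample.predict(probs, Array(1.0, 1.0, 0.1)))  // 2
  }
}
```

This also shows why the doc stresses that thresholds are not minimum-probability cutoffs: lowering a class's threshold makes it *more* likely to be chosen, not less.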