Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/15149#discussion_r79583415
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -176,10 +176,26 @@ private[ml] trait HasThreshold extends Params {
private[ml] trait HasThresholds extends Params {
/**
- * Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.
+ * Defines "thresholds" for each class. These do not, actually, define the minimum
+ * probability per class for that class to be chosen. They act like 'cutoff' values in
+ * R's [[https://cran.r-project.org/web/packages/randomForest/randomForest.pdf randomForest]]
+ * package, which ironically are also not cutoffs. They define a sort of inverse weight, and if
+ * defined, the class that is chosen is the one whose ratio of class probability to threshold
--- End diff --
This doc is good but we essentially say "the class with highest p/t is
chosen" twice - once here and again in the paragraph below. Perhaps we can
consolidate?
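
For illustration, the selection rule described above (predict the class with the largest ratio p/t of class probability to threshold) can be sketched as follows. This is a standalone example, not Spark's actual implementation; `predictWithThresholds` is a hypothetical helper name:

```scala
object ThresholdExample {
  // Sketch of the thresholds semantics discussed above: the predicted class
  // is argmax over classes of p(i) / t(i), where p is the class probability
  // and t is that class' threshold. Not the real Spark ML code.
  def predictWithThresholds(probabilities: Array[Double],
                            thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length,
      "thresholds must have one entry per class")
    val scaled = probabilities.zip(thresholds).map { case (p, t) => p / t }
    scaled.indexOf(scaled.max)
  }

  def main(args: Array[String]): Unit = {
    val probs = Array(0.5, 0.3, 0.2)
    // Uniform thresholds: the most probable class (index 0) is predicted.
    println(predictWithThresholds(probs, Array(1.0, 1.0, 1.0)))
    // A small threshold acts as an inverse weight boosting its class:
    // 0.2 / 0.1 = 2.0 exceeds 0.5 / 1.0, so index 2 is predicted.
    println(predictWithThresholds(probs, Array(1.0, 1.0, 0.1)))
  }
}
```

This shows why the thresholds are not minimum probabilities: lowering a class' threshold makes it more likely to be chosen, even when its raw probability is not the largest.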