Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14949
Oh, I get it now. If this were being applied to decision trees only, it would make sense, and we could fix this up and document the meaning. I agree it only makes sense to return "no class" if we are actually thresholding.
The only problem here is that this is not being applied just to a random forest implementation but to all classifiers that output a probability. That's a little more of a stretch. I suppose the result here can be thought of as a likelihood ratio of class probability versus prior, not some hacky heuristic specific to the CRAN package. I think the name is unfortunate: I would not have guessed that meaning from it (though, to be fair, the scaladoc does say what it means).
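To make the distinction concrete, here is a minimal sketch (in plain Python, not Spark API; the function name is illustrative) of the semantics being discussed: each class probability is divided by its per-class threshold and the argmax of the scaled values is the prediction, rather than each value acting as a hard cutoff that could yield "no class".

```python
def predict_with_thresholds(probabilities, thresholds):
    """Pick the class whose probability/threshold ratio is largest.

    Illustrative sketch only: scale each class probability by its
    threshold, then take the argmax -- so every input still gets a
    class, unlike true cutoff-style thresholding.
    """
    scaled = []
    for p, t in zip(probabilities, thresholds):
        if t > 0:
            scaled.append(p / t)
        else:
            # A zero threshold means the class wins whenever it has any probability.
            scaled.append(float("inf") if p > 0 else 0.0)
    return max(range(len(scaled)), key=scaled.__getitem__)

# Equal thresholds reduce to a plain argmax over probabilities:
print(predict_with_thresholds([0.2, 0.8], [0.5, 0.5]))  # -> 1
# A low threshold for class 0 biases the prediction toward it:
print(predict_with_thresholds([0.2, 0.8], [0.1, 0.9]))  # -> 0
```

Under these semantics there is no way to express "predict nothing", which is why the name is misleading for anyone expecting cutoff behavior.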
I'll close this, but what's the best way forward?
Option 1.
Keep current behavior. Modify https://github.com/apache/spark/pull/14643 to
include Nick's suggestions above, and add a bunch of documentation about what
'thresholds' really means here.
Option 2.
As above, but deprecate 'thresholds' and rename it to 'cutoff' to be a little clearer.
Option 3.
As in Option 2, but also go back and actually implement thresholding.