Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14949
Oh, I get it now. If this were being applied to decision trees only, it would make sense, and we could fix this up and document the meaning. I agree it only makes sense to return "no class" if we are actually thresholding.
The only problem here is that this is not being applied just to a random forest implementation but to all classifiers that output a probability. That's a little more of a stretch. I suppose the result here can be thought of as a likelihood ratio of class probability versus prior, not some hacky heuristic specific to the CRAN package. I think the name is unfortunate: I would not have guessed that meaning from it (though, to be fair, the scaladoc does say what it means).
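To make the distinction concrete, here is a minimal sketch (in plain Python, not Spark API; the function name is illustrative) of the semantics being discussed: each class probability is divided by its per-class threshold and the argmax of the scaled values is the prediction, rather than each value acting as a hard cutoff that could yield "no class".

```python
def predict_with_thresholds(probabilities, thresholds):
    """Pick the class whose probability/threshold ratio is largest.

    Illustrative sketch only: scale each class probability by its
    threshold, then take the argmax -- so every input still gets a
    class, unlike true cutoff-style thresholding.
    """
    scaled = []
    for p, t in zip(probabilities, thresholds):
        if t > 0:
            scaled.append(p / t)
        else:
            # A zero threshold means the class wins whenever it has any probability.
            scaled.append(float("inf") if p > 0 else 0.0)
    return max(range(len(scaled)), key=scaled.__getitem__)

# Equal thresholds reduce to a plain argmax over probabilities:
print(predict_with_thresholds([0.2, 0.8], [0.5, 0.5]))  # -> 1
# A low threshold for class 0 biases the prediction toward it:
print(predict_with_thresholds([0.2, 0.8], [0.1, 0.9]))  # -> 0
```

Under these semantics there is no way to express "predict nothing", which is why the name is misleading for anyone expecting cutoff behavior.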
I'll close this, but what's the best way forward?
Option 1.
Keep current behavior. Modify https://github.com/apache/spark/pull/14643 to
include Nick's suggestions above, and add a bunch of documentation about what
'thresholds' really means here.
Option 2.
As above, but deprecate 'thresholds' and rename it to 'cutoff' to be a little clearer.
Option 3.
As in Option 2, but also go back and actually implement thresholding.