[GitHub] spark pull request: ROC area under the curve for binary classifica...

srowen Sun, 16 Mar 2014 21:13:56 -0700

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/160#issuecomment-37783918
  
    If I may wade in with a comment: it is not clear to me that a 
`BinaryClassificationModel` is needed. What it adds, the score method, seems to 
just return 0/1 depending on the predicted class. The prediction is already 
either 0/1 or a probability, so in both cases the method adds little. The ROC 
calculation does not seem to require a superclass like this just for its own sake.
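
To make the last point concrete, here is a minimal sketch in Python (not the PR's Scala code) of an AUC computed directly from (score, label) pairs. The name `roc_auc` and the pair layout are assumptions made for illustration. It uses the rank-sum (Mann-Whitney) form of the AUC with average ranks for ties, so it accepts hard 0/1 predictions and probabilities alike, with no model superclass involved.

```python
from typing import Iterable, Tuple


def roc_auc(scored: Iterable[Tuple[float, int]]) -> float:
    """Area under the ROC curve from (score, label) pairs, label in {0, 1}.

    Hypothetical helper: computes AUC as the normalized Mann-Whitney U
    statistic, giving tied scores their average rank.
    """
    pairs = sorted(scored, key=lambda p: p[0])
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run [i, j) of examples sharing the same score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share the average rank (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)
```

The same function can be fed probabilities, e.g. `roc_auc([(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1)])`, or hard predictions, e.g. `roc_auc([(0.0, 0), (0.0, 0), (1.0, 1), (0.0, 1)])`. The evaluator only needs a number per example and a true label.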
    
    (Separately: I think there is a design problem here with 
`ClassificationModel` outputting `Double`. Classifiers output either a class, 
which is an enumerated, opaque value, or a distribution of probabilities over 
classes, which is a mapping of opaque values to numbers. It so happens that you 
can map opaque values to 0,1,2. And it so happens these are not only ints but 
real values. And it so happens that you can map the distribution over 2 classes 
into a single number in [0,1]. And it so happens that these can all be 
represented by a `Double`, just like the output of a `RegressionModel`. But it 
is severe overloading that will cause tears later. This is a separate issue, 
though. A design change along these lines *might* necessitate a special subclass 
for binary classifiers, but I am not sure.)
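
As a rough illustration of the distinction drawn above, here is a sketch in Python rather than Scala. `ClassDistribution`, `most_likely`, and `binary_score` are hypothetical names, not anything in MLlib. The distribution is kept as a mapping from opaque labels to probabilities, and collapsing it to a single number in [0,1] is an explicit step that only makes sense for two classes.

```python
from dataclasses import dataclass
from typing import Dict, Generic, Hashable, TypeVar

L = TypeVar("L", bound=Hashable)  # opaque class label type


@dataclass(frozen=True)
class ClassDistribution(Generic[L]):
    """Hypothetical classifier output: a probability for each opaque label."""

    probabilities: Dict[L, float]

    def __post_init__(self) -> None:
        total = sum(self.probabilities.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")

    def most_likely(self) -> L:
        # The predicted class is a label, not a number.
        return max(self.probabilities, key=lambda k: self.probabilities[k])

    def binary_score(self, positive: L) -> float:
        # Collapsing to one number is only meaningful for two classes.
        if len(self.probabilities) != 2:
            raise ValueError("binary_score requires exactly two classes")
        return self.probabilities[positive]
```

For example, `ClassDistribution({"spam": 0.8, "ham": 0.2}).most_likely()` yields the label `"spam"`, while `binary_score("spam")` yields the number that an ROC calculation would consume.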



