GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/160#issuecomment-37783918
If I may wade in with a comment: it is not clear to me that a
`BinaryClassificationModel` is needed. What it adds, the score method, seems to
just return 0/1 depending on the predicted class. The prediction is already
0/1, or a probability, so in either case the method adds little. The ROC
calculation does not seem to require a superclass like this just for its own
sake.
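To illustrate the ROC point, here is a rough sketch, not the code in this PR.
`RocSketch` and `areaUnderROC` are hypothetical names, and the pairwise
(Mann-Whitney) formulation is just the simplest one to write down. It only
needs (score, label) pairs, so any model that can produce a score could feed
it without sharing a superclass:

```scala
// Hypothetical helper for illustration; not part of MLlib or this PR.
object RocSketch {
  /** Area under the ROC curve from (score, label) pairs.
    * Labels are assumed to be 0.0 (negative) or 1.0 (positive). */
  def areaUnderROC(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one example of each class")
    // Fraction of (positive, negative) pairs where the positive example
    // scores higher; ties count as half. O(n^2), fine for a sketch.
    val wins = for (p <- pos; n <- neg) yield {
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    }
    wins.sum / (pos.size.toLong * neg.size)
  }
}

// Made-up scores and labels, only to show the call shape.
val sample = Seq((0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.1, 0.0))
val auc = RocSketch.areaUnderROC(sample)
```

The input is plain data, so the evaluation code does not need to know which
model type produced the scores.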
(Separately: I think there is a design problem here with
`ClassificationModel` outputting `Double`. A classifier outputs either a class,
which is an enumerated, opaque value, or a distribution of probabilities over
classes, which is a mapping of opaque values to numbers. It so happens that you
can map opaque values to 0, 1, 2. And it so happens that these are not only
ints but also real values. And it so happens that you can map the distribution
over 2 classes into a single number in [0,1]. And it so happens that these can
all be represented by a `Double`, just like the output of a `RegressionModel`.
But it's severe overloading that will cause tears later. That is a separate
issue, though. A design change along these lines *might* necessitate a special
subclass for binary classifiers, but I am not sure.)
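To make the parenthetical concrete, here is one possible shape for a typed
classifier output. Every name below (`Classifier`, `predictDistribution`,
`Mail`, `ToySpamFilter`) is hypothetical and exists neither in MLlib nor in
this PR. It is a sketch of the idea, not a proposed API:

```scala
// Hypothetical types for illustration only; none of these exist in MLlib.
object TypedOutputSketch {
  /** A classifier over features F and an opaque label type L. */
  trait Classifier[F, L] {
    /** Distribution over classes: each label mapped to its probability. */
    def predictDistribution(features: F): Map[L, Double]

    /** The most probable class, returned as a label rather than a Double. */
    def predict(features: F): L = predictDistribution(features).maxBy(_._2)._1
  }

  // An enumerated, opaque label type with two classes.
  sealed trait Mail
  case object Spam extends Mail
  case object Ham extends Mail

  /** Toy binary classifier: the single feature, clamped to [0,1],
    * is treated as the spam probability. */
  object ToySpamFilter extends Classifier[Double, Mail] {
    def predictDistribution(x: Double): Map[Mail, Double] = {
      val p = math.max(0.0, math.min(1.0, x))
      Map[Mail, Double](Spam -> p, Ham -> (1.0 - p))
    }
  }
}

import TypedOutputSketch._
val label: Mail = ToySpamFilter.predict(0.9)
```

With this shape, the single number in [0,1] for the binary case is just
`predictDistribution(x)(Spam)`, and a regression output stays a separate type.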