GitHub user schmit commented on the pull request:
https://github.com/apache/spark/pull/160#issuecomment-38003806
On your more general remarks, @srowen:
I think those are valid concerns. Here is my reasoning for doing it this
way:
The predict function returns the label, but to sort the test samples I need
their predicted "score", or predicted probability in the case of LR.
More generally, this seems like a useful function to have. I do not want to
change the predict function, since returning a label is probably what is most
used and wanted, and it would be annoying to have to convert the score into a
label by hand, and only in the binary classification setting.
That said, this score function only makes sense in the binary classification
setting, and the same is true of ROC AUC. Later I hope to add PR AUC as well,
which can go in the same class, but first things first.
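To illustrate why scores rather than labels are needed, here is a minimal
Python sketch of ROC AUC computed by sorting samples on their score. It is not
the code in this pull request: `roc_auc` is a hypothetical helper, and the
inputs are assumed to be plain lists of real-valued scores and 0/1 labels.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve from raw scores and 0/1 labels (illustrative)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    # Sort samples by score, highest first. This ordering is exactly what
    # hard labels from a predict function cannot provide.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    tp = fp = 0            # running true/false positive counts
    prev_tp = prev_fp = 0  # counts at the previous ROC point
    area = 0.0
    i = 0
    while i < len(ranked):
        # Consume all samples tied at the same score as one threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        # Trapezoid between the previous and current ROC points (unnormalized).
        area += (fp - prev_fp) * (tp + prev_tp) / 2.0
        prev_tp, prev_fp = tp, fp
        i = j

    # Normalize counts to rates: TPR = tp / n_pos, FPR = fp / n_neg.
    return area / (n_pos * n_neg)


example_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this example three of the four positive/negative pairs are ranked correctly,
so `example_auc` is 0.75. Tied scores are handled as a single threshold step,
which gives each tied positive/negative pair half credit.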
The alternative is to define this function separately for LR and SVM,
but I don't like that either.
I do agree that this is not the cleanest code, so your suggestions are
very welcome.