Github user BigCrunsh commented on the pull request:
https://github.com/apache/spark/pull/2137#issuecomment-53569214
MLlib currently contains linear models (GLMs) that produce scores based on
an inner product, classification models that may derive a class label from
those scores, and probabilistic models that provide a confidence score (or a
probability under some model assumption) in addition to the predicted class.
At present, the score of a classification model is only available by removing
the threshold:
```scala
val classes = model.predict(testset)
val scores = model.clearThreshold().predict(testset)
```
Because `clearThreshold()` modifies the model in place, the threshold is lost
after the last step, and for LogReg it is not possible to access the raw
(uncalibrated) score. However, depending on the model, I would expect direct
and consistent access to all of these values:
```scala
val classes = model.predictClass(testset)
val scores = model.predictScore(testset)
val probs = model.predictProbability(testset)
```
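As a rough illustration of what I have in mind, here is a minimal sketch without any Spark dependency. The trait `ScoringClassifier` and the class `SimpleLogisticModel` are hypothetical names, not part of MLlib; only the three method names come from the proposal above, and for brevity they take a single feature vector instead of an RDD.

```scala
// Hypothetical sketch; none of these types exist in MLlib.
trait ScoringClassifier {
  /** Raw, uncalibrated score (for a linear model, the margin w.x + b). */
  def predictScore(features: Array[Double]): Double

  /** Probability of the positive class under the model's assumptions. */
  def predictProbability(features: Array[Double]): Double

  /** Hard class label, derived from the probability and a fixed threshold. */
  def predictClass(features: Array[Double]): Double
}

final class SimpleLogisticModel(
    weights: Array[Double],
    intercept: Double,
    threshold: Double = 0.5) extends ScoringClassifier {

  def predictScore(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    var sum = intercept
    var i = 0
    while (i < weights.length) { sum += weights(i) * features(i); i += 1 }
    sum
  }

  // Logistic link applied to the raw score.
  def predictProbability(features: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-predictScore(features)))

  // The threshold is a constructor parameter, so reading scores never loses it.
  def predictClass(features: Array[Double]): Double =
    if (predictProbability(features) > threshold) 1.0 else 0.0
}

val model = new SimpleLogisticModel(Array(2.0, -1.0), intercept = 0.5)
val x = Array(1.0, 0.5)
val score = model.predictScore(x)       // 2.0*1.0 + (-1.0)*0.5 + 0.5 = 2.0
val prob = model.predictProbability(x)  // sigmoid(2.0), roughly 0.88
val cls = model.predictClass(x)         // 1.0, since prob > 0.5
```

With this shape all three values can be read from the same model instance, and asking for the score does not change what `predictClass` returns afterwards.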
@mengxr: I think that, in general, a probability is a measure of how likely
an event is to occur. It is often based on more or less realistic model
assumptions (e.g., the normality assumption in regression, t-tests, etc.),
isn't it? The exponential family, which is the assumed form of the
class-conditional distributions `p(features|class)`, comprises commonly used
distributions such as the multinomial, Poisson, and Gaussian distributions.
The learning algorithm (with tuned hyper-parameters) is then "responsible" for
calibrating these probabilities. Do you have a more appropriate name to
distinguish between scores and "probabilities"?
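
To make the exponential-family remark concrete, here is the standard two-class argument, sketched under assumptions of my own choosing: class-conditional densities with natural parameters $\theta_0, \theta_1$, the features themselves as sufficient statistics, a shared base measure $h$, and class priors $\pi_0, \pi_1$. None of these symbols appear in MLlib.

```latex
\begin{align*}
p(x \mid y = k) &= h(x)\,\exp\!\bigl(\theta_k^\top x - A(\theta_k)\bigr),
  \qquad k \in \{0, 1\} \\
p(y = 1 \mid x)
  &= \frac{\pi_1\, p(x \mid y = 1)}
          {\pi_1\, p(x \mid y = 1) + \pi_0\, p(x \mid y = 0)} \\
  &= \frac{1}{1 + \exp\!\bigl(-(w^\top x + b)\bigr)},
  \qquad w = \theta_1 - \theta_0,
  \quad b = A(\theta_0) - A(\theta_1) + \log\frac{\pi_1}{\pi_0}
\end{align*}
```

Under these assumptions the score is the linear term $w^\top x + b$, and the "probability" is its logistic transform; how well it is calibrated depends on how well the assumptions hold.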