[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...

BigCrunsh Wed, 27 Aug 2014 06:08:20 -0700

Github user BigCrunsh commented on the pull request:

    https://github.com/apache/spark/pull/2137#issuecomment-53569214
  
    Currently, MLlib contains linear models (GLMs) that produce scores based on 
an inner product, classification models that may derive a class label from 
those scores, and probabilistic models that provide a confidence score (or a 
probability under some model assumption) in addition to the predicted class. 
At present, the score of a classification model is only available by removing 
the threshold:
    ```scala
    val classes = model.predict(testset)
    val scores = model.clearThreshold().predict(testset)
    ```
    The threshold is lost after the last step, because `clearThreshold()` 
modifies the model in place, and for LogReg it is not possible to access the 
(uncalibrated) score. However, depending on the model, I would expect direct 
and consistent access to all of these values:
    ```scala
    val classes = model.predictClass(testset)
    val scores = model.predictScore(testset)
    val probs = model.predictProbability(testset)
    ```
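    To make the proposal concrete, here is a minimal, self-contained sketch of what such an interface could look like. The trait names `ScoringClassifier` and `ProbabilisticClassifier` and the class `ToyLogisticModel` are hypothetical and not part of MLlib; the sketch works on plain arrays instead of RDDs and assumes a binary model with a logistic link.

```scala
// Hypothetical interface sketch; none of these names exist in MLlib.
trait ScoringClassifier {
  /** Raw, uncalibrated score, e.g. the inner product w.x + b. */
  def predictScore(features: Array[Double]): Double

  /** Class label derived from the model without mutating it. */
  def predictClass(features: Array[Double]): Double
}

// Models that can also map a score to a probability under a model assumption.
trait ProbabilisticClassifier extends ScoringClassifier {
  def predictProbability(features: Array[Double]): Double
}

// Toy binary logistic model (assumption: logistic link, fixed threshold).
final class ToyLogisticModel(
    weights: Array[Double],
    intercept: Double,
    threshold: Double = 0.5) extends ProbabilisticClassifier {

  def predictScore(features: Array[Double]): Double = {
    require(features.length == weights.length, "dimension mismatch")
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  }

  // Sigmoid of the raw score.
  def predictProbability(features: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-predictScore(features)))

  // The threshold is only read here, so it is never lost.
  def predictClass(features: Array[Double]): Double =
    if (predictProbability(features) > threshold) 1.0 else 0.0
}
```

    With this shape, all three values can be requested from the same model instance in any order, and the threshold stays intact because no call changes the model's state.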
    @mengxr: I think in general a probability is some measure of how likely 
it is that an event will occur. It is often based on more or less realistic 
model assumptions (e.g., the normality assumption in regression, t-tests, 
etc.), isn't it? The exponential family, which is the assumption for the 
class-wise conditional distributions ``p(features|class)``, comprises commonly 
used distributions such as the multinomial, Poisson, and Gaussian 
distributions. The learning algorithm (with tuned hyper-parameters) is then 
"responsible" for calibrating these probabilities. Do you have a more 
appropriate name to distinguish between scores and "probabilities"?
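
    As a short worked derivation of the point about the exponential family: assume two classes with priors $\pi_0, \pi_1$ and class-conditional densities from the same exponential family, with the feature vector $x$ itself as sufficient statistic and a base measure $h(x)$ shared by both classes. These are simplifying assumptions for illustration. Under them, the class posterior is a sigmoid of a linear score:

```latex
\begin{align}
p(x \mid c) &= h(x)\,\exp\!\big(\theta_c^\top x - A(\theta_c)\big), \qquad c \in \{0, 1\} \\
\log\frac{p(c = 1 \mid x)}{p(c = 0 \mid x)}
  &= (\theta_1 - \theta_0)^\top x - A(\theta_1) + A(\theta_0) + \log\frac{\pi_1}{\pi_0}
   = w^\top x + b \\
p(c = 1 \mid x) &= \frac{1}{1 + \exp\!\big(-(w^\top x + b)\big)}
\end{align}
```

    Here $w = \theta_1 - \theta_0$ and $b = A(\theta_0) - A(\theta_1) + \log(\pi_1 / \pi_0)$, so the inner product $w^\top x + b$ plays the role of the score and its sigmoid plays the role of the "probability", which holds only to the extent that the model assumption does.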



