Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4801#discussion_r25488493
  
    --- Diff: docs/mllib-linear-methods.md ---
    @@ -370,6 +336,59 @@ print("Training Error = " + str(trainErr))
     </div>
     </div>
     
    +### Logistic regression
    +
    +[Logistic regression](http://en.wikipedia.org/wiki/Logistic_regression) is widely used to predict a
    +binary response. It is a linear method as described above in equation `$\eqref{eq:regPrimal}$`,
    +with the loss function in the formulation given by the logistic loss:
    +`\[
    +L(\wv;\x,y) :=  \log(1+\exp( -y \wv^T \x)).
    +\]`
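
A plain-Python sketch of the logistic loss above may help make it concrete. This is only an illustration of the formula, not the MLlib implementation; `logistic_loss` is a hypothetical helper, and the label $y$ is assumed to be in $\{-1, +1\}$ as in the formulation:

```python
import math

def logistic_loss(w, x, y):
    """Logistic loss L(w; x, y) = log(1 + exp(-y * w^T x)), with y in {-1, +1}."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # the margin w^T x
    return math.log(1.0 + math.exp(-y * z))

# A correctly classified point with a large margin w^T x = 7 has near-zero loss,
# while the mislabeled version of the same point is penalized heavily:
low = logistic_loss([2.0, 1.0], [3.0, 1.0], +1)
high = logistic_loss([2.0, 1.0], [3.0, 1.0], -1)
```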
    +
    +Binary logistic regression can be generalized into multinomial logistic regression to
    +train and predict multi-class classification problems. For example, for $K$ possible outcomes,
    +one of the outcomes can be chosen as a "pivot", and the other $K - 1$ outcomes can be separately
    +regressed against the pivot outcome. In MLlib, the first class $0$ is chosen as the "pivot" class.
    +See $Eq.~(4.17)$ and $Eq.~(4.18)$ on page 119 of
    +[The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition](http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf) by
    +Trevor Hastie, Robert Tibshirani, and Jerome Friedman, and
    +[Multinomial logistic regression](http://en.wikipedia.org/wiki/Multinomial_logistic_regression)
    +for references. A detailed mathematical derivation is available
    +[here](http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297).
    +
    +For binary classification problems, the algorithm outputs a binary logistic regression model.
    +Given a new data point, denoted by $\x$, the model makes predictions by
    +applying the logistic function
    +`\[
    +\mathrm{f}(z) = \frac{1}{1 + e^{-z}},
    +\]`
    +where $z = \wv^T \x$.
    +By default, if $\mathrm{f}(\wv^T \x) > 0.5$, the outcome is positive, and
    +negative otherwise. Unlike linear SVMs, the raw output of the logistic regression
    +model, $\mathrm{f}(z)$, has a probabilistic interpretation (i.e., the probability
    +that $\x$ is positive).
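
As a quick illustration of the prediction rule just described (a sketch, not the MLlib API; `predict` is a hypothetical helper and the 0.5 threshold is the default mentioned above):

```python
import math

def predict(w, x, threshold=0.5):
    """Binary prediction via the logistic function f(z) = 1 / (1 + e^{-z})."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # z = w^T x
    prob = 1.0 / (1.0 + math.exp(-z))        # probabilistic interpretation of the raw output
    return (1 if prob > threshold else 0), prob

label, prob = predict([0.5, -0.25], [4.0, 2.0])
# z = 0.5*4 - 0.25*2 = 1.5, so f(z) is above 0.5 and the predicted label is 1
```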
    +
    +For multi-class classification problems, the algorithm outputs $K - 1$ binary
    +logistic regression models regressed against the first class $0$, the "pivot" outcome.
    +Given a new data point, the $K - 1$ models are run, the resulting probabilities are
    +normalized so that they sum to one, and the class with the largest probability is
    +chosen as the output.
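
The pivot scheme can be sketched in plain Python as follows. This is only an illustration of the formulation under the stated assumptions (class $0$ as pivot, one weight vector per non-pivot class), not MLlib code; `predict_multiclass` is a hypothetical helper:

```python
import math

def predict_multiclass(weight_vectors, x):
    """Multinomial prediction from K-1 binary models regressed against pivot class 0.

    P(y = 0 | x) = 1 / (1 + sum_j exp(w_j^T x))
    P(y = k | x) = exp(w_k^T x) / (1 + sum_j exp(w_j^T x)), for k = 1..K-1
    """
    margins = [sum(wi * xi for wi, xi in zip(w, x)) for w in weight_vectors]
    denom = 1.0 + sum(math.exp(m) for m in margins)
    # Probabilities normalized so that they sum to one:
    probs = [1.0 / denom] + [math.exp(m) / denom for m in margins]
    # The class with the largest probability is chosen as the output:
    return max(range(len(probs)), key=lambda k: probs[k]), probs

label, probs = predict_multiclass([[1.0], [0.0]], [2.0])
# margins are [2.0, 0.0], so class 1 has the largest probability
```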
    +
    +#### Examples
    --- End diff --
    
    The examples are empty. I think we need to re-organize this file a little 
bit. Let's move SVM and LR out of binary classification and merge the 
evaluation part into each examples. If you are busy, I can take it from here:)

