GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/6109

    [SPARK-7568][ML] ml.LogisticRegression doesn't output the right prediction

    This happens because we previously regularized the intercept, which effectively
    regularized the weights less. We now follow R's convention of not regularizing
    the intercept, so we need to decrease the regularization.
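
To make the difference concrete, here is a minimal sketch of an L2-regularized logistic objective with a switch for whether the intercept is penalized. This is not the Spark implementation: the function names, the `(lam / 2) * ||w||^2` penalty form, and the toy data are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def objective(weights, intercept, rows, labels, lam, penalize_intercept):
    """Mean logistic loss plus an L2 penalty of (lam / 2) * sum of squares.

    Illustrative only; the exact scaling used by any real library may differ.
    """
    loss = 0.0
    for x, y in zip(rows, labels):
        margin = intercept + sum(w * xi for w, xi in zip(weights, x))
        p = sigmoid(margin)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    loss /= len(rows)

    penalty = sum(w * w for w in weights)
    if penalize_intercept:
        # Old behaviour described above: the intercept is shrunk as well.
        penalty += intercept * intercept
    return loss + 0.5 * lam * penalty


# Hypothetical toy inputs, not taken from the pull request.
rows = [[1.0, 0.0], [0.0, 1.0]]
labels = [1, 0]
weights = [0.5, -0.5]
intercept = 2.0
lam = 0.001

with_intercept = objective(weights, intercept, rows, labels, lam, True)
without_intercept = objective(weights, intercept, rows, labels, lam, False)
```

For the same parameters, the two objectives differ only by `0.5 * lam * intercept ** 2`, so the same `lam` does not mean the same thing under the two conventions.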
    
    BTW, we need to implement efficient cross-validation soon to help users
    pick lambda.
    
    With lambda = 0.001 in the current LOR implementation, the predictions are
    ```
    (4, spark i j k) --> prob=[0.1596407738787411,0.8403592261212589], prediction=1.0
    (5, l m n) --> prob=[0.8378325685476612,0.16216743145233883], prediction=0.0
    (6, mapreduce spark) --> prob=[0.6693126798261013,0.3306873201738986], prediction=0.0
    (7, apache hadoop) --> prob=[0.9821575333444208,0.01784246665557917], prediction=0.0
    ```
    and the predictions on the training data are
    ```
    (0, a b c d e spark) --> prob=[0.0021342419881406746,0.9978657580118594], prediction=1.0
    (1, b d) --> prob=[0.9959176174854043,0.004082382514595685], prediction=0.0
    (2, spark f g h) --> prob=[0.0014541569986711233,0.9985458430013289], prediction=1.0
    (3, hadoop mapreduce) --> prob=[0.9982978367343561,0.0017021632656438518], prediction=0.0
    ```
    
    Before the LOR change (the implementation that regularizes the intercept),
    the predictions are
    ```
    (4, spark i j k) --> prob=[0.18764577263047177,0.8123542273695282], prediction=1.0
    (5, l m n) --> prob=[0.6508848199790638,0.3491151800209362], prediction=0.0
    (6, mapreduce spark) --> prob=[0.561585214970727,0.43841478502927295], prediction=0.0
    (7, apache hadoop) --> prob=[0.9118076920593474,0.08819230794065264], prediction=0.0
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark lor-example

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6109
    
----
commit 8f40ccd38ccc42a06a86f56e2cfc6893daf8dc10
Author: DB Tsai <[email protected]>
Date:   2015-05-13T04:38:55Z

    first commit

----

