GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/1518

    [SPARK-2505][MLlib] Weighted Regularizer for Generalized Linear Model 

    (Note: This is not ready to be merged. It still needs documentation, and 
we need to make sure it's backward compatible with the Spark 1.0 APIs.) 
    
    The current implementation of regularization in linear models uses 
`Updater`, and this design has a couple of issues:
    1) It penalizes all the weights, including the intercept. When training 
machine learning models, the intercept is typically not penalized. 
    2) The `Updater` also contains the adaptive step size logic for gradient 
descent. We would like to clean this up by moving the regularization logic out 
of the updater and into a regularizer, so that the LBFGS optimizer no longer 
needs a trick to get the loss and gradient of the objective function.
    In this work, a weighted regularizer will be implemented, and users can 
exclude the intercept or any other weight from regularization by setting the 
penalty weight of that term to zero. Since the regularizer will return a tuple 
of loss and gradient, the adaptive step size logic and the soft thresholding 
for L1 in `Updater` will be moved to the SGD optimizer.
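
    As a rough illustration of the idea (not the code in this pull request), 
the sketch below shows a weighted L2 regularizer that returns a tuple of loss 
and gradient, plus a weighted soft-thresholding step of the kind an SGD 
optimizer could apply for L1. It is written in plain Python for brevity; the 
function names, the `penalties` vector, and the choice of putting the intercept 
in the last position are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def weighted_l2(weights: Sequence[float],
                penalties: Sequence[float],
                reg_param: float) -> Tuple[float, List[float]]:
    """Return (loss, gradient) of 0.5 * reg_param * sum(p_i * w_i^2).

    A penalty of 0.0 removes that coordinate from regularization.
    """
    loss = 0.5 * reg_param * sum(p * w * w for w, p in zip(weights, penalties))
    grad = [reg_param * p * w for w, p in zip(weights, penalties)]
    return loss, grad


def weighted_soft_threshold(weights: Sequence[float],
                            penalties: Sequence[float],
                            reg_param: float,
                            step_size: float) -> List[float]:
    """Shrink each w_i toward zero by step_size * reg_param * p_i (weighted L1)."""
    shrunk = []
    for w, p in zip(weights, penalties):
        shrink = step_size * reg_param * p
        # Keep the sign of w, clamp the magnitude at zero.
        shrunk.append(math.copysign(max(abs(w) - shrink, 0.0), w))
    return shrunk


# Hypothetical layout: the last coordinate is the intercept, so its penalty is 0.
weights = [2.0, -1.0, 5.0]
penalties = [1.0, 1.0, 0.0]

loss, grad = weighted_l2(weights, penalties, reg_param=0.1)
shrunk = weighted_soft_threshold(weights, penalties, reg_param=0.1, step_size=1.0)
```

    With a zero penalty on the last coordinate, the intercept contributes 
nothing to the regularization loss or gradient and is left unchanged by the 
soft-thresholding step, while the other weights are penalized as usual.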


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlpineNow/spark SPARK-2505_regularizer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1518
    
----
commit 2946930ec3de0e0a34e07d065c954d7aabacd4ba
Author: DB Tsai <[email protected]>
Date:   2014-07-19T02:15:37Z

    initial work

----

