GitHub user yanboliang reopened a pull request:

    https://github.com/apache/spark/pull/19020

    [SPARK-3181] [ML] Implement huber loss for LinearRegression.

    ## What changes were proposed in this pull request?
    MLlib ```LinearRegression``` now supports _huber_ loss in addition to 
_leastSquares_ loss. The huber loss objective function is:
    
![image](https://user-images.githubusercontent.com/1962026/29554124-9544d198-8750-11e7-8afa-33579ec419d5.png)
    Refer to Eq.(6) and Eq.(8) in [A robust hybrid of lasso and ridge 
regression](http://statweb.stanford.edu/~owen/reports/hhu.pdf). This objective 
is jointly convex as a function of (w, σ) ∈ R × (0,∞), so we can use 
L-BFGS-B to solve it.
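
As a rough illustration of the objective described above, here is a minimal pure-Python sketch. It assumes unit instance weights (so ```weightSum``` equals the number of rows) and uses the mean-loss and 1/2 scaling conventions described in this PR. The function names, the intercept argument `b`, and the default `epsilon=1.35` are illustrative choices, not the code in this patch.

```python
def huber(z, epsilon):
    """H_epsilon(z): quadratic when |z| < epsilon, linear otherwise."""
    if abs(z) < epsilon:
        return z * z
    return 2.0 * epsilon * abs(z) - epsilon * epsilon


def huber_objective(w, b, sigma, X, y, epsilon=1.35, reg_param=0.0):
    """Mean huber loss plus L2 penalty, both multiplied by 1/2.

    Assumes unit weights, so weightSum == len(y). sigma must be > 0.
    """
    n = len(y)
    loss_sum = 0.0
    for x_i, y_i in zip(X, y):
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b - y_i
        # Each instance contributes sigma + H_epsilon(residual / sigma) * sigma.
        loss_sum += sigma + huber(residual / sigma, epsilon) * sigma
    l2 = sum(w_j * w_j for w_j in w)
    return 0.5 * loss_sum / n + 0.5 * reg_param * l2
```

Because σ is optimized together with w and must stay positive, a bound-constrained solver such as L-BFGS-B is a natural fit for minimizing a function of this shape.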
    
    The current implementation is a straightforward port of Python 
scikit-learn's 
[```HuberRegressor```](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html).
 There are some differences:
    * We use the mean loss (```lossSum/weightSum```), whereas sklearn uses the total loss 
(```lossSum```).
    * We multiply the loss function and the L2 regularization by 1/2. Multiplying the 
whole objective by a constant factor does not affect the result; we do this only to stay 
consistent with _leastSquares_ loss.
    
    So when fitting without regularization, MLlib and sklearn produce the same 
output. When fitting with regularization, MLlib's ```regParam``` should be set to 
sklearn's ```alpha``` divided by the number of instances to match the output of sklearn.
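
The scaling relationship above can be checked numerically. The sketch below assumes unit instance weights, no intercept, and the sklearn-style objective as given in the scikit-learn documentation (total loss plus `alpha * ||w||^2`). The helper names and toy data are hypothetical. With `reg_param = alpha / n`, the two objectives differ only by the constant factor `2n`, so they share the same minimizer.

```python
def huber(z, epsilon):
    """H_epsilon(z): quadratic when |z| < epsilon, linear otherwise."""
    if abs(z) < epsilon:
        return z * z
    return 2.0 * epsilon * abs(z) - epsilon * epsilon


def loss_sum(w, sigma, X, y, epsilon):
    """Total (unweighted) huber loss over all instances, no intercept."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        residual = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
        total += sigma + huber(residual / sigma, epsilon) * sigma
    return total


def sklearn_style(w, sigma, X, y, epsilon, alpha):
    """Total loss plus alpha * ||w||^2 (sklearn-style scaling)."""
    return loss_sum(w, sigma, X, y, epsilon) + alpha * sum(v * v for v in w)


def mllib_style(w, sigma, X, y, epsilon, reg_param):
    """Mean loss and L2 penalty, each multiplied by 1/2 (scaling in this PR)."""
    n = len(y)
    return (0.5 * loss_sum(w, sigma, X, y, epsilon) / n
            + 0.5 * reg_param * sum(v * v for v in w))


# Toy data with one outlier; values are made up for illustration.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.1, 1.9, 3.2, 10.0]
n = len(y)
alpha = 0.8

for w, sigma in [([0.5], 1.0), ([1.0], 0.7), ([2.0], 2.5)]:
    s = sklearn_style(w, sigma, X, y, 1.35, alpha)
    m = mllib_style(w, sigma, X, y, 1.35, alpha / n)
    # The objectives agree up to the constant factor 2n at every point.
    assert abs(s - 2 * n * m) < 1e-9
```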
    
    ## How was this patch tested?
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-3181

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19020
    
----
commit c208c7b3ac4917098dd07ddc777e65fae77e21d0
Author: Yanbo Liang <[email protected]>
Date:   2017-08-20T05:45:36Z

    Implement HuberAggregator and add tests.

commit d72059debf079bb5aebfb9010e1a25241dc82856
Author: Yanbo Liang <[email protected]>
Date:   2017-08-21T13:43:28Z

    Implement huber loss for LinearRegression.

commit c7456837166c70aedb2c164b7e5f17ebd7033c56
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T03:21:42Z

    Update HuberAggregator and tests.

commit 43822bde3983214c54a571c86fbc534170b86415
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T04:34:02Z

    Update params doc and check for illegal params.

commit 5b431461acd15284f635da5f7c220f186386e351
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T07:25:27Z

    Update LinearRegression test suites.

commit 9545cdef56251755669c6fd4dbf301779d4115f3
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T08:19:29Z

    Add mima excludes.

commit f8fb60ace2cfb1e2cf739f09dab01e60ba7cb4df
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T10:29:16Z

    Fix docs.

commit 0951b02e15632200bb0e3052f95ec5126091f98e
Author: Yanbo Liang <[email protected]>
Date:   2017-08-22T11:25:39Z

    Fix annotation.

commit 8bdd625c996dca93ca915446cc4b3ff39b7478e9
Author: Yanbo Liang <[email protected]>
Date:   2017-09-01T07:06:00Z

    Update and reorg test cases.

commit ae2a9f89d7d88599e06709b15e3f368692b1e067
Author: Yanbo Liang <[email protected]>
Date:   2017-09-01T07:26:28Z

    Minor update for tests.

commit d21b4fb1397c48e43dcb9dc13344788d8caa1036
Author: Yanbo Liang <[email protected]>
Date:   2017-09-01T08:07:46Z

    Rename m to epsilon.

commit 3d3f1ec6696a1cd6f3cd2ee1bfb4e74790403c84
Author: Yanbo Liang <[email protected]>
Date:   2017-09-22T07:41:25Z

    Address review comments.

commit 7359635e9a2a4f050418b5d3f51ee85fb73b4d2d
Author: Yanbo Liang <[email protected]>
Date:   2017-09-22T08:34:20Z

    Add loss function formula for LinearRegression.

commit 8c6622f68ea81cedbeb3f03f957b335a99dedd46
Author: Yanbo Liang <[email protected]>
Date:   2017-10-03T05:16:26Z

    Expose scale for LinearRegressionModel.

----

