GitHub user yanboliang reopened a pull request:
https://github.com/apache/spark/pull/19020
[SPARK-3181] [ML] Implement huber loss for LinearRegression.
## What changes were proposed in this pull request?
MLlib ```LinearRegression``` now supports _huber_ loss in addition to
_leastSquares_ loss. The huber loss objective function is:
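A sketch of the objective in LaTeX, reconstructed from the scaling described in this description (mean loss, factor 1/2) and the form used by scikit-learn's ```HuberRegressor```. It assumes unit instance weights and omits the intercept. The symbols n (number of instances), λ (```regParam```) and ε (the huber threshold) are notation chosen here for illustration:

```latex
\min_{w,\,\sigma}\;
  \frac{1}{2n}\sum_{i=1}^{n}
    \left(\sigma + H_{\epsilon}\!\left(\frac{x_i^{\top} w - y_i}{\sigma}\right)\sigma\right)
  + \frac{1}{2}\,\lambda\,\lVert w\rVert_2^2,
\qquad
H_{\epsilon}(z) =
\begin{cases}
  z^2, & |z| < \epsilon \\
  2\epsilon\,|z| - \epsilon^2, & \text{otherwise}
\end{cases}
```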

Refer to Eq. (6) and Eq. (8) in [A robust hybrid of lasso and ridge
regression](http://statweb.stanford.edu/~owen/reports/hhu.pdf). This objective
is jointly convex as a function of (w, σ) ∈ R × (0,∞), so we can use
L-BFGS-B to solve it.
The current implementation is a straightforward port of Python
scikit-learn's
[```HuberRegressor```](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html).
There are some differences:
* We use mean loss (```lossSum/weightSum```), but sklearn uses total loss
(```lossSum```).
* We multiply the loss function and L2 regularization by 1/2. Multiplying the
whole formula by a constant factor does not affect the result; we do this only
to stay consistent with _leastSquares_ loss.
So when fitting without regularization, MLlib and sklearn produce the same
output. When fitting with regularization, MLlib's ```regParam``` should be set to
sklearn's ```alpha``` divided by the number of instances to match the output of
sklearn.
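The scaling argument above can be checked numerically. The following is a minimal pure-Python sketch, not the code in this patch: the function names (```huber```, ```data_term```, ```sklearn_style```, ```mllib_style```) and the toy data are hypothetical, and it assumes unit instance weights so that ```weightSum``` equals the number of instances. It shows that the MLlib-style objective with ```regParam = alpha / n``` is the sklearn-style objective divided by 2n, so both have the same minimizer.

```python
import math


def huber(z, epsilon):
    """Huber function: quadratic for |z| < epsilon, linear beyond it."""
    if abs(z) < epsilon:
        return z * z
    return 2.0 * epsilon * abs(z) - epsilon * epsilon


def data_term(w, b, sigma, X, y, epsilon):
    """Sum over instances of sigma + H((x.w + b - y) / sigma) * sigma."""
    total = 0.0
    for xi, yi in zip(X, y):
        residual = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
        total += sigma + huber(residual / sigma, epsilon) * sigma
    return total


def sklearn_style(w, b, sigma, X, y, epsilon, alpha):
    """Total loss plus alpha * ||w||^2 (the scaling described for sklearn)."""
    l2 = sum(wj * wj for wj in w)
    return data_term(w, b, sigma, X, y, epsilon) + alpha * l2


def mllib_style(w, b, sigma, X, y, epsilon, reg_param):
    """Mean loss and L2 penalty, both multiplied by 1/2 (unit weights assumed)."""
    n = len(y)
    l2 = sum(wj * wj for wj in w)
    return 0.5 * data_term(w, b, sigma, X, y, epsilon) / n + 0.5 * reg_param * l2


# Toy data (hypothetical); the last row acts as an outlier.
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [10.0, 4.0]]
y = [1.0, 0.0, 2.0, 50.0]
w, b, sigma = [0.3, -0.2], 0.1, 1.5
epsilon, alpha = 1.35, 0.7
n = len(y)

lhs = 2 * n * mllib_style(w, b, sigma, X, y, epsilon, alpha / n)
rhs = sklearn_style(w, b, sigma, X, y, epsilon, alpha)
assert math.isclose(lhs, rhs)
```

Because the two objectives differ only by the positive constant 2n at every point (w, σ), an optimizer that converges on either one should reach the same coefficients, up to solver tolerance.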
## How was this patch tested?
Unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanboliang/spark spark-3181
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19020.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19020
----
commit c208c7b3ac4917098dd07ddc777e65fae77e21d0
Author: Yanbo Liang <[email protected]>
Date: 2017-08-20T05:45:36Z
Implement HuberAggregator and add tests.
commit d72059debf079bb5aebfb9010e1a25241dc82856
Author: Yanbo Liang <[email protected]>
Date: 2017-08-21T13:43:28Z
Implement huber loss for LinearRegression.
commit c7456837166c70aedb2c164b7e5f17ebd7033c56
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T03:21:42Z
Update HuberAggregator and tests.
commit 43822bde3983214c54a571c86fbc534170b86415
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T04:34:02Z
Update params doc and check for illegal params.
commit 5b431461acd15284f635da5f7c220f186386e351
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T07:25:27Z
Update LinearRegression test suites.
commit 9545cdef56251755669c6fd4dbf301779d4115f3
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T08:19:29Z
Add mima excludes.
commit f8fb60ace2cfb1e2cf739f09dab01e60ba7cb4df
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T10:29:16Z
Fix docs.
commit 0951b02e15632200bb0e3052f95ec5126091f98e
Author: Yanbo Liang <[email protected]>
Date: 2017-08-22T11:25:39Z
Fix annotation.
commit 8bdd625c996dca93ca915446cc4b3ff39b7478e9
Author: Yanbo Liang <[email protected]>
Date: 2017-09-01T07:06:00Z
Update and reorg test cases.
commit ae2a9f89d7d88599e06709b15e3f368692b1e067
Author: Yanbo Liang <[email protected]>
Date: 2017-09-01T07:26:28Z
Minor update for tests.
commit d21b4fb1397c48e43dcb9dc13344788d8caa1036
Author: Yanbo Liang <[email protected]>
Date: 2017-09-01T08:07:46Z
Rename m to epsilon.
commit 3d3f1ec6696a1cd6f3cd2ee1bfb4e74790403c84
Author: Yanbo Liang <[email protected]>
Date: 2017-09-22T07:41:25Z
Address review comments.
commit 7359635e9a2a4f050418b5d3f51ee85fb73b4d2d
Author: Yanbo Liang <[email protected]>
Date: 2017-09-22T08:34:20Z
Add loss function formula for LinearRegression.
commit 8c6622f68ea81cedbeb3f03f957b335a99dedd46
Author: Yanbo Liang <[email protected]>
Date: 2017-10-03T05:16:26Z
Expose scale for LinearRegressionModel.
----