GitHub user holdenk opened a pull request:
https://github.com/apache/spark/pull/10788
[SPARK-7780][MLLIB] Intercept in LogisticRegressionWithLBFGS should not be
regularized
The intercept in logistic regression represents a prior on the categories and
should not be regularized. In MLlib, regularization is handled through the
Updater, which penalizes all components, including the intercept. This
results in poor training accuracy when regularization is enabled.
The new implementation in the ML framework handles this properly, and we
should call the ML implementation from MLlib, since the majority of users are
still using the MLlib API.
Note that both implementations apply feature scaling to improve convergence;
the only difference is that the ML version doesn't regularize the intercept.
As a result, when lambda is zero, they converge to the same solution.
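
To illustrate the point above, here is a minimal, self-contained sketch in
plain Python. It is not the Spark code: it uses ordinary gradient descent
rather than L-BFGS, skips feature scaling, and the data and the names
`fit_logistic`, `reg_param`, and `regularize_intercept` are hypothetical,
chosen only for this example. It fits a one-feature logistic regression twice,
once penalizing the intercept and once leaving it free, so the two behaviours
can be compared for a given lambda.

```python
import math

# Hypothetical toy data: one feature taking values -1 / +1, labels skewed toward 1.
XS = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
YS = [0, 0, 1, 1, 1, 1, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, reg_param, regularize_intercept, steps=5000, lr=0.5):
    """Minimize mean log loss + (reg_param / 2) * penalty by gradient descent.

    The penalty is w**2, plus b**2 only when regularize_intercept is True.
    Returns the fitted (w, b).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the margin
            grad_w += err * x / n
            grad_b += err / n
        grad_w += reg_param * w           # the weight is always penalized
        if regularize_intercept:
            grad_b += reg_param * b       # the behaviour this PR wants to avoid
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    for lam in (0.0, 0.5):
        penalized = fit_logistic(XS, YS, lam, regularize_intercept=True)
        free = fit_logistic(XS, YS, lam, regularize_intercept=False)
        print("lambda =", lam, "penalized intercept:", penalized, "free intercept:", free)
```

With `reg_param` set to zero the two fits coincide, matching the note above.
With a positive `reg_param`, the penalized intercept is pulled toward zero, so
the model can no longer match the class prior in the data.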
Previously partially reviewed at
https://github.com/apache/spark/pull/6386#issuecomment-168781424
Re-opening for @dbtsai to review.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/holdenk/spark SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10788.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10788
----
commit a529c013fa722748cbd1d3878e4ea3bed5b15181
Author: Holden Karau <[email protected]>
Date: 2015-05-22T20:54:59Z
document plans
commit f9e26350d15d7d36b75ece4f4718797dbe2a0830
Author: Holden Karau <[email protected]>
Date: 2015-05-22T22:53:29Z
Some progress.
commit 7ebbd566e20923efc32dee1cfcf12ea315259e30
Author: Holden Karau <[email protected]>
Date: 2015-05-22T23:16:18Z
Keep track of the number of requested classes so that if it's more than 2 we
use the legacy implementation. Also allow pass-through of initialWeights
commit ef2a9b0f5b6cb2e971c2e5371f3394b4dec64574
Author: Holden Karau <[email protected]>
Date: 2015-05-22T23:48:06Z
Expose a train on instances method within Spark, use numOfLinearPredictors
instead of keeping track of class variable, pass through persistence information
commit 407491e38b1a5834d26a137ab20829a3d96f5142
Author: Holden Karau <[email protected]>
Date: 2015-05-24T01:14:04Z
tests are fun
commit e02bf3a9688d1efa2f3da60b3d9f27911b04955d
Author: Holden Karau <[email protected]>
Date: 2015-05-24T07:42:13Z
Start updating the tests to run with different updaters.
commit 8517539d0e8829833968dcb7e47ad8ba20849cb1
Author: Holden Karau <[email protected]>
Date: 2015-05-24T08:00:36Z
get the tests compiling
commit a619d42b821575afd8efa90f2a38edf9690eb0df
Author: Holden Karau <[email protected]>
Date: 2015-05-24T08:04:53Z
style fixed
commit 4febcc32f524edadeb68dc674e2681a087ffaa38
Author: Holden Karau <[email protected]>
Date: 2015-05-24T08:13:23Z
make the test method private
commit e8e03a13ba04c6b3100e290a5c435959c2f01912
Author: Holden Karau <[email protected]>
Date: 2015-05-24T20:16:13Z
CR feedback, pass RDD of LabeledPoints to ml implementation. Also from
tests require that feature scaling is turned on to use ml implementation.
commit 38a024bd9a36e83ef8005a5f2af8a4dd44f6760e
Author: Holden Karau <[email protected]>
Date: 2015-05-25T07:24:21Z
Convert it to a df and use set for the initial params
commit 478b8c5d5ff20478dc4ba913b0c77172e0abdfff
Author: Holden Karau <[email protected]>
Date: 2015-05-25T20:06:57Z
Handle non-dense weights
commit 08589f58b81bc1e6099b425f86226053c5b6a360
Author: Holden Karau <[email protected]>
Date: 2015-05-26T03:39:54Z
CR feedback: make the setInitialWeights function private, don't mess with
the weights when they are user supplied, validate that the user supplied
weights are reasonable.
commit f40c401496ae1e6cc7b39db820fea194d42c25c5
Author: Holden Karau <[email protected]>
Date: 2015-05-26T04:19:46Z
style fix up
commit f35a16aa8110a33c32959db674908d145be6e97f
Author: Holden Karau <[email protected]>
Date: 2015-06-02T23:29:11Z
Copy the number of iterations, convergence tolerance, and if we are fitting
an intercept from mllib to ml when training lbfgs model using ml code
commit 4d431a358074f5245abcbc95af3e2bdf75b4f21d
Author: Holden Karau <[email protected]>
Date: 2015-06-03T00:39:48Z
scala style check issue
commit 7e4192849efc6d282633159a15c7dd41376aa1a3
Author: Holden Karau <[email protected]>
Date: 2015-06-03T07:30:48Z
Only the weights if we need to.
commit ed351ffdf862994389b41284f95aa148c6550f41
Author: Holden Karau <[email protected]>
Date: 2015-06-03T19:39:56Z
Use appendBias for adding intercept to initial weights, fix
generateInitialWeights
commit 3ac02d72cab72b35b7cc76c50d7088d4b98bfd9d
Author: Holden Karau <[email protected]>
Date: 2015-06-08T20:20:19Z
Merge in master
commit d1ce12ba45f12d93b962ffd560242757eda739c2
Author: Holden Karau <[email protected]>
Date: 2015-07-09T20:13:21Z
Merge in master
commit 8ca0fa927bd2773ceb4ccf740445058ead706f7a
Author: Holden Karau <[email protected]>
Date: 2015-08-28T21:57:51Z
attempt to merge in master
commit 6f66f2cbc7d80335bfb0e2e5b8b430930206d06f
Author: Holden Karau <[email protected]>
Date: 2015-10-01T23:05:01Z
Merge in master (again)
commit 0cedd50368eeda594eafdb9500ed162ff33f2e25
Author: Holden Karau <[email protected]>
Date: 2015-10-02T01:44:08Z
Fix compile error after simple merge
commit 2bf289b2ab92ff9da742d22e1feda0b57f8a796c
Author: Holden Karau <[email protected]>
Date: 2015-12-30T18:41:30Z
Merge branch 'master' into
SPARK-7780-intercept-in-logisticregressionwithLBFGS-should-not-be-regularized
commit d7a26318be962eede7d6fa0792f1f4d72178dc8d
Author: Holden Karau <[email protected]>
Date: 2016-01-16T03:21:04Z
Merge in master
commit b0fe1e68bf8e7fc13cc845db90e7eb27729545d9
Author: Holden Karau <[email protected]>
Date: 2016-01-16T03:24:08Z
scala style import order fix
commit 827dcdec09414c5b25a66be359c4d651a9e18ee6
Author: Holden Karau <[email protected]>
Date: 2016-01-16T06:24:33Z
Import ordering
----