GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/7875
[SPARK-8601][ML] Add an option to disable standardization for linear
regression
All compressed sensing applications, and some regression use cases, get
better results when feature scaling is turned off. However, if we implement
this naively by training on the dataset without any standardization, the
rate of convergence will be poor. Instead, we can still standardize the
training dataset but penalize each component differently, which gives
effectively the same objective function but a better-conditioned numerical
problem. As a result, columns with high variance are penalized less, and
vice versa. Without this option, all features are standardized and are
therefore penalized equally.
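As a rough illustration of the reweighting idea (not the code in this PR),
the sketch below checks numerically that an L2-penalized least-squares
objective on the raw features equals the objective on scaled features when
each scaled coefficient beta_j = w_j * sigma_j is penalized with weight
1 / sigma_j^2. The function names, the toy data, the 0.5 * lam penalty form,
and the absence of an intercept are all assumptions made for the example.

```python
# Sketch only: illustrative names and toy data, not Spark's implementation.
# Assumes no intercept and scaling by the sample standard deviation only.

def column_std(rows):
    """Sample standard deviation of each feature column."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) ** 0.5
            for j in range(d)]

def squared_loss(rows, labels, coef):
    """Mean squared error (halved) of a linear model without intercept."""
    n = len(rows)
    return sum((sum(c * x for c, x in zip(coef, row)) - y) ** 2
               for row, y in zip(rows, labels)) / (2.0 * n)

def objective_raw(rows, labels, w, lam):
    """Loss on raw features with a uniform L2 penalty on w."""
    return squared_loss(rows, labels, w) + 0.5 * lam * sum(wj * wj for wj in w)

def objective_scaled(rows, labels, beta, lam, sigma):
    """Loss on features divided by sigma, penalizing beta_j by 1 / sigma_j^2."""
    scaled = [[x / s for x, s in zip(row, sigma)] for row in rows]
    penalty = 0.5 * lam * sum((b / s) ** 2 for b, s in zip(beta, sigma))
    return squared_loss(scaled, labels, beta) + penalty

# Toy data: the second column has a much larger variance than the first.
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 500.0]]
labels = [1.0, 2.0, 2.5, 4.0]
w = [0.5, 0.004]   # coefficients on the original scale
lam = 0.1

sigma = column_std(rows)
beta = [wj * sj for wj, sj in zip(w, sigma)]      # coefficients on the scaled problem
penalty_weights = [1.0 / (s * s) for s in sigma]  # per-component penalty factors

obj_raw = objective_raw(rows, labels, w, lam)
obj_scaled = objective_scaled(rows, labels, beta, lam, sigma)
print(obj_raw, obj_scaled)
```

The two printed values should agree up to floating-point rounding, and the
high-variance column ends up with the smaller penalty weight, which matches
the behavior described above. The optimizer would work on the scaled
problem, and coefficients would be mapped back with w_j = beta_j / sigma_j.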
In R, there is an option for this:
  standardize
    Logical flag for x variable standardization, prior to fitting the model
    sequence. The coefficients are always returned on the original scale.
    Default is standardize=TRUE. If variables are in the same units already,
    you might not wish to standardize. See details below for y
    standardization with family="gaussian".
Note that the primary author of this PR is @holdenk.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dbtsai/spark SPARK-8522
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7875.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7875
----
commit 00a1dc5c7550bccd481b264451251bcb5dbde4e6
Author: Holden Karau <[email protected]>
Date: 2015-06-25T01:31:08Z
Add the param to the linearregression impl
commit 55d3a66857220631244bd0a5001ab8c6a864b7c0
Author: Holden Karau <[email protected]>
Date: 2015-06-26T04:02:19Z
Add standardization param for linear regression
commit e47c57475819496fdd7cbcda9ddb7d7c0f58f538
Author: Holden Karau <[email protected]>
Date: 2015-06-26T05:27:30Z
Add support for L2 without standardization.
commit e54a8a98e1dc0b16a4f03fd7ad0da92f8b6b66aa
Author: Holden Karau <[email protected]>
Date: 2015-06-26T19:13:23Z
Fix long line
commit 99ce053603aab8c379b372d00d8b7b586c655de3
Author: Holden Karau <[email protected]>
Date: 2015-06-30T18:16:22Z
merge in master
commit 0c334a256cb8004c968e9a3f9360c34d33e39a8f
Author: Holden Karau <[email protected]>
Date: 2015-06-30T18:18:33Z
Remove extra line
commit b83a41e13d87864c866e996eb95e74454256cfce
Author: Holden Karau <[email protected]>
Date: 2015-06-30T19:11:41Z
Expand the tests and make them similar to the other PR also providing an
option to disable standardization (but for LoR).
commit 3f929358579da340010e2d5f8a86eaf4a1f9a994
Author: Holden Karau <[email protected]>
Date: 2015-07-10T00:04:42Z
merge
commit eebe10a8c1eb9da6ab313c0deb38207e3c2f5fa6
Author: Holden Karau <[email protected]>
Date: 2015-07-10T00:07:09Z
Use same comparision operator throughout the test
commit 332f14027ce5f81774a4f3b02b808dad2e1edc75
Author: Holden Karau <[email protected]>
Date: 2015-07-21T06:30:59Z
Merge in master
commit 6b1dc09c20cb6588e3eff2ba036d2649b8b81d8d
Author: Holden Karau <[email protected]>
Date: 2015-07-21T07:24:32Z
Merge branch 'master' into
SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression
commit d6234ba61e020dc9c3ff314772cb3dd98c1be5dd
Author: DB Tsai <[email protected]>
Date: 2015-08-02T21:53:47Z
Merge branch 'master' into
SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression
----