GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/5820
[SPARK-5956][MLLIB] Pipeline components should be copyable.
This PR added `copy(extra: ParamMap): Params` to `Params`, which makes a
copy of the current instance with a randomly generated uid and some extra param
values. With this change, we only need to implement `fit` and `transform` without
extra param values, given the default implementation of `fit(dataset, extra)`:
~~~scala
def fit(dataset: DataFrame, extra: ParamMap): Model = {
  copy(extra).fit(dataset)
}
~~~
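To illustrate the delegation outside of Spark, here is a minimal self-contained sketch. `ToyEstimator`, the `Map`-based `ParamMap` alias, and the averaging "fit" are hypothetical stand-ins invented for this example, not the classes touched by this PR; only the shape of `fit(dataset, extra)` follows the snippet above.

~~~scala
// Toy stand-ins, NOT the Spark classes: a ParamMap is just a Map here.
object CopySketch {
  type ParamMap = Map[String, Any]

  // Hypothetical estimator that carries its embedded param values.
  final case class ToyEstimator(uid: String, params: ParamMap) {

    // Copy with a randomly generated uid and the extra values layered on top.
    def copy(extra: ParamMap): ToyEstimator =
      ToyEstimator(java.util.UUID.randomUUID().toString, params ++ extra)

    // The only overload that needs real logic; it reads embedded values only.
    def fit(dataset: Seq[Double]): Double = {
      val reg = params.getOrElse("regParam", 0.0).asInstanceOf[Double]
      dataset.sum / (dataset.size + reg)
    }

    // Default implementation: copy with the extra values, then fit.
    def fit(dataset: Seq[Double], extra: ParamMap): Double =
      copy(extra).fit(dataset)
  }
}
~~~

Because the extra values are applied to a copy, the original instance keeps its own param values after the call.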
Inside `fit` and `transform`, since only the embedded values are used, I
added `$` as an alias for `getOrDefault` to make the code easier to read. For
example, in `LinearRegression.fit` we have:
~~~scala
val effectiveRegParam = $(regParam) / yStd
val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
~~~
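As a rough sketch of how such an alias can be defined, the toy below declares `$` as a method that forwards to `getOrDefault`. `Param`, `ToyParams`, and `ToyRegression` are simplified, hypothetical types written for this example (the real `Params` trait has more to it), and the default values of `0.0` are assumptions.

~~~scala
object DollarSketch {
  // Hypothetical typed param key; equality is by name.
  final case class Param[T](name: String)

  trait ToyParams {
    protected def values: Map[Param[_], Any]   // explicitly set values
    protected def defaults: Map[Param[_], Any] // fallback values

    // Return the set value if present, otherwise the default.
    def getOrDefault[T](p: Param[T]): T =
      values.get(p).orElse(defaults.get(p))
        .getOrElse(throw new NoSuchElementException(s"no value for ${p.name}"))
        .asInstanceOf[T]

    // Short alias so that numeric code stays readable.
    final def $[T](p: Param[T]): T = getOrDefault(p)
  }

  val regParam = Param[Double]("regParam")
  val elasticNetParam = Param[Double]("elasticNetParam")

  class ToyRegression(protected val values: Map[Param[_], Any]) extends ToyParams {
    // Assumed defaults, chosen only for this example.
    protected val defaults: Map[Param[_], Any] =
      Map(regParam -> 0.0, elasticNetParam -> 0.0)

    // Same arithmetic as the LinearRegression excerpt: returns (L1, L2).
    def effectiveParams(yStd: Double): (Double, Double) = {
      val effectiveRegParam = $(regParam) / yStd
      (
        $(elasticNetParam) * effectiveRegParam,
        (1.0 - $(elasticNetParam)) * effectiveRegParam
      )
    }
  }
}
~~~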
Meta-algorithms like `Pipeline` implement their own `copy(extra)`, so the
fitted pipeline model stores copies of all stages (whether a stage is a
transformer or a model).
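A composite `copy(extra)` can be sketched roughly as follows. `Stage`, `Leaf`, and `ToyPipeline` are hypothetical types for illustration only. Note one simplifying assumption: this toy applies the extra values to every stage without filtering, which is not necessarily how `Pipeline` scopes them.

~~~scala
object PipelineCopySketch {
  type ParamMap = Map[String, Any]

  // Hypothetical stage interface: anything that can copy itself.
  trait Stage {
    def params: ParamMap
    def copy(extra: ParamMap): Stage
  }

  // A leaf stage (stands in for a transformer or a model).
  final case class Leaf(name: String, params: ParamMap = Map.empty) extends Stage {
    def copy(extra: ParamMap): Leaf = Leaf(name, params ++ extra)
  }

  // A meta-algorithm overrides copy so that its stages are copied too,
  // instead of sharing references with the original.
  final case class ToyPipeline(stages: Seq[Stage], params: ParamMap = Map.empty)
      extends Stage {
    def copy(extra: ParamMap): ToyPipeline =
      ToyPipeline(stages.map(_.copy(extra)), params ++ extra)
  }
}
~~~

Copying stage by stage means that later changes to the original stages do not leak into the copied pipeline.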
Other changes:
* `Params$.inheritValues` was moved to `Params!.copyValues` and now returns the
target instance.
* `fittingParamMap` was removed because the `parent` carries this
information.
* `validate` was renamed to `validateParams` to be more precise.
TODOs:
* [ ] add tests for newly added methods
* [ ] update documentation
@jkbradley
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-5956
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5820.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5820
----
commit f082a310794dcbecf17ec3d44ab770beed84f956
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T08:30:57Z
make Params copyable and simply handling of extra params in all spark.ml
components
commit d882afc0956c59da78b5e0172f86e9c1f5432028
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T08:48:04Z
test compile
commit 9ee004e7c534db7a6ab0434eec97feca86c7daad
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T16:52:58Z
merge copy and copyWith; rename validate to validateParams
commit 53e097373e9c9707bb7486adc3c20fee384356c1
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T17:34:50Z
move inheritValues to Params and rename it to copyValues
commit 9286a228c8e141dbcd66eaf21e9cfb29bc1d344a
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T17:40:02Z
copyValues to trained models
commit 0f4fd64740763792087405f8921d4cd2bf00217a
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T18:07:38Z
fix some tests
commit c76b4d120da94350efcb4638c8a24e4fbeabb177
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T20:02:24Z
fix all unit tests
commit 5a67779b20aeb0db7c824a96f5b1cc2e7718f7c5
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T20:16:32Z
examples compile
commit b642872be33a250ef1cefbbb51c3bd8a3d37b7ba
Author: Xiangrui Meng <[email protected]>
Date: 2015-04-30T20:35:31Z
example code runs
----