GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/6019
[WIP][SPARK-7407][MLLIB] use uid + name to identify parameters
In the current implementation, a param instance is strongly attached to a
parent. If we make a copy of an estimator or a transformer in pipelines and
other meta-algorithms, copying the params to the copied instances becomes
error-prone. In this PR, a param is identified by its parent's UID and the
param name, so it is only loosely attached to its parent and all of its
derivatives. The UID is preserved during copying or fitting. All components
now have a default constructor and a constructor that takes a UID as input. I
kept the existing constructors for Param in this PR to reduce the size of the
diff and made `parent` a mutable field. @jkbradley
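To illustrate the idea of identifying a param by parent UID plus name, here is
a minimal, self-contained sketch. It is not the code in this PR: `Param`,
`ParamMap`, `Identifiable`, and the `Scaler` component are simplified,
hypothetical stand-ins, and the UID format is an assumption made only for the
example.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical, simplified stand-ins for illustration only; not the spark.ml classes.
trait Identifiable {
  def uid: String
}

// A param is identified by its parent's UID and its own name,
// not by a reference to the parent object. Case-class equality
// compares exactly these two fields.
final case class Param[T](parentUid: String, name: String)

// A tiny param map keyed by (parentUid, name) through Param equality.
class ParamMap {
  private val values = mutable.Map.empty[Param[_], Any]

  def put[T](param: Param[T], value: T): this.type = {
    values(param) = value
    this
  }

  def get[T](param: Param[T]): Option[T] =
    values.get(param).map(_.asInstanceOf[T])
}

// Hypothetical component with a default constructor and a
// constructor that takes a UID, mirroring the description above.
class Scaler(override val uid: String) extends Identifiable {
  // Assumed UID format for this sketch: a prefix plus a random suffix.
  def this() = this("scaler_" + UUID.randomUUID().toString.take(8))

  val factor: Param[Double] = Param[Double](uid, "factor")

  // Copying preserves the UID, so params of the copy equal params of the original.
  def copy(): Scaler = new Scaler(uid)
}
```

With these definitions, a value stored under `original.factor` can be looked
up through `original.copy().factor`, because both params carry the same UID
and name. A second `new Scaler()` gets a different UID, so its `factor` param
does not collide with the first one.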
This PR still needs some clean-ups, and there are several spark.ml PRs
pending. I'll try to get them merged first and then update this PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-7407
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/6019.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6019
----
commit eaeed35ba0ee22132afd88e51d8808bb8defb122
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-07T04:08:23Z
update Identifiable
commit 8726d39d3a6ff3a4285b80661b2cf4c48b830508
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-07T04:39:40Z
use parent uid in Param
commit 108937eb5501801387137b15ec8d7003d4d717b5
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-07T20:16:01Z
pass compile
commit fbc39f04dd44897e320cc283b0a0cfa9376f2494
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-07T22:34:02Z
pass test:compile
commit e1160cfceb249db8071181620871a25f7a910a91
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-07T22:45:11Z
fix tests
commit 818e1db0375f3230eec2fcc231cede7f2bb8f13d
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-08T21:08:11Z
merge master
commit c255f17ee3e9e26f751973b6113dc91cfc94defd
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-08T21:34:37Z
fix tests in ParamsSuite
commit fdbc415bb9e2306df37c215d587ac57f8418b791
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-08T21:59:56Z
all tests passed
commit a4794dd842f82b001daef73dd82016766b6215b9
Author: Xiangrui Meng <[email protected]>
Date: 2015-05-08T22:13:03Z
change Param to use to reduce the size of diff
----