GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/17849
[SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Values from Estimator
## What changes were proposed in this pull request?
Added a call that copies Param values from the Estimator to the Model after
`fit` in PySpark ML. Values are copied for any params that are also defined in
the Model. Because most Models currently do not define the same params as their
Estimator, this also adds a method that creates any Params missing from the
Python object by inspecting the Java object. This is a temporary fix that
can be removed once the PySpark models properly define the params themselves.
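To illustrate the intended behavior, here is a minimal, self-contained sketch. It does not use the PySpark classes: `Param`, `ToyParams`, `copy_values`, and the `java_param_docs` argument are hypothetical stand-ins, with a plain dict playing the role of the params reported by the Java object.

```python
class Param:
    """Toy stand-in for a param descriptor: a name plus a doc string."""

    def __init__(self, name, doc=""):
        self.name = name
        self.doc = doc


class ToyParams:
    """Toy stand-in for an object that owns params and their set values."""

    def __init__(self, *param_names):
        self.params = {n: Param(n) for n in param_names}
        self.values = {}

    def has_param(self, name):
        return name in self.params

    def set(self, name, value):
        if not self.has_param(name):
            raise KeyError("unknown param: " + name)
        self.values[name] = value


def copy_values(estimator, model, java_param_docs=None):
    """Copy explicitly set values from estimator to model.

    java_param_docs (hypothetical) maps param name -> doc string and stands
    in for the params the Java object reports. A param missing from the model
    is created only if it appears in this mapping; otherwise it is skipped.
    """
    java_param_docs = java_param_docs or {}
    for name, value in estimator.values.items():
        if not model.has_param(name):
            if name not in java_param_docs:
                continue  # skip this param but keep processing the rest
            model.params[name] = Param(name, java_param_docs[name])
        model.set(name, value)
    return model
```

Without `java_param_docs`, only params the model already defines receive values; passing it models the temporary fix, where missing params are first created and then filled in.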
## How was this patch tested?
Refactored the `check_params` test so that it can optionally check whether the
model params match between Python and Java, and enabled this check for an
existing fitted model that shares params between Estimator and Model.
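The kind of comparison such a check performs can be sketched as follows. This is not the actual `check_params` code: `values_equal` and `mismatched_params` are hypothetical helpers, and plain dicts stand in for the Python-side and Java-side param values. NaN is handled explicitly because NaN never compares equal to itself.

```python
import math


def values_equal(a, b):
    """Compare two param values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def mismatched_params(py_params, java_params):
    """Return sorted names whose values differ or exist on one side only."""
    missing = object()  # sentinel that is unequal to every real value
    names = set(py_params) | set(java_params)
    return sorted(
        n for n in names
        if not values_equal(py_params.get(n, missing),
                            java_params.get(n, missing))
    )
```

An empty result means the two sides agree; any returned name points at a param that is either missing on one side or carries a different value.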
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark pyspark-models-own-params-SPARK-10931
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17849.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17849
----
commit a4ede3fff9c012bbd609d3fad13a8c100856f950
Author: Bryan Cutler <[email protected]>
Date: 2017-03-13T23:43:41Z
added regression test case for PySpark models not owning params
commit 3b921a469a75b243c97e93548ed3de4bfae040a9
Author: Bryan Cutler <[email protected]>
Date: 2017-03-13T23:44:22Z
fixed default PySpark param value that was being overlooked by return
instead of continue
commit dff7863d1ecd45a10c225a2ea95cc51814705ed0
Author: Bryan Cutler <[email protected]>
Date: 2017-03-14T00:29:10Z
added copy of param values to python model when estimator fit is called
commit 398ef27874e59c615af77b3c838880c68de97e35
Author: Bryan Cutler <[email protected]>
Date: 2017-03-14T21:10:03Z
Added temporary fix to add Params when fitting and persisting models
commit 1f3de13e55eb0d56f42104ceb71235da9284ef28
Author: Bryan Cutler <[email protected]>
Date: 2017-05-02T22:46:49Z
Merge remote-tracking branch 'upstream/master' into
pyspark-models-own-params-SPARK-10931
commit d621c8940421258518345eda41e775b7e65e8e8f
Author: Bryan Cutler <[email protected]>
Date: 2017-05-02T23:38:07Z
added check for NaN default param values
commit acdb4b94517f2c740fe53b80ff6870d894a885aa
Author: Bryan Cutler <[email protected]>
Date: 2017-05-03T22:32:45Z
need to create params from java when model is fit and unpersisted in order
to match
commit 9b7b886125eeb389d48ea398f6305d05b29840c9
Author: Bryan Cutler <[email protected]>
Date: 2017-05-03T22:40:34Z
removed blank line
commit 765eb5f77335232eff0889fbc7401f1e77e16dc9
Author: Bryan Cutler <[email protected]>
Date: 2017-05-03T22:55:37Z
cleaned old comment block in test
----