GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/17849

    [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Values from Estimator

    ## What changes were proposed in this pull request?
    
    Adds a call that copies Param values from the Estimator to the Model
    after `fit` in PySpark ML. Values are copied for any params that are also
    defined in the Model. Because most Models currently do not define the same
    params as their Estimator, this also adds a method that creates new Params
    from the Java object when they do not exist in the Python object. This is
    a temporary fix that can be removed once the PySpark models define the
    params themselves.
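
A minimal sketch of the copy step described above, using toy classes rather than the real PySpark API. All names here (`ToyParams`, `ToyModel`, `ToyEstimator`, `create_params_from_java`, and the hard-coded Java param names) are hypothetical and chosen only for illustration; the actual patch may be structured differently.

```python
class ToyParams:
    """Toy stand-in for a Params holder (not pyspark.ml.param.Params)."""

    def __init__(self, declared=()):
        self._declared = set(declared)  # names of params this object defines
        self._values = {}               # name -> explicitly set value

    def has_param(self, name):
        return name in self._declared

    def set(self, name, value):
        if name not in self._declared:
            raise KeyError("undefined param: %s" % name)
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def copy_values(self, to):
        """Copy every set value whose param is also defined on `to`."""
        for name, value in self._values.items():
            if not to.has_param(name):
                continue  # skip this param only; keep copying the rest
            to.set(name, value)
        return to


class ToyModel(ToyParams):
    def __init__(self, declared=(), java_param_names=()):
        super().__init__(declared)
        # Assumption: stands in for the params exposed by the Java object.
        self._java_param_names = tuple(java_param_names)

    def create_params_from_java(self):
        """Temporary fix: define params found on the Java side if missing."""
        for name in self._java_param_names:
            if not self.has_param(name):
                self._declared.add(name)


class ToyEstimator(ToyParams):
    def fit(self):
        # Assumption: the fitted model defines no params in Python itself.
        model = ToyModel(java_param_names=("maxIter", "regParam"))
        model.create_params_from_java()
        return self.copy_values(model)
```

In this sketch, a value set on the estimator reaches the model only if the model ends up defining a param with the same name; anything else is skipped without aborting the loop.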
    
    ## How was this patch tested?
    
    Refactored the `check_params` test to optionally check whether the model
    params for Python and Java match, and added this check to an existing
    fitted model that shares params between Estimator and Model.
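
A rough illustration of what a Python/Java param comparison could look like, including the NaN handling mentioned in the commit log. This is not the actual `check_params` code: the function names and the use of plain dictionaries of default values are assumptions made for the sketch.

```python
import math


def values_match(a, b):
    """Compare two default values, treating NaN as equal to NaN."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True  # NaN != NaN under ==, so handle it explicitly
    return a == b


def mismatched_params(py_defaults, java_defaults):
    """Return (name, reason) pairs for params that differ between the sides.

    Both arguments are hypothetical dicts mapping param name -> default value.
    """
    problems = []
    for name in sorted(set(py_defaults) | set(java_defaults)):
        if name not in py_defaults:
            problems.append((name, "missing in Python"))
        elif name not in java_defaults:
            problems.append((name, "missing in Java"))
        elif not values_match(py_defaults[name], java_defaults[name]):
            problems.append((name, "default differs"))
    return problems
```

A test could then assert that `mismatched_params(...)` returns an empty list for a fitted model.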


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-models-own-params-SPARK-10931

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17849
    
----
commit a4ede3fff9c012bbd609d3fad13a8c100856f950
Author: Bryan Cutler <[email protected]>
Date:   2017-03-13T23:43:41Z

    added regression test case for PySpark models not owning params

commit 3b921a469a75b243c97e93548ed3de4bfae040a9
Author: Bryan Cutler <[email protected]>
Date:   2017-03-13T23:44:22Z

    fixed default PySpark param value that was being overlooked by return instead of continue

commit dff7863d1ecd45a10c225a2ea95cc51814705ed0
Author: Bryan Cutler <[email protected]>
Date:   2017-03-14T00:29:10Z

    added copy of param values to python model when estimator fit is called

commit 398ef27874e59c615af77b3c838880c68de97e35
Author: Bryan Cutler <[email protected]>
Date:   2017-03-14T21:10:03Z

    Added temporary fix to add Params when fitting and persisting models

commit 1f3de13e55eb0d56f42104ceb71235da9284ef28
Author: Bryan Cutler <[email protected]>
Date:   2017-05-02T22:46:49Z

    Merge remote-tracking branch 'upstream/master' into pyspark-models-own-params-SPARK-10931

commit d621c8940421258518345eda41e775b7e65e8e8f
Author: Bryan Cutler <[email protected]>
Date:   2017-05-02T23:38:07Z

    added check for NaN default param values

commit acdb4b94517f2c740fe53b80ff6870d894a885aa
Author: Bryan Cutler <[email protected]>
Date:   2017-05-03T22:32:45Z

    need to create params from java when model is fit and unpersisted in order to match

commit 9b7b886125eeb389d48ea398f6305d05b29840c9
Author: Bryan Cutler <[email protected]>
Date:   2017-05-03T22:40:34Z

    removed blank line

commit 765eb5f77335232eff0889fbc7401f1e77e16dc9
Author: Bryan Cutler <[email protected]>
Date:   2017-05-03T22:55:37Z

    cleaned old comment block in test

----

