GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/11983

    [SPARK-14100][ML] Merging Estimator and Model: prototype for StringIndexer

    ## What changes were proposed in this pull request?
    
    This is a *prototype*.  It will be used to decide whether to proceed with SPARK-14100 [https://issues.apache.org/jira/browse/SPARK-14100].
    
    Main changes
    * Created a new abstraction, MutableEstimator, which will eventually replace Estimator for Spark 2.0.
      * MutableEstimator inherits from Transformer and adds a ```fit()``` method.
      * It has no fit() methods taking ParamMaps.  The expected behavior of such methods becomes ambiguous after the merge, since it is unclear whether they would modify the current instance.
    * Merged StringIndexer and StringIndexerModel into a single class that inherits from MutableEstimator.
    * Did the same for the Python API.  Also added JavaMutableEstimator for 
Python wrappers.
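
To make the merged design concrete, here is a minimal pure-Python sketch. It is not the code from this pull request and does not use Spark: the class bodies are toy stand-ins, and having fit() store the learned labels on the instance and return that same instance is an assumption about the prototype. Ordering labels by descending frequency mirrors StringIndexer's usual default; the alphabetical tie-break is also an assumption.

```python
from collections import Counter


class Transformer:
    """Toy stand-in for a pipeline stage that maps rows to rows."""

    def transform(self, rows):
        raise NotImplementedError


class MutableEstimator(Transformer):
    """Hypothetical merged abstraction: a Transformer that can also be fitted.

    fit() stores the learned state on this instance and returns it,
    instead of returning a separate Model object (assumed behavior).
    """

    def fit(self, rows):
        raise NotImplementedError


class StringIndexer(MutableEstimator):
    """Toy merged StringIndexer: the estimator and the model are one object."""

    def __init__(self):
        self.labels = None  # learned state; None until fit() is called

    def fit(self, rows):
        # Order labels by descending frequency, breaking ties alphabetically.
        counts = Counter(rows)
        self.labels = sorted(counts, key=lambda s: (-counts[s], s))
        return self

    def transform(self, rows):
        if self.labels is None:
            raise ValueError("StringIndexer must be fitted before transform()")
        index = {label: float(i) for i, label in enumerate(self.labels)}
        return [index[r] for r in rows]


indexer = StringIndexer()
indexed = indexer.fit(["a", "b", "c", "a", "a", "c"]).transform(["a", "c", "b"])
print(indexed)  # [0.0, 1.0, 2.0]
```

Because fit() mutates the instance, a fit(dataset, paramMap) overload would leave it unclear whether the extra params apply only to that call or persist on the object, which is the ambiguity noted above.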
    
    Other required changes
    * Modified Pipeline to handle MutableEstimator.
    
    Other proposed changes
    * Deprecated the transform() methods in Transformer that take Param settings.
    * Added ```copy()``` without arguments to PipelineStage since this will be 
a more common operation after the merge.
    * Added more ```set()``` methods to Params to facilitate setting Param 
values, now that fit() and transform() methods taking ParamMaps are going to be 
removed.
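
As a rough illustration of why a no-argument copy() and extra set() methods help once the ParamMap overloads go away, the sketch below uses a toy Params class. It is hypothetical and is not the real Spark Params API: the keyword-based set(), the get() helper, and the deep-copy semantics are all assumptions made for the example.

```python
import copy as _copy


class Params:
    """Toy parameter holder; not the real Spark Params class."""

    def __init__(self, **defaults):
        self._values = dict(defaults)

    def set(self, **kwargs):
        # Reject unknown names so that typos fail loudly.
        for name in kwargs:
            if name not in self._values:
                raise KeyError("unknown param: " + name)
        self._values.update(kwargs)
        return self

    def get(self, name):
        return self._values[name]

    def copy(self):
        # No-argument copy: an independent instance with the same settings.
        return _copy.deepcopy(self)


base = Params(inputCol="category", outputCol="categoryIndex")
variant = base.copy().set(outputCol="label")
print(base.get("outputCol"), variant.get("outputCol"))  # categoryIndex label
```

With this pattern, a caller who previously passed a ParamMap to fit() or transform() would instead copy the stage and set the overrides on the copy, leaving the original untouched.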
    
    ## How was this patch tested?
    
    Existing unit tests.  Note that the required changes were minimal.
    
    More changes to meta-algorithms such as CrossValidator may be needed as we 
merge other Estimator-Model pairs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark thunterdb-14100

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11983.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11983
    
----
commit 0585e3fbc6dc7316ae2d8dd425648c9f6b45e041
Author: Timothy Hunter <[email protected]>
Date:   2016-03-15T21:54:20Z

    passing tests

commit 6aa439bb78aea37476e7a12209a9f902a7be9871
Author: Timothy Hunter <[email protected]>
Date:   2016-03-15T21:56:36Z

    cleanups

commit 317df204c049a08c3e230c4d3ca61ea6f122c864
Author: Timothy Hunter <[email protected]>
Date:   2016-03-23T21:01:59Z

    wokr

commit bc0616605091f77d6c9621fc55f5d3561ba5a05d
Author: Joseph K. Bradley <[email protected]>
Date:   2016-03-26T22:24:28Z

    Made StringIndexer extend MutableEstimator in Python and Scala.

----


