GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/11983
[SPARK-14100][ML] Merging Estimator and Model: prototype for StringIndexer
## What changes were proposed in this pull request?
This is a *prototype*. It will be used to decide whether or not to proceed
with [https://issues.apache.org/jira/browse/SPARK-14100].
Main changes
* Created new abstraction MutableEstimator which will eventually replace
Estimator for Spark 2.0.
* MutableEstimator inherits from Transformer and adds a ```fit()``` method.
* MutableEstimator does not contain fit() methods taking ParamMaps. The
expected behavior of such methods becomes ambiguous, since it is unclear
whether they modify the current instance.
* Merged StringIndexer and StringIndexerModel, where the merged abstraction
now inherits from MutableEstimator.
* Did the same for the Python API. Also added JavaMutableEstimator for
Python wrappers.
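
To make the merged abstraction concrete, here is a minimal sketch of the idea in plain Python. It does not use the Spark API: the classes below are toy stand-ins that operate on Python lists, and the tie-breaking rule in fit() is an assumption made only to keep the example deterministic. What it illustrates is that fit() stores the learned labels on the same instance and returns it, so no separate Model object is needed.

```python
from abc import ABC, abstractmethod


class Transformer(ABC):
    """Toy stand-in for a Transformer: maps a list of values to a new list."""

    @abstractmethod
    def transform(self, rows):
        ...


class MutableEstimator(Transformer):
    """Hypothetical merged abstraction: a Transformer that can also be fit.

    fit() stores the learned state on this instance and returns self,
    rather than producing a separate Model object.
    """

    @abstractmethod
    def fit(self, rows):
        ...


class StringIndexer(MutableEstimator):
    """Toy merged StringIndexer/StringIndexerModel over plain Python lists."""

    def __init__(self):
        self.labels = None  # learned state; None until fit() is called

    def fit(self, rows):
        counts = {}
        for value in rows:
            counts[value] = counts.get(value, 0) + 1
        # Most frequent label gets index 0. Breaking ties alphabetically
        # is an assumption of this sketch.
        self.labels = sorted(counts, key=lambda v: (-counts[v], v))
        return self

    def transform(self, rows):
        if self.labels is None:
            raise RuntimeError("StringIndexer must be fit before transform")
        index = {label: float(i) for i, label in enumerate(self.labels)}
        return [index[value] for value in rows]


indexer = StringIndexer()
fitted = indexer.fit(["a", "b", "a", "c", "a", "b"])
result = fitted.transform(["c", "a", "b"])
```

Because fit() mutates and returns the same object, `fitted is indexer` holds here, which is the behavioral difference from the existing Estimator/Model split.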
Other required changes
* Modified Pipeline to handle MutableEstimator.
Other proposed changes
* Deprecated transform() methods in Transformer taking Param settings.
* Added ```copy()``` without arguments to PipelineStage since this will be
a more common operation after the merge.
* Added more ```set()``` methods to Params to facilitate setting Param
values, now that fit() and transform() methods taking ParamMaps are going to be
removed.
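
As a rough illustration of why an argument-free copy() and extra set() methods fit together, the sketch below uses a toy Params class in plain Python. It is not the pyspark.ml Params class, and the keyword-argument form of set() is an assumption, since the exact signatures are not given in this description. The point is only that, once fit() mutates the instance, a caller who wants a variant configuration copies first and then sets values on the copy.

```python
import copy as _copy


class Params:
    """Toy parameter holder; not the real pyspark.ml Params class."""

    def __init__(self, **defaults):
        self._values = dict(defaults)

    def set(self, **kwargs):
        """Set one or more params on this instance; return self for chaining."""
        for name, value in kwargs.items():
            if name not in self._values:
                raise KeyError("unknown param: " + name)
            self._values[name] = value
        return self

    def get(self, name):
        return self._values[name]

    def copy(self):
        """Argument-free copy: an independent instance with the same values."""
        return _copy.deepcopy(self)


base = Params(inputCol="category", outputCol="categoryIndex")
# Copy first, then override on the copy, leaving the original untouched.
variant = base.copy().set(outputCol="label")
```

Here `base` keeps its original outputCol while `variant` carries the override, which stands in for what a ParamMap argument to fit() or transform() used to express.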
## How was this patch tested?
Existing unit tests. Note that the required changes were minimal.
More changes to meta-algorithms such as CrossValidator may be needed as we
merge other Estimator-Model pairs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark thunterdb-14100
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11983.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11983
----
commit 0585e3fbc6dc7316ae2d8dd425648c9f6b45e041
Author: Timothy Hunter <[email protected]>
Date: 2016-03-15T21:54:20Z
passing tests
commit 6aa439bb78aea37476e7a12209a9f902a7be9871
Author: Timothy Hunter <[email protected]>
Date: 2016-03-15T21:56:36Z
cleanups
commit 317df204c049a08c3e230c4d3ca61ea6f122c864
Author: Timothy Hunter <[email protected]>
Date: 2016-03-23T21:01:59Z
wokr
commit bc0616605091f77d6c9621fc55f5d3561ba5a05d
Author: Joseph K. Bradley <[email protected]>
Date: 2016-03-26T22:24:28Z
Made StringIndexer extend MutableEstimator in Python and Scala.
----