GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/3637
[SPARK-4789] [mllib] Standardize ML Prediction APIs
This is part 1 of the updates from the WIP PR at
https://github.com/apache/spark/pull/3427
Abstract classes for learning algorithms:
* Classifier
* Regressor
* Predictor
Traits for learning algorithms:
* ProbabilisticClassificationModel
Concrete classes: learning algorithms
* LinearRegression
* LogisticRegression (updated to use new abstract classes)
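For readers new to this layout, here is a minimal, Spark-free Scala sketch of how an abstract Predictor could relate to Classifier and Regressor and to the models they produce. It is not the code in this PR: the model classes (PredictionModel, ClassificationModel, RegressionModel), the train(Seq[...]) signature, Array[Double] in place of Spark's Vector, and the toy MeanRegressor are all assumptions made for illustration.

```scala
// Hypothetical sketch of the abstract class hierarchy (not Spark's code).
// Features are plain Array[Double] here instead of Spark's Vector type.

// A fitted model maps a feature vector to a prediction.
abstract class PredictionModel {
  def predict(features: Array[Double]): Double
}

// A learning algorithm produces a model of type M from (label, features) pairs.
abstract class Predictor[M <: PredictionModel] {
  def train(data: Seq[(Double, Array[Double])]): M
}

abstract class ClassificationModel extends PredictionModel {
  def numClasses: Int
}
abstract class RegressionModel extends PredictionModel

abstract class Classifier[M <: ClassificationModel] extends Predictor[M]
abstract class Regressor[M <: RegressionModel] extends Predictor[M]

// Toy concrete regressor (illustration only): always predicts the mean label.
class MeanRegressionModel(val mean: Double) extends RegressionModel {
  def predict(features: Array[Double]): Double = mean
}
class MeanRegressor extends Regressor[MeanRegressionModel] {
  def train(data: Seq[(Double, Array[Double])]): MeanRegressionModel =
    new MeanRegressionModel(data.map(_._1).sum / data.size)
}

val model = new MeanRegressor().train(Seq((1.0, Array(0.0)), (3.0, Array(1.0))))
println(model.predict(Array(5.0))) // prints 2.0
```

Parameterizing Predictor by its model type lets each concrete algorithm's train() return its specific model class rather than a generic PredictionModel.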
Concrete classes: other
* LabeledPoint (adding weight to the old LabeledPoint)
Other updates:
* Modified ParamMap to sort parameters in toString
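A minimal sketch of what sorting parameters in toString buys: the printed form no longer depends on the order in which parameters were set. This ParamMap is a hypothetical stand-in keyed by plain parameter-name strings; it is not Spark's ParamMap, and the "{name: value, ...}" output format is an assumption.

```scala
// Hypothetical, string-keyed stand-in for a ParamMap (not Spark's class).
final class ParamMap(private val params: Map[String, Any]) {
  // Returns a new map with the parameter added or replaced.
  def put(name: String, value: Any): ParamMap = new ParamMap(params + (name -> value))

  def get(name: String): Option[Any] = params.get(name)

  // Entries are sorted by parameter name, so output is deterministic.
  override def toString: String =
    params.toSeq
      .sortBy(_._1)
      .map { case (name, value) => s"$name: $value" }
      .mkString("{", ", ", "}")
}

val pm = new ParamMap(Map.empty)
  .put("regParam", 0.1)
  .put("maxIter", 100)
  .put("featuresCol", "features")
println(pm) // prints {featuresCol: features, maxIter: 100, regParam: 0.1}
```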
Test Suites:
* LabeledPointSuite
* LinearRegressionSuite
* LogisticRegressionSuite
* Java versions of the above suites
CC: @mengxr @etrain @shivaram
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark ml-api-part1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3637.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3637
----
commit de1e3b4c39b42757e56345a6bab2bdeefaa3ca25
Author: Joseph K. Bradley <[email protected]>
Date: 2014-11-24T07:18:52Z
Added many classes for the new ML API:
Abstract classes for learning algorithms:
* Classifier
* Regressor
* Predictor
Traits for learning algorithms:
* HasDefaultEstimator
* IterativeEstimator
* IterativeSolver
* ProbabilisticClassificationModel
* WeakLearner
Concrete classes: learning algorithms
* AdaBoost (partly implemented)
* NaiveBayes (rough implementation)
* LinearRegression
* LogisticRegression (updated to use new abstract classes)
Concrete classes: evaluation
* ClassificationEvaluator
* RegressionEvaluator
* PredictionEvaluator
Concrete classes: other
* LabeledPoint (adding weight to the old LabeledPoint)
commit 6551244b96d8f70f1daacd0415318cf81fd5111a
Author: Joseph K. Bradley <[email protected]>
Date: 2014-11-24T07:30:31Z
fixed compilation issues, but have not added tests yet
commit 25b643d4b367fea5a3ba1b91564374c2b1b7a0f1
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-01T18:31:41Z
Removed everything except the simple class hierarchy for classification
commit e61e2738dcb2494be25cec2bd798c3e6e5156b73
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-04T21:37:29Z
Added LinearRegression and Regressor back from the ml-api branch
commit 272e62fb41fc8778f3a13f812d4262d9558a772b
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-05T00:11:02Z
Modified ParamMap to sort parameters in toString. Cleaned up the classes in
the class hierarchy before implementing tests and examples.
commit cc13d61f2a277b101f7422af240afa64dfb10236
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-05T01:11:22Z
Fixed a bug from the last commit (sorting paramMap by parameter names in
toString). Fixed a bug in persisting logreg data. Added threshold_internal to
logreg for faster test-time prediction (avoiding a map lookup).
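The threshold_internal idea can be sketched as resolving the threshold once, when the model is constructed, so each predict() call reads a plain field instead of looking the value up in a map. This is a hypothetical, Spark-free sketch: the BinaryLogisticModel class, the Map[String, Double] parameter map, and the 0.5 default are assumptions. (A later commit in this PR removes threshold_internal again.)

```scala
// Hypothetical binary logistic model; not the PR's LogisticRegressionModel.
class BinaryLogisticModel(
    weights: Array[Double],
    intercept: Double,
    params: Map[String, Double]) {

  // Looked up once at construction; predict() then reads this field directly.
  private val threshold: Double = params.getOrElse("threshold", 0.5)

  // P(label = 1 | features) = sigmoid(intercept + w . x)
  def probability(features: Array[Double]): Double = {
    val margin = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
    1.0 / (1.0 + math.exp(-margin))
  }

  // Predicts 1.0 when the probability exceeds the cached threshold.
  def predict(features: Array[Double]): Double =
    if (probability(features) > threshold) 1.0 else 0.0
}

val model = new BinaryLogisticModel(Array(2.0), 0.0, Map.empty)
println(model.probability(Array(0.0))) // prints 0.5
val strict = new BinaryLogisticModel(Array(2.0), 0.0, Map("threshold" -> 0.9))
```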
commit 09fb85fb7502a64a661c5f8ae4c941971ff861c8
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-05T18:22:10Z
Fixed an issue with the logreg threshold not being set correctly
commit a0faf022792524c5a33a20d7cb591a91a7ac160b
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-05T18:43:14Z
Updated docs. Added LabeledPointSuite to spark.ml
commit 3e961cb6616906940fd646639f818c58d29c04f6
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-05T23:15:48Z
* Changed semantics of Predictor.train() to merge the given paramMap with
the embedded paramMap.
* Removed threshold_internal from logreg
* Added Predictor.copy()
* Extended LogisticRegressionSuite
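A minimal sketch of the train() merge semantics and copy() described above, assuming that entries in the paramMap passed to train() override embedded parameters of the same name. ToyRegressor, ToyModel, the string-keyed maps standing in for Spark's ParamMap, and the default values are all hypothetical. Because the embedded parameters are mutable here, copy() is what lets a caller change settings without affecting the original estimator.

```scala
import scala.collection.mutable

// Hypothetical toy "model" that only records the parameters it was trained with.
final case class ToyModel(params: Map[String, Any])

class ToyRegressor {
  // Embedded parameters, with illustrative defaults.
  val embedded: mutable.Map[String, Any] =
    mutable.Map[String, Any]("maxIter" -> 100, "regParam" -> 0.0)

  // Builder-style setter that updates the embedded params in place.
  def setRegParam(value: Double): this.type = { embedded("regParam") = value; this }

  // copy(): a new estimator whose embedded params can change independently.
  def copy(): ToyRegressor = {
    val c = new ToyRegressor
    c.embedded ++= embedded
    c
  }

  // Merge: entries in paramMap override embedded entries with the same name.
  def train(data: Seq[(Double, Array[Double])], paramMap: Map[String, Any]): ToyModel =
    ToyModel(embedded.toMap ++ paramMap)

  // Overload without a ParamMap: uses only the embedded params.
  def train(data: Seq[(Double, Array[Double])]): ToyModel =
    train(data, Map.empty[String, Any])
}

val est = new ToyRegressor().setRegParam(0.1)
val model = est.train(Seq.empty, Map("maxIter" -> 10))
println(model.params("maxIter"))  // prints 10
println(model.params("regParam")) // prints 0.1
val other = est.copy().setRegParam(0.5)
```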
commit 8922966757e7b5d7588613f5dfc11cee267de1b4
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-06T01:32:14Z
Added a train() method that does not take a ParamMap to Predictor subclasses.
commit 0c45756e3614c027d662d70dfa11d736690dc837
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-06T03:57:12Z
* Fixed LinearRegression train() to use the embedded paramMap
* Added a Predictor.predict(RDD[Vector]) method
* Updated Linear/LogisticRegressionSuites
commit 6be36c16484478bdb9d847fd343d6b7319759b21
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-06T06:18:30Z
Added JavaLabeledPointSuite.java for spark.ml, and added a constructor to
LabeledPoint that defaults weight to 1.0
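A minimal sketch of a weighted LabeledPoint with a constructor that defaults weight to 1.0. Features are Array[Double] here rather than Spark's Vector, and the weightedMeanLabel helper is hypothetical, included only to show how a weight might be consumed. The sketch uses an auxiliary constructor rather than a Scala default argument because Java callers (as in a Java test suite) cannot use Scala default arguments directly.

```scala
// Hypothetical sketch; not the PR's LabeledPoint.
case class LabeledPoint(label: Double, features: Array[Double], weight: Double) {
  // Auxiliary constructor: weight defaults to 1.0 (also callable from Java).
  def this(label: Double, features: Array[Double]) = this(label, features, 1.0)
}

// Hypothetical helper: weighted mean of labels, sum(w * y) / sum(w).
def weightedMeanLabel(points: Seq[LabeledPoint]): Double =
  points.map(p => p.weight * p.label).sum / points.map(_.weight).sum

val unweighted = new LabeledPoint(1.0, Array(0.0)) // weight = 1.0
val weighted = LabeledPoint(4.0, Array(1.0), 2.0)
println(weightedMeanLabel(Seq(unweighted, weighted))) // prints 3.0
```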
commit d8eaf7099a9be6157f90b11f82917ca5b604e1bd
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-08T19:09:03Z
Added methods:
* Classifier: batch predictRaw()
* Predictor: train() without paramMap
* ProbabilisticClassificationModel: predictProbabilities()
* Java versions of all the above batch methods, plus others
Updated LogisticRegressionSuite.
Updated JavaLogisticRegressionSuite to match LogisticRegressionSuite.
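One way the relationship between predictRaw(), predictProbabilities(), and their batch variants could look is sketched below. Everything here is an assumption for illustration: softmax as the raw-to-probability mapping, argmax as the prediction rule, Seq standing in for Spark's RDD, Array[Double] for Vector, and the toy LinearSoftmaxModel. The PR's actual trait may differ.

```scala
// Hypothetical trait; not the PR's ProbabilisticClassificationModel.
trait ProbabilisticClassificationModel {
  def numClasses: Int

  /** Raw (unnormalized) score for each class. */
  def predictRaw(features: Array[Double]): Array[Double]

  /** Converts raw scores to class probabilities via softmax. */
  def predictProbabilities(features: Array[Double]): Array[Double] = {
    val raw = predictRaw(features)
    val maxRaw = raw.max // subtract the max for numerical stability
    val exps = raw.map(r => math.exp(r - maxRaw))
    val total = exps.sum
    exps.map(_ / total)
  }

  /** Predicted label = index of the most probable class. */
  def predict(features: Array[Double]): Double = {
    val probs = predictProbabilities(features)
    probs.indices.maxBy(i => probs(i)).toDouble
  }

  // Batch variants; Seq stands in for Spark's RDD[Vector].
  def predictRaw(batch: Seq[Array[Double]]): Seq[Array[Double]] =
    batch.map(f => predictRaw(f))
  def predictProbabilities(batch: Seq[Array[Double]]): Seq[Array[Double]] =
    batch.map(f => predictProbabilities(f))
}

// Toy linear model with illustrative weights: score_k = w_k . x
class LinearSoftmaxModel(weights: Array[Array[Double]])
    extends ProbabilisticClassificationModel {
  val numClasses: Int = weights.length
  def predictRaw(features: Array[Double]): Array[Double] =
    weights.map(w => w.zip(features).map { case (a, b) => a * b }.sum)
}

val model = new LinearSoftmaxModel(
  Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(0.0, 0.0)))
val probs = model.predictProbabilities(Array(2.0, 0.0))
println(model.predict(Array(0.0, 3.0))) // prints 1.0
```

Defining predictProbabilities() and predict() in terms of predictRaw() means a concrete model only has to supply raw scores.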
commit 1e46094fbf2534ff022cb843a811b3fbd7fb9d64
Author: Joseph K. Bradley <[email protected]>
Date: 2014-12-08T19:51:55Z
Added spark.ml LinearRegressionSuite
----