GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/11581
[SPARK-11888] [ML] Decision tree persistence in spark.ml
### What changes were proposed in this pull request?
Made the following classes MLReadable and MLWritable: DecisionTreeClassifier,
DecisionTreeClassificationModel, DecisionTreeRegressor, and
DecisionTreeRegressionModel
* The shared implementation is in treeModels.scala
* I use case classes to create a DataFrame to save, and I use the Dataset
API to parse loaded files.
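
To illustrate the case-class approach, here is a minimal sketch in plain Scala (no Spark dependency) of flattening a tree into one row per node and rebuilding it from those rows. The names `Node`, `Leaf`, `Internal`, `NodeRow`, and `TreeRows` are hypothetical and simplified; they are not the classes in treeModels.scala, and the sketch only models continuous (threshold) splits.

```scala
// Hypothetical, simplified tree types; not the classes in treeModels.scala.
sealed trait Node { def prediction: Double }
final case class Leaf(prediction: Double) extends Node
final case class Internal(
    prediction: Double,
    featureIndex: Int,
    threshold: Double,
    left: Node,
    right: Node) extends Node

// One flat row per node. Child ids of -1 mark a leaf.
final case class NodeRow(
    id: Int,
    prediction: Double,
    featureIndex: Int,
    threshold: Double,
    leftChild: Int,
    rightChild: Int)

object TreeRows {
  /** Flatten a tree into rows, assigning pre-order ids starting at 0 for the root. */
  def flatten(root: Node): Seq[NodeRow] = {
    // Returns the rows of the subtree rooted at `node` and the next unused id.
    def go(node: Node, id: Int): (Seq[NodeRow], Int) = node match {
      case Leaf(p) =>
        (Seq(NodeRow(id, p, -1, 0.0, -1, -1)), id + 1)
      case Internal(p, f, t, l, r) =>
        val leftId = id + 1
        val (leftRows, rightId) = go(l, leftId)
        val (rightRows, nextId) = go(r, rightId)
        (NodeRow(id, p, f, t, leftId, rightId) +: (leftRows ++ rightRows), nextId)
    }
    go(root, 0)._1
  }

  /** Rebuild the tree from its rows; the rows may arrive in any order. */
  def rebuild(rows: Seq[NodeRow]): Node = {
    val byId = rows.map(r => r.id -> r).toMap
    def build(id: Int): Node = {
      val r = byId(id)
      if (r.leftChild < 0) Leaf(r.prediction)
      else Internal(r.prediction, r.featureIndex, r.threshold,
        build(r.leftChild), build(r.rightChild))
    }
    build(0)
  }
}
```

In Spark, rows of this shape could be written out as a DataFrame and read back as a typed Dataset, which is the pattern the bullet above describes. Looking rows up by id during the rebuild means the load side does not depend on the order in which rows come back.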
Other changes:
* Made CategoricalSplit.numCategories public (to use in persistence)
* Fixed a bug in DefaultReadWriteTest.testEstimatorAndModelReadWrite, where
it never called the checkModelData function passed as an argument. Calling it
exposed an error in LDASuite, which I fixed.
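
The test-helper bug above is easy to reproduce in miniature. The following is a hypothetical sketch, not the actual DefaultReadWriteTest code: `RoundTrip.testReadWrite` and its `save`/`load` parameters are made up for illustration. It shows a round-trip helper that takes a check function and, importantly, invokes it.

```scala
object RoundTrip {
  /**
   * Hypothetical helper: pushes a value through the given save and load
   * functions, then runs the caller's check on the original and the copy.
   */
  def testReadWrite[M, S](model: M, save: M => S, load: S => M)(
      checkModelData: (M, M) => Unit): M = {
    val loaded = load(save(model))
    // The step that is easy to forget: without this call the check never
    // runs, and the test passes no matter what was loaded.
    checkModelData(model, loaded)
    loaded
  }
}

object RoundTripExample {
  def main(args: Array[String]): Unit = {
    var calls = 0
    RoundTrip.testReadWrite(
      Vector(1.0, 2.5),
      (m: Vector[Double]) => m.mkString(","),
      (s: String) => s.split(",").map(_.toDouble).toVector
    ) { (original, reloaded) =>
      calls += 1
      assert(original == reloaded)
    }
    // Counting invocations guards against the check being silently skipped.
    assert(calls == 1)
  }
}
```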
### How was this patch tested?
Persistence is tested via unit tests. For each algorithm, there are two
non-trivial trees (depth 2). One is built with continuous features and one
with categorical features, so both types of splits are tested.
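
For readers who want to try a save/load round trip themselves, here is a rough sketch. It is not the unit test from this patch. It assumes Spark 2.x or later on the classpath (`SparkSession` and `org.apache.spark.ml.linalg` are from those releases, not from the code base this patch targets), and the dataset, the object name `DecisionTreeRoundTrip`, and the temporary path are made up. It covers only the continuous-feature case.

```scala
import java.nio.file.Files

import org.apache.spark.ml.classification.{DecisionTreeClassificationModel, DecisionTreeClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumes Spark 2.x or later; names and data are illustrative only.
object DecisionTreeRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("dt-io-sketch")
      .getOrCreate()

    // Tiny made-up dataset with two continuous features.
    val data = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.0)),
      (0.0, Vectors.dense(0.5, 2.0)),
      (1.0, Vectors.dense(2.0, 0.0)),
      (1.0, Vectors.dense(3.0, 0.5))
    )).toDF("label", "features")

    val model = new DecisionTreeClassifier().setMaxDepth(2).fit(data)

    // Save to a fresh temporary location, then load it back.
    val path = Files.createTempDirectory("dt-io").resolve("model").toString
    model.write.overwrite().save(path)
    val loaded = DecisionTreeClassificationModel.load(path)

    // The reloaded tree should have the same shape as the original.
    assert(loaded.numNodes == model.numNodes)
    assert(loaded.depth == model.depth)

    spark.stop()
  }
}
```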
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark dt-io
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11581.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11581
----
commit 1eb8f4118d1dbeba9fac2e00adb20b468f7ec668
Author: Joseph K. Bradley <[email protected]>
Date: 2015-11-20T19:22:23Z
partly done adding save/load to DecisionTreeClassifier and Model
commit 24a418873702042b5456423dd77d7672abc8aa7b
Author: Joseph K. Bradley <[email protected]>
Date: 2016-03-08T06:41:25Z
DecisionTreeClassifier,Regressor and Models support save,load. Fixed bug
in DefaultReadWriteTest.testEstimatorAndModelReadWrite where it never called
checkModelData function.
commit ea1e02ff69494e7eed6742c5234432610d4b5ede
Author: Joseph K. Bradley <[email protected]>
Date: 2016-03-08T08:03:43Z
Fixed issue in LDA not copying doc,topicConcentration values
commit 7a1013a3cfc7c4a195cea714c4612e9d3e946b4d
Author: Joseph K. Bradley <[email protected]>
Date: 2016-03-08T18:04:10Z
Reverted annoying style mistakes from IntelliJ. Fixed my fix to LDASuite.
Some more docs.
commit 207cec29f4017c1c4d4a12fb382ed3a604f60303
Author: Joseph K. Bradley <[email protected]>
Date: 2016-03-08T18:36:59Z
tiny cleanups
----