GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/11581

    [SPARK-11888] [ML] Decision tree persistence in spark.ml

    ### What changes were proposed in this pull request?
    
    Made the following classes MLReadable and MLWritable: DecisionTreeClassifier, DecisionTreeClassificationModel, DecisionTreeRegressor, DecisionTreeRegressionModel.
    * The shared implementation is in treeModels.scala
    * I use case classes to create a DataFrame to save, and I use the Dataset 
API to parse loaded files.
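
To illustrate the case-class approach, here is a minimal sketch in plain Scala of how a tree might be flattened into one record per node and rebuilt afterwards. It is not the code in treeModels.scala: every name below (`ContSplit`, `CatSplit`, `NodeRecord`, `SplitRecord`, `TreeRecords`) is hypothetical, and the encoding (child id -1 for leaves, numCategories -1 for continuous splits) is an assumption made for the example. The Spark step of turning the records into a DataFrame and reading them back is omitted so the sketch runs without Spark.

```scala
// Hypothetical, simplified types for illustration; not the spark.ml classes.
sealed trait Split
case class ContSplit(featureIndex: Int, threshold: Double) extends Split
case class CatSplit(featureIndex: Int, leftCategories: Seq[Double],
                    numCategories: Int) extends Split

sealed trait Node
case class Leaf(prediction: Double) extends Node
case class Internal(prediction: Double, split: Split,
                    left: Node, right: Node) extends Node

// Flat records, one per node. Assumed encoding: numCategories = -1 marks a
// continuous split, and child ids of -1 mark a leaf.
case class SplitRecord(featureIndex: Int, values: Seq[Double], numCategories: Int)
case class NodeRecord(id: Int, prediction: Double, leftChild: Int,
                      rightChild: Int, split: Option[SplitRecord])

object TreeRecords {
  def toRecord(s: Split): SplitRecord = s match {
    case ContSplit(f, t)      => SplitRecord(f, Seq(t), -1)
    case CatSplit(f, cats, n) => SplitRecord(f, cats, n)
  }

  def fromRecord(r: SplitRecord): Split =
    if (r.numCategories < 0) ContSplit(r.featureIndex, r.values.head)
    else CatSplit(r.featureIndex, r.values, r.numCategories)

  // Pre-order traversal that assigns ids; returns the records for this
  // subtree together with the next unused id.
  def flatten(node: Node, id: Int = 0): (Seq[NodeRecord], Int) = node match {
    case Leaf(p) =>
      (Seq(NodeRecord(id, p, -1, -1, None)), id + 1)
    case Internal(p, s, l, r) =>
      val (leftRecs, rightId) = flatten(l, id + 1)
      val (rightRecs, nextId) = flatten(r, rightId)
      val self = NodeRecord(id, p, id + 1, rightId, Some(toRecord(s)))
      (self +: (leftRecs ++ rightRecs), nextId)
  }

  // Rebuild the tree by following child ids from the root record.
  def rebuild(records: Seq[NodeRecord], rootId: Int = 0): Node = {
    val byId = records.map(r => r.id -> r).toMap
    def build(id: Int): Node = {
      val r = byId(id)
      r.split match {
        case None    => Leaf(r.prediction)
        case Some(s) =>
          Internal(r.prediction, fromRecord(s), build(r.leftChild), build(r.rightChild))
      }
    }
    build(rootId)
  }
}
```

A round trip on a depth-2 tree with one categorical and one continuous split could then be checked like this:

```scala
val tree = Internal(0.5, CatSplit(0, Seq(0.0, 2.0), 3),
  Leaf(0.0),
  Internal(0.7, ContSplit(1, 1.5), Leaf(1.0), Leaf(0.0)))
val (records, _) = TreeRecords.flatten(tree)
assert(TreeRecords.rebuild(records) == tree)
```

In this sketch a categorical split can only be reconstructed if the number of categories is stored alongside the left-hand categories, which is one reason a persistence layer would need read access to that value.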
    
    Other changes:
    * Made CategoricalSplit.numCategories public (to use in persistence)
    * Fixed a bug in DefaultReadWriteTest.testEstimatorAndModelReadWrite, which never called the checkModelData function passed to it as an argument. With the call restored, LDASuite started failing; that failure is fixed here as well.
    
    ### How was this patch tested?
    
    Persistence is tested via unit tests.  For each algorithm, there are 2 
non-trivial trees (depth 2).  One is built with continuous features, and one 
with categorical; this ensures that both types of splits are tested.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark dt-io

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11581.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11581
    
----
commit 1eb8f4118d1dbeba9fac2e00adb20b468f7ec668
Author: Joseph K. Bradley <[email protected]>
Date:   2015-11-20T19:22:23Z

    partly done adding save/load to DecisionTreeClassifier and Model

commit 24a418873702042b5456423dd77d7672abc8aa7b
Author: Joseph K. Bradley <[email protected]>
Date:   2016-03-08T06:41:25Z

    DecisionTreeClassifier,Regressor and Models support save,load.  Fixed bug 
in DefaultReadWriteTest.testEstimatorAndModelReadWrite where it never called 
checkModelData function.

commit ea1e02ff69494e7eed6742c5234432610d4b5ede
Author: Joseph K. Bradley <[email protected]>
Date:   2016-03-08T08:03:43Z

    Fixed issue in LDA not copying doc,topicConcentration values

commit 7a1013a3cfc7c4a195cea714c4612e9d3e946b4d
Author: Joseph K. Bradley <[email protected]>
Date:   2016-03-08T18:04:10Z

    Reverted annoying style mistakes from IntelliJ.  Fixed my fix to LDASuite.  
Some more docs.

commit 207cec29f4017c1c4d4a12fb382ed3a604f60303
Author: Joseph K. Bradley <[email protected]>
Date:   2016-03-08T18:36:59Z

    tiny cleanups

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
