GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19208
[SPARK-21087] [ML] CrossValidator, TrainValidationSplit should preserve all
models after fitting: Scala
## What changes were proposed in this pull request?
1. We add a parameter that controls whether to collect the full model list
during CrossValidator/TrainValidationSplit training (off by default, so that
the change does not cause OOM).
- Add a method to CrossValidatorModel/TrainValidationSplitModel that lets the
user get the model list.
- Add an "option" to CrossValidatorModelWriter that lets the user control
whether to persist the model list to disk.
- Note: when persisting the model list, use indices as the sub-model path
2. We add a parameter indicating whether to persist models to disk during
training (default = off).
- This will use ML persistence to dump models to a directory so they are
available later but do not consume memory.
- Note: when persisting the model list, use indices as the sub-model path
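A minimal usage sketch of item 1, for illustration only. The names `setCollectSubModels`, `subModels`, and the writer option key `"persistSubModels"` are the ones found in released Spark (2.3 and later); they are an assumption here, since this description does not name them and the PR at this revision may use different names. The dataset, the object name `SubModelsSketch`, and the output directory are made up for the example.

```scala
import java.nio.file.Files

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; assumes Spark 2.3+ (spark-mllib) on the classpath.
object SubModelsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("sub-models-sketch")
      .getOrCreate()
    import spark.implicits._

    // Tiny synthetic binary-classification dataset (illustrative only).
    val data = (0 until 40).map { i =>
      val label = (i % 2).toDouble
      (label, Vectors.dense(label + 0.1 * (i % 5), (i % 7).toDouble))
    }.toDF("label", "features")

    val lr = new LogisticRegression().setMaxIter(5)
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(2)
      .setCollectSubModels(true) // assumed name; collection is off by default

    val cvModel = cv.fit(data)

    // One inner array per fold, one model per entry of the param grid.
    val subModels = cvModel.subModels
    println(s"folds = ${subModels.length}, models per fold = ${subModels.head.length}")

    // Ask the writer to persist the sub-models alongside the best model.
    // The option key is an assumption taken from released Spark.
    val outputDir = Files.createTempDirectory("cv-model").resolve("model").toString
    cvModel.write.overwrite().option("persistSubModels", "true").save(outputDir)

    spark.stop()
  }
}
```

Keeping collection opt-in means existing pipelines continue to hold only the best model in memory; users who want to inspect every fitted model pay the memory cost explicitly.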
## How was this patch tested?
Test cases added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/WeichenXu123/spark expose-model-list
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19208.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19208
----
commit 46d3ab3899c196311368b3383338b3d4e6d5aeaa
Author: WeichenXu <[email protected]>
Date: 2017-09-11T13:28:53Z
init pr
----