GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/12166
[SPARK-13048][ML][MLLIB] deleteLastCheckpoint option for LDA EM optimizer
## What changes were proposed in this pull request?
The EMLDAOptimizer should generally not delete its last checkpoint, because
doing so can cause failures when DistributedLDAModel methods are called later
and any partitions need to be recovered from that checkpoint.
This PR adds a "deleteLastCheckpoint" option which defaults to false. This
is a change in behavior from Spark 1.6, in that the last checkpoint will not be
removed by default.
This involves adding the deleteLastCheckpoint option to both spark.ml and
spark.mllib, and modifying PeriodicCheckpointer to support the option.
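As a rough illustration of the behavior described above, here is a minimal, self-contained Python sketch. It is not Spark's PeriodicCheckpointer: the class `PeriodicCheckpointerSketch`, its methods, the `ckpt-N` file names, and the helper `run` are all hypothetical, and plain files stand in for RDD/graph checkpoints. It only shows the idea of discarding older checkpoints as newer ones appear, and keeping the newest one at the end unless a `delete_last_checkpoint` flag is set.

```python
import os
import shutil
import tempfile
from collections import deque


class PeriodicCheckpointerSketch:
    """Toy stand-in for a periodic checkpointer (hypothetical, not Spark's code)."""

    def __init__(self, checkpoint_dir, interval):
        self.checkpoint_dir = checkpoint_dir
        self.interval = interval
        self.update_count = 0
        self.checkpoints = deque()  # paths of checkpoint files still on disk

    def update(self, data):
        """Record one iteration; write a checkpoint every `interval` updates."""
        self.update_count += 1
        if self.update_count % self.interval == 0:
            path = os.path.join(self.checkpoint_dir, f"ckpt-{self.update_count}")
            with open(path, "w") as f:
                f.write(data)
            self.checkpoints.append(path)
            # Once a newer checkpoint exists, older ones are no longer needed.
            while len(self.checkpoints) > 1:
                os.remove(self.checkpoints.popleft())

    def finish(self, delete_last_checkpoint=False):
        """Clean up after training.

        With the default (False), the newest checkpoint stays on disk so that
        later recovery from it remains possible.
        """
        if delete_last_checkpoint:
            while self.checkpoints:
                os.remove(self.checkpoints.popleft())
        return list(self.checkpoints)


def run(delete_last_checkpoint):
    """Run six toy iterations and list the checkpoint files left behind."""
    tmp = tempfile.mkdtemp()
    try:
        cp = PeriodicCheckpointerSketch(tmp, interval=2)
        for i in range(1, 7):
            cp.update(f"state after iteration {i}")
        cp.finish(delete_last_checkpoint)
        return sorted(os.listdir(tmp))
    finally:
        shutil.rmtree(tmp)
```

In this sketch, `run(False)` leaves only the most recent checkpoint file in place, while `run(True)` leaves none, which mirrors the difference between the new default and the Spark 1.6 behavior.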
This also:
* Makes MLlibTestSparkContext extend TempDirectory and set the
checkpointDir to tempDir
* Updates LibSVMRelationSuite because of a name conflict with "tempDir"
(and fixes a bug where it failed to delete a temp directory)
## How was this patch tested?
Added 2 new unit tests to spark.ml LDASuite, which call into spark.mllib.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark emlda-save-checkpoint
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12166.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12166
----
commit 5d1c89d1f1732266cbcc00f709a81fc06917ae73
Author: Joseph K. Bradley <[email protected]>
Date: 2016-04-05T01:34:35Z
Added deleteLastCheckpoint option to LDA, defaulting to false
----