GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/12166

    [SPARK-13048][ML][MLLIB] deleteLastCheckpoint option for LDA EM optimizer

    ## What changes were proposed in this pull request?
    
    The EMLDAOptimizer should generally not delete its last checkpoint since that can cause failures when DistributedLDAModel methods are called (if any partitions need to be recovered from the checkpoint).
    
    This PR adds a "deleteLastCheckpoint" option which defaults to false. This is a change in behavior from Spark 1.6, in that the last checkpoint will not be removed by default.
    
    This involves adding the deleteLastCheckpoint option to both spark.ml and spark.mllib, and modifying PeriodicCheckpointer to support the option.
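    
    The effect of the option can be illustrated with a toy model. This is only a minimal sketch, not Spark's PeriodicCheckpointer: `ToyPeriodicCheckpointer`, its methods, and `CheckpointDemo` are hypothetical names, and checkpoints are modeled as plain strings rather than files in a checkpoint directory.

```scala
import scala.collection.mutable

// Toy stand-in for a periodic checkpointer (hypothetical; not Spark's class).
final class ToyPeriodicCheckpointer(checkpointInterval: Int) {
  // Names of the checkpoints that currently exist, oldest first.
  private val checkpointQueue = mutable.Queue.empty[String]
  private var updateCount = 0

  // Record one training iteration. Every `checkpointInterval` updates,
  // write a new checkpoint and remove all older ones.
  def update(iterationName: String): Unit = {
    updateCount += 1
    if (updateCount % checkpointInterval == 0) {
      checkpointQueue.enqueue(iterationName)
      while (checkpointQueue.size > 1) checkpointQueue.dequeue()
    }
  }

  // Called once training is done. The most recent checkpoint is removed
  // only when the caller explicitly asks for it.
  def finish(deleteLastCheckpoint: Boolean): Unit = {
    if (deleteLastCheckpoint) checkpointQueue.clear()
  }

  def liveCheckpoints: List[String] = checkpointQueue.toList
}

object CheckpointDemo {
  // Run five iterations with a checkpoint every second iteration and
  // report which checkpoints still exist afterwards.
  def train(deleteLastCheckpoint: Boolean): List[String] = {
    val checkpointer = new ToyPeriodicCheckpointer(checkpointInterval = 2)
    (1 to 5).foreach(i => checkpointer.update(s"iter-$i"))
    checkpointer.finish(deleteLastCheckpoint)
    checkpointer.liveCheckpoints
  }
}
```

    In this sketch, `CheckpointDemo.train(deleteLastCheckpoint = false)` leaves the checkpoint from iteration 4 in place, so a model that later needs to recover partitions still has something to read. With `true`, no checkpoint remains after training.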
    
    This also:
    * Makes MLlibTestSparkContext extend TempDirectory and set the checkpointDir to tempDir
    * Updates LibSVMRelationSuite because of a name conflict with "tempDir" (and fixes a bug where it failed to delete a temp directory)
    
    ## How was this patch tested?
    
    Added 2 new unit tests to spark.ml LDASuite, which calls into spark.mllib.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark emlda-save-checkpoint

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12166
    
----
commit 5d1c89d1f1732266cbcc00f709a81fc06917ae73
Author: Joseph K. Bradley <[email protected]>
Date:   2016-04-05T01:34:35Z

    Added deleteLastCheckpoint option to LDA, defaulting to false

----

