[GitHub] spark pull request #19805: [SQL] Adding localCheckpoint to Dataset API

felixcheung Sun, 26 Nov 2017 12:10:07 -0800

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19805#discussion_r153080509
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -524,22 +524,41 @@ class Dataset[T] private[sql](
        */
       @Experimental
       @InterfaceStability.Evolving
    -  def checkpoint(): Dataset[T] = checkpoint(eager = true)
    +  def checkpoint(eager: Boolean = true): Dataset[T] = _checkpoint(eager = eager)
    +
    +  /**
    +   * Locally checkpoints a Dataset and return the new Dataset. Checkpointing can be used to truncate
    +   * the logical plan of this Dataset, which is especially useful in iterative algorithms where the
    +   * plan may grow exponentially. Local checkpoints are written to executor storage and despite
    +   * potentially faster they are unreliable and may compromise job completion.
    +   *
    +   * @group basic
    --- End diff --
    
    add `@since`
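The sketch below makes the doc comment's point about plan growth concrete. It is a minimal, self-contained Scala model that does not use Spark. `ToyPlan`, `Scan`, `AddRows`, `Checkpointed`, `size`, `execute` and `iterate` are hypothetical stand-ins, not Catalyst classes.

Each iteration combines the previous result with itself. Without truncation, a full traversal of the plan therefore visits 2^(n+1) - 1 nodes after n steps. Replacing the plan with its materialized rows, which is roughly what a checkpoint does, resets it to a single leaf. The model ignores where those rows are stored, so it says nothing about the reliability trade-off between local and reliable checkpoints.

```scala
// Hypothetical toy model of a logical plan; these are NOT Spark/Catalyst classes.
object ToyPlan {
  sealed trait Plan
  // Leaf that reads the source rows.
  final case class Scan(rows: Vector[Int]) extends Plan
  // Binary node that adds its children's rows pairwise; a traversal visits
  // both children even when they are the same object.
  final case class AddRows(left: Plan, right: Plan) extends Plan
  // Leaf holding already-materialized rows, standing in for a checkpoint.
  final case class Checkpointed(rows: Vector[Int]) extends Plan

  // Number of nodes a full traversal visits (a proxy for analysis cost).
  def size(plan: Plan): Int = plan match {
    case Scan(_) | Checkpointed(_) => 1
    case AddRows(l, r)             => 1 + size(l) + size(r)
  }

  // Evaluate the plan by walking the whole tree.
  def execute(plan: Plan): Vector[Int] = plan match {
    case Scan(rows)         => rows
    case Checkpointed(rows) => rows
    case AddRows(l, r)      => execute(l).zip(execute(r)).map { case (a, b) => a + b }
  }

  // Run `steps` iterations of `plan = plan + plan`, optionally replacing the
  // plan with its materialized result every `checkpointEvery` steps.
  def iterate(source: Vector[Int], steps: Int, checkpointEvery: Option[Int]): Plan = {
    var plan: Plan = Scan(source)
    for (i <- 1 to steps) {
      plan = AddRows(plan, plan)
      if (checkpointEvery.exists(k => i % k == 0))
        plan = Checkpointed(execute(plan)) // materialize, then drop the lineage
    }
    plan
  }
}
```

In this toy, `iterate(Vector(1, 2, 3), 10, None)` builds a 2047-node plan. Checkpointing every 5 steps instead leaves a single `Checkpointed` leaf after step 10, and both variants compute the same rows. In real code, the analogous step would presumably be calling `localCheckpoint()` (or `checkpoint()`) on the Dataset every few iterations.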


