GitHub user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19805#discussion_r153080488
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -524,22 +524,41 @@ class Dataset[T] private[sql](
        */
       @Experimental
       @InterfaceStability.Evolving
    -  def checkpoint(): Dataset[T] = checkpoint(eager = true)
    +  def checkpoint(eager: Boolean = true): Dataset[T] = _checkpoint(eager = eager)
    +
    +  /**
    +   * Locally checkpoints a Dataset and return the new Dataset. Checkpointing can be used to truncate
    +   * the logical plan of this Dataset, which is especially useful in iterative algorithms where the
    +   * plan may grow exponentially. Local checkpoints are written to executor storage and despite
    +   * potentially faster they are unreliable and may compromise job completion.
    +   *
    +   * @group basic
    +   */
    +  @Experimental
    +  @InterfaceStability.Evolving
    +  def localCheckpoint(eager: Boolean = true): Dataset[T] = _checkpoint(eager = eager, local = true)
     
       /**
        * Returns a checkpointed version of this Dataset. Checkpointing can be used to truncate the
        * logical plan of this Dataset, which is especially useful in iterative algorithms where the
    -   * plan may grow exponentially. It will be saved to files inside the checkpoint
    -   * directory set with `SparkContext#setCheckpointDir`.
    +   * plan may grow exponentially.
    +   * By default reliable checkpoints are created and saved to files inside the checkpoint
    +   * directory set with `SparkContext#setCheckpointDir`. If local is set to true a local checkpoint
    +   * is performed instead. Local checkpoints are written to executor storage and despite
    +   * potentially faster they are unreliable and may compromise job completion.
        *
        * @group basic
        * @since 2.1.0
        */
       @Experimental
       @InterfaceStability.Evolving
    -  def checkpoint(eager: Boolean): Dataset[T] = {
    +  def _checkpoint(eager: Boolean, local: Boolean = false): Dataset[T] = {
    --- End diff --
    
    I don't think this is right: `_checkpoint` is still a public API, and the leading
    underscore is not the convention here.
    Make it private instead.
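
    A minimal sketch of what that could look like, assuming a toy stand-in class rather
    than the real `Dataset[T]`. `MiniDataset`, `Plan`, `map` and the helper name
    `doCheckpoint` are hypothetical names chosen for illustration and are not taken from
    the pull request. The two public methods stay as thin wrappers, and the shared
    implementation is hidden behind `private`:

```scala
// Hypothetical stand-in for a logical plan: just an ordered list of step labels.
final case class Plan(steps: List[String])

// Hypothetical stand-in for Dataset[T]; this is NOT Spark's actual class.
final class MiniDataset(val plan: Plan) {

  // Each transformation appends a step, so the plan grows over time.
  def map(label: String): MiniDataset =
    new MiniDataset(Plan(plan.steps :+ label))

  // Public API: reliable checkpoint.
  def checkpoint(eager: Boolean = true): MiniDataset =
    doCheckpoint(eager, local = false)

  // Public API: local checkpoint.
  def localCheckpoint(eager: Boolean = true): MiniDataset =
    doCheckpoint(eager, local = true)

  // Shared implementation. Being private, it is not part of the public API,
  // so no underscore prefix is needed to mark it as internal.
  private def doCheckpoint(eager: Boolean, local: Boolean): MiniDataset = {
    val kind = if (local) "localCheckpoint" else "checkpoint"
    // Truncate the plan to a single step that records how it was checkpointed.
    new MiniDataset(Plan(List(s"$kind(eager=$eager)")))
  }
}
```

    With this shape, callers can only reach `checkpoint` and `localCheckpoint`; a call
    to `doCheckpoint` from outside the class does not compile.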

