GitHub user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/19805#discussion_r153080488
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -524,22 +524,41 @@ class Dataset[T] private[sql](
*/
@Experimental
@InterfaceStability.Evolving
- def checkpoint(): Dataset[T] = checkpoint(eager = true)
+ def checkpoint(eager: Boolean = true): Dataset[T] = _checkpoint(eager = eager)
+
+ /**
+ * Locally checkpoints a Dataset and returns the new Dataset. Checkpointing can be used to truncate
+ * the logical plan of this Dataset, which is especially useful in iterative algorithms where the
+ * plan may grow exponentially. Local checkpoints are written to executor storage and, despite
+ * being potentially faster, they are unreliable and may compromise job completion.
+ *
+ * @group basic
+ */
+ @Experimental
+ @InterfaceStability.Evolving
+ def localCheckpoint(eager: Boolean = true): Dataset[T] = _checkpoint(eager = eager, local = true)
/**
* Returns a checkpointed version of this Dataset. Checkpointing can be used to truncate the
* logical plan of this Dataset, which is especially useful in iterative algorithms where the
- * plan may grow exponentially. It will be saved to files inside the checkpoint
- * directory set with `SparkContext#setCheckpointDir`.
+ * plan may grow exponentially.
+ * By default, reliable checkpoints are created and saved to files inside the checkpoint
+ * directory set with `SparkContext#setCheckpointDir`. If `local` is set to true, a local checkpoint
+ * is performed instead. Local checkpoints are written to executor storage and, despite
+ * being potentially faster, they are unreliable and may compromise job completion.
*
* @group basic
* @since 2.1.0
*/
@Experimental
@InterfaceStability.Evolving
- def checkpoint(eager: Boolean): Dataset[T] = {
+ def _checkpoint(eager: Boolean, local: Boolean = false): Dataset[T] = {
--- End diff --
I don't think this is right: `_checkpoint` is still a public API, and a leading-underscore name is not the
convention here.
Make it private instead.
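The shape the review asks for can be sketched as follows. This is a hypothetical toy class, not Spark's code: `ToyDataset`, `checkpointImpl`, `planDepth` and `checkpointKind` are illustrative names, and "truncating the plan" is modeled only as resetting a depth counter (nothing is written to storage). Both public entry points keep their signatures from the diff and delegate to one `private` helper.

```scala
// Hypothetical stand-in for Dataset, used only to illustrate the access pattern.
final class ToyDataset[T] private (
    val rows: Seq[T],
    val planDepth: Int,                 // how many transformations are stacked on the plan
    val checkpointKind: Option[String]  // None until a checkpoint is taken
) {

  // Each transformation grows the (toy) logical plan by one level.
  def map[U](f: T => U): ToyDataset[U] =
    new ToyDataset(rows.map(f), planDepth + 1, None)

  /** Reliable checkpoint: public entry point. */
  def checkpoint(eager: Boolean = true): ToyDataset[T] =
    checkpointImpl(eager = eager, local = false)

  /** Local checkpoint: public entry point. */
  def localCheckpoint(eager: Boolean = true): ToyDataset[T] =
    checkpointImpl(eager = eager, local = true)

  // Shared implementation. Declared private so it does not become part of
  // the class's public API surface.
  private def checkpointImpl(eager: Boolean, local: Boolean): ToyDataset[T] = {
    // "Eager" forces a possibly lazy Seq into a strict Vector right away.
    val materialized: Seq[T] = if (eager) rows.toVector else rows
    val kind = if (local) "local" else "reliable"
    // The checkpointed result starts from a fresh plan (depth 0).
    new ToyDataset(materialized, 0, Some(kind))
  }
}

object ToyDataset {
  def apply[T](rows: T*): ToyDataset[T] = new ToyDataset(rows, 0, None)
}
```

Because the helper is `private`, callers only ever see `checkpoint` and `localCheckpoint`; something like `ds.checkpointImpl(true, true)` would not compile outside the class.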