juliuszsompolski commented on code in PR #48324:
URL: https://github.com/apache/spark/pull/48324#discussion_r1792197667


##########
sql/api/src/main/scala/org/apache/spark/sql/api/Dataset.scala:
##########
@@ -363,7 +365,29 @@ abstract class Dataset[T] extends Serializable {
    * @since 2.3.0
    */
   def localCheckpoint(eager: Boolean): Dataset[T] =
-    checkpoint(eager = eager, reliableCheckpoint = false)
+    checkpoint(eager = eager, reliableCheckpoint = false, storageLevel = None)
+
+  /**
+   * Locally checkpoints a Dataset and return the new Dataset. Checkpointing can be used to
+   * truncate the logical plan of this Dataset, which is especially useful in iterative algorithms
+   * where the plan may grow exponentially. Local checkpoints are written to executor storage and
+   * despite potentially faster they are unreliable and may compromise job completion.
+   *
+   * @param eager
+   *   Whether to checkpoint this dataframe immediately
+   * @param storageLevel
+   *   Option. If defined, StorageLevel with which to checkpoint the data.
+   * @note
+   *   When checkpoint is used with eager = false, the final data that is checkpointed after the
+   *   first action may be different from the data that was used during the job due to
+   *   non-determinism of the underlying operation and retries. If checkpoint is used to achieve
+   *   saving a deterministic snapshot of the data, eager = true should be used. Otherwise, it is
+   *   only deterministic after the first execution, after the checkpoint was finalized.
+   * @group basic
+   * @since 4.0.0
+   */
+  def localCheckpoint(eager: Boolean, storageLevel: Option[StorageLevel]): Dataset[T] =

Review Comment:
   I suppose `Option` is also bad for Java API compatibility, so definitely no `Option` here.
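The concern is that a public `Option[StorageLevel]` parameter forces Java callers to construct `scala.Option` values themselves. One common Scala alternative is an extra overload that takes a plain `StorageLevel`, keeping `Option` confined to the internal `checkpoint` helper that the diff already calls with `storageLevel = None`. The sketch below is a toy under stated assumptions, not Spark code: `ToyDataset` and the `StorageLevel` case class are hypothetical stand-ins, and falling back to `MEMORY_AND_DISK` when no level is given is an assumption made only for this example.

```scala
// Hypothetical stand-in for a storage level; NOT Spark's org.apache.spark.storage.StorageLevel.
final case class StorageLevel(name: String)

object StorageLevel {
  val MEMORY_AND_DISK: StorageLevel = StorageLevel("MEMORY_AND_DISK")
  val DISK_ONLY: StorageLevel = StorageLevel("DISK_ONLY")
}

// Hypothetical toy dataset that only records how it was checkpointed (eager flag, level name).
final case class ToyDataset(rows: Seq[Int], checkpointedWith: Option[(Boolean, String)] = None) {

  // Existing public overload: no storage level, so the helper picks a default.
  def localCheckpoint(eager: Boolean): ToyDataset =
    checkpoint(eager = eager, storageLevel = None)

  // Java-friendly overload: a plain StorageLevel, no scala.Option in the public signature.
  def localCheckpoint(eager: Boolean, storageLevel: StorageLevel): ToyDataset =
    checkpoint(eager = eager, storageLevel = Some(storageLevel))

  // Option is confined to this private helper, which Java callers never see.
  // Assumption for the toy: an absent level falls back to MEMORY_AND_DISK.
  private def checkpoint(eager: Boolean, storageLevel: Option[StorageLevel]): ToyDataset = {
    val level = storageLevel.getOrElse(StorageLevel.MEMORY_AND_DISK)
    copy(checkpointedWith = Some((eager, level.name)))
  }
}
```

With this shape, a Java caller passes a `StorageLevel` directly to the two-argument overload instead of wrapping it in a `scala.Option`, while Scala callers see two ordinary methods.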



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.