GitHub user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/10934#discussion_r51329576
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](
private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None
+ // Whether checkpoint all RDDs that are marked with the checkpoint flag.
--- End diff --
We need to expand on this comment:
```
// Whether to checkpoint all RDDs that are marked for checkpointing. By default, we stop
// as soon as we find the first such RDD. This optimization allows us to write less data
// but is not safe for all workloads. E.g. in streaming we may checkpoint both an RDD
// and its parent every batch, in which case the parent may never be checkpointed
// and its lineage never truncated (SPARK-6847).
```
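To make the trade-off in that comment concrete, here is a toy sketch of the two traversal behaviours. It is not Spark's code: `ToyRDD`, `markedForCheckpoint`, `doCheckpoint(checkpointAll, out)` and `CheckpointSketch` are hypothetical names, and the lineage walk is a simplified model that assumes the default path stops at the first marked RDD it reaches, as the comment describes.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, heavily simplified stand-in for an RDD node in a lineage graph.
final class ToyRDD(val name: String, val parents: Seq[ToyRDD]) {
  var markedForCheckpoint = false // assumption: plays the role of "checkpointData.isDefined"
  private var visited = false     // process each node at most once, even if shared

  // Walks from this node towards its ancestors and records, in order,
  // which marked nodes would actually be checkpointed.
  def doCheckpoint(checkpointAll: Boolean, out: ListBuffer[String]): Unit = {
    if (!visited) {
      visited = true
      if (markedForCheckpoint) {
        // Default: stop at the first marked node and skip its ancestors.
        // With checkpointAll, keep walking so marked ancestors are handled too.
        if (checkpointAll) parents.foreach(_.doCheckpoint(checkpointAll, out))
        out += name
      } else {
        parents.foreach(_.doCheckpoint(checkpointAll, out))
      }
    }
  }
}

object CheckpointSketch {
  // Builds source <- parent <- child, marks both parent and child
  // (as a streaming job might every batch), and returns what gets checkpointed.
  def run(checkpointAll: Boolean): List[String] = {
    val source = new ToyRDD("source", Nil)
    val parent = new ToyRDD("parent", Seq(source))
    val child  = new ToyRDD("child", Seq(parent))
    parent.markedForCheckpoint = true
    child.markedForCheckpoint = true
    val out = ListBuffer.empty[String]
    child.doCheckpoint(checkpointAll, out)
    out.toList
  }
}
```

In this model `CheckpointSketch.run(checkpointAll = false)` returns `List("child")`, so the marked parent is never written, while `CheckpointSketch.run(checkpointAll = true)` returns `List("parent", "child")`. The default writes less data, which is the optimization; the flag trades that for truncating every marked ancestor's lineage.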