Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/9072#discussion_r42068505
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1521,8 +1521,15 @@ abstract class RDD[T: ClassTag](
}
checkpointData match {
- case Some(reliable: ReliableRDDCheckpointData[_]) => logWarning(
- "RDD was already marked for reliable checkpointing: overriding
with local checkpoint.")
+ case Some(reliable: ReliableRDDCheckpointData[_]) =>
+ if (isCheckpointed) {
+ logWarning(
+ "RDD was already materialized and checkpointed: can't override
with local checkpoint.")
+ return this
+ } else {
+ logWarning(
+ "RDD was already marked for reliable checkpointing: overriding
with local checkpoint.")
+ }
--- End diff --
I think we can simplify this a little:
```
// If this RDD is already checkpointed and materialized, its lineage is
already truncated.
// In this case, we must not override our `checkpointData`, which is needed
to recover
// the checkpointed data.
if (isCheckpointedAndMaterialized) {
logWarning("Not marking RDD for local checkpoint because it was already "
+
"materialized and checkpointed")
} else {
checkpointData match {
case Some(_: ReliableRDDCheckpointData[_]) => logWarning(
"RDD was already marked for reliable checkpointing: overriding with
local checkpoint.")
case _ =>
}
checkpointData = Some(new LocalRDDCheckpointData(this))
}
this
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]