Github user liyezhang556520 commented on a diff in the pull request:
https://github.com/apache/spark/pull/2956#discussion_r19397687
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1204,6 +1204,8 @@ abstract class RDD[T: ClassTag](
} else if (checkpointData.isEmpty) {
checkpointData = Some(new RDDCheckpointData(this))
checkpointData.get.markForCheckpoint()
+ // doCheckpoint is expected to be called afterwards, so reset doCheckpointCalled first
+ doCheckpointCalled = false
--- End diff --
Hi @srowen , thanks for your comment. The change is admittedly a bit of a hack.
`doCheckpointCalled` still keeps the same meaning it had before this change. My
concern is that it may be surprising for users that checkpoint never takes
effect once a job has already been executed on the RDD, while cache() does not
have this issue. Maybe this falls under the broader concern of automatic
checkpointing. In any case, it would be better if this could be solved.
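To illustrate the scenario (a minimal sketch; `sc`, `data`, and the checkpoint
directory are placeholders, not code from this PR):

```scala
// Sketch of the behavior discussed above, assuming an existing SparkContext sc.
sc.setCheckpointDir("/tmp/checkpoints")

val data = sc.parallelize(1 to 100)
data.count()       // a job runs first; doCheckpointCalled becomes true

data.checkpoint()  // the user now decides to checkpoint the RDD
data.count()       // without this change, doCheckpoint() is skipped because
                   // doCheckpointCalled is already true, so nothing is written
```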
@pwendell , can you share your expertise on the original design and your
opinion on this? Since this is not a bug, it is only a matter of convention for
users on how to do checkpointing in Spark. If the situation I listed in the
JIRA is not intended to be supported in Spark, I will close the JIRA and this PR.