[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...

srowen Mon, 27 Oct 2014 00:51:04 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2956#discussion_r19391269
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1204,6 +1204,8 @@ abstract class RDD[T: ClassTag](
         } else if (checkpointData.isEmpty) {
           checkpointData = Some(new RDDCheckpointData(this))
           checkpointData.get.markForCheckpoint()
    +      // There is supposed to be doCheckpoint in the following, reset doCheckpointCalled first
    +      doCheckpointCalled = false
    --- End diff --
    
    From the docs, it's clear that `checkpoint()` is not intended to be called
after operations have executed on the RDD. These changes hack around that so it
doesn't directly fail, but are you certain this is valid? What about race
conditions and so on? What's the point of `doCheckpointCalled` after this
change, really? The criterion seems to collapse to "allow checkpoint if no
checkpoint data has been written". If it's that easy, I do wonder why it wasn't
this way in the first place.
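
To make the concern concrete, here is a minimal sketch of the guard interaction in question. It is a toy model, not Spark's actual `RDD` class: `ToyRdd`, `runJob`, `written`, and the `resetGuardOnCheckpoint` switch are hypothetical names introduced only for illustration, and the real `doCheckpoint` logic may differ. It assumes the guard is a one-shot flag that is set the first time a job finishes.

```scala
// Toy model only. This is NOT Spark's RDD; names mirror the diff for illustration.
final class ToyRdd(resetGuardOnCheckpoint: Boolean) {
  private var checkpointMarked = false    // stands in for checkpointData.isDefined
  private var doCheckpointCalled = false  // one-shot guard, as in the diff
  private var checkpointsWritten = 0

  // Mirrors the patched branch: mark for checkpointing, optionally reset the guard.
  def checkpoint(): Unit = {
    if (!checkpointMarked) {
      checkpointMarked = true
      if (resetGuardOnCheckpoint) doCheckpointCalled = false
    }
  }

  // Hypothetical stand-in for the hook that runs when a job completes.
  def runJob(): Unit = {
    if (!doCheckpointCalled) {
      doCheckpointCalled = true
      if (checkpointMarked) checkpointsWritten += 1
    }
  }

  def written: Int = checkpointsWritten
}

object ToyRddDemo {
  // Run a job first, then call checkpoint(), then run a second job.
  def writtenAfterLateCheckpoint(resetGuard: Boolean): Int = {
    val rdd = new ToyRdd(resetGuard)
    rdd.runJob()      // the guard flips to true; nothing is marked yet
    rdd.checkpoint()  // late call, after a job has already executed
    rdd.runJob()      // writes a checkpoint only if the guard was reset
    rdd.written
  }

  def main(args: Array[String]): Unit = {
    println(writtenAfterLateCheckpoint(resetGuard = false)) // prints 0
    println(writtenAfterLateCheckpoint(resetGuard = true))  // prints 1
  }
}
```

In this model, once the guard is reset inside `checkpoint()`, the only condition that still decides whether a checkpoint is written is whether one was already marked, which is the collapse described above. The sketch is single-threaded and says nothing about the race conditions raised in the comment.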


