[GitHub] [spark] mridulm edited a comment on pull request #35005: [SPARK-8582][CORE] Checkpoint eagerly when asked to do so for real

GitBox Fri, 31 Dec 2021 13:21:41 -0800


mridulm edited a comment on pull request #35005:
URL: https://github.com/apache/spark/pull/35005#issuecomment-1003445114



   > To confirm: If people do
   > 
   > ```
   > rdd.checkpoint()
   > rdd.count
   > ```
   > 
   > Spark will run the job twice? This looks like an existing bug in Spark core. I'm fine with this PR as a workaround at the SQL side.
   
   It does not rerun the complete job, only the suffix of the DAG needed to perform the action.
   Typically that is just the final result stage (when the DAG involves shuffles) or, when the RDD is persisted, the portion after the persist that is needed to materialize the checkpoint files.
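
   A minimal sketch of how one might observe this locally, not taken from the PR. It counts invocations of a `map` function with an accumulator, once without and once with `cache()` before `checkpoint()`. The object name `CheckpointRecomputeSketch`, the helper `mapInvocations`, the `local[2]` master, and the temporary checkpoint directory are all assumptions made for illustration.

   ```scala
   import java.nio.file.Files
   import org.apache.spark.{SparkConf, SparkContext}

   object CheckpointRecomputeSketch {
     // Hypothetical helper: returns how many times the map function ran
     // across the count() job and the checkpoint job that follows it.
     def mapInvocations(sc: SparkContext, persist: Boolean): Long = {
       val calls = sc.longAccumulator(s"map-calls-persist-$persist")
       val mapped = sc.parallelize(1 to 100, 4).map { x => calls.add(1); x }
       val rdd = if (persist) mapped.cache() else mapped
       rdd.checkpoint() // only marks the RDD; nothing is written yet
       rdd.count()      // runs the action, then the checkpoint job
       calls.value
     }

     def main(args: Array[String]): Unit = {
       val conf = new SparkConf()
         .setMaster("local[2]") // assumption: local run for illustration
         .setAppName("checkpoint-recompute-sketch")
       val sc = new SparkContext(conf)
       try {
         // Assumption: a throwaway local directory is fine for a local run.
         sc.setCheckpointDir(Files.createTempDirectory("ckpt").toString)
         println(s"without persist: ${mapInvocations(sc, persist = false)} map calls")
         println(s"with persist:    ${mapInvocations(sc, persist = true)} map calls")
       } finally {
         sc.stop()
       }
     }
   }
   ```

   If the description above holds, the run without `cache()` should report more `map` calls than the run with it, because the checkpoint job recomputes the unpersisted part of the lineage. Accumulator updates made inside transformations can also be inflated by task retries, so the numbers are indicative rather than exact.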


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


