RE: What's the benifit of RDD checkpoint against RDD save

2016-03-24 Thread Sun, Rui
Thanks, Mark. Since checkpoint may get cleaned up later on, it seems option #2 (saveXXX) is viable.

On Wed, Mar 23, 2016 at 8:01 PM, Mark Hamstra wrote:
> Yes, the terminology is being used sloppily/non-st

Re: What's the benifit of RDD checkpoint against RDD save

2016-03-24 Thread Ted Yu
Thanks, Mark. Since checkpoint may get cleaned up later on, it seems option #2 (saveXXX) is viable.

On Wed, Mar 23, 2016 at 8:01 PM, Mark Hamstra wrote:
> Yes, the terminology is being used sloppily/non-standardly in this thread
> -- "the last RDD" after a series of transformations is the RDD at

Re: What's the benifit of RDD checkpoint against RDD save

2016-03-23 Thread Mark Hamstra
Yes, the terminology is being used sloppily/non-standardly in this thread -- "the last RDD" after a series of transformations is the RDD at the beginning of the chain, just now with an attached chain of "to be done" transformations when an action is eventually run. If the saveXXX action is the only

Re: What's the benifit of RDD checkpoint against RDD save

2016-03-23 Thread Ted Yu
bq. when I get the last RDD

If I read Todd's first email correctly, the computation has been done. I could be wrong.

On Wed, Mar 23, 2016 at 7:34 PM, Mark Hamstra wrote:
> Neither of you is making any sense to me. If you just have an RDD for
> which you have specified a series of transformation

Re: What's the benifit of RDD checkpoint against RDD save

2016-03-23 Thread Mark Hamstra
Neither of you is making any sense to me. If you just have an RDD for which you have specified a series of transformations but you haven't run any actions, then neither checkpointing nor saving makes sense -- you haven't computed anything yet, you've only written out the recipe for how the computa
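
To make the "recipe versus computation" point concrete, here is a minimal sketch in plain Python. It is a toy model only, not the Spark API: the `LazyDataset` class, its `map` and `collect` methods, and the `double` helper are all hypothetical names chosen for illustration. It assumes nothing beyond what the message says, namely that transformations only record what to do and that work happens when an action runs.

```python
# Toy model of lazy evaluation; this is NOT the Spark API.
class LazyDataset:
    def __init__(self, source, steps=()):
        self.source = source        # the data at the beginning of the chain
        self.steps = tuple(steps)   # recorded "to be done" transformations

    def map(self, fn):
        # A transformation only extends the recipe; nothing is computed here.
        return LazyDataset(self.source, self.steps + (fn,))

    def collect(self):
        # An action finally runs the whole recipe over the source data.
        out = []
        for item in self.source:
            for fn in self.steps:
                item = fn(item)
            out.append(item)
        return out


calls = []

def double(x):
    calls.append(x)  # record that real work happened
    return x * 2


ds = LazyDataset([1, 2, 3]).map(double)
work_before_action = len(calls)   # no call to double() has happened yet
result = ds.collect()             # the action triggers the computation
work_after_action = len(calls)
```

In this sketch, saving or checkpointing `ds` before `collect()` would have nothing computed to store, which is the situation the message describes.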

Re: What's the benifit of RDD checkpoint against RDD save

2016-03-23 Thread Ted Yu
See the doc for checkpoint:

 * Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint
 * directory set with `SparkContext#setCheckpointDir` and all references to its parent
 * RDDs will be removed. *This function must be called before any job has been*
 * *execut
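
The behaviour the quoted doc describes (mark first, write on the next job, then drop the parent references) can be sketched with a small toy model in plain Python. This is an illustration under stated assumptions, not Spark code: `ToyRDD`, its methods, and the `toy-rdd.json` file name are hypothetical, and the real implementation differs in many ways.

```python
import json
import os
import tempfile


class ToyRDD:
    """Toy stand-in for an RDD; this is NOT the Spark API."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent, self.fn, self.data = parent, fn, data
        self.checkpoint_path = None   # set once the dataset is marked
        self.checkpointed = False     # True once the file has been written

    def map(self, fn):
        # Transformation: only records the parent and the function.
        return ToyRDD(parent=self, fn=fn)

    def checkpoint(self, directory):
        # Only marks the dataset; nothing is written at this point.
        self.checkpoint_path = os.path.join(directory, "toy-rdd.json")

    def _compute(self):
        if self.checkpointed:
            # Lineage is gone, so read the saved result instead.
            with open(self.checkpoint_path) as f:
                return json.load(f)
        if self.parent is None:
            return list(self.data)
        return [self.fn(x) for x in self.parent._compute()]

    def collect(self):
        # Action: computes, then writes the checkpoint if one was requested.
        result = self._compute()
        if self.checkpoint_path is not None and not self.checkpointed:
            with open(self.checkpoint_path, "w") as f:
                json.dump(result, f)
            self.parent = self.fn = None   # drop references to the parent
            self.checkpointed = True
        return result


checkpoint_dir = tempfile.mkdtemp()
squares = ToyRDD(data=[1, 2, 3]).map(lambda x: x * x)
squares.checkpoint(checkpoint_dir)                       # mark first
written_before_job = os.path.exists(squares.checkpoint_path)
first = squares.collect()                                # then run a job
lineage_dropped = squares.parent is None
second = squares.collect()                               # read back from the file
```

The toy model writes the file as a side effect of the first action after marking, which is why the mark has to come before the job in this sketch.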