From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Thursday, March 26, 2015 12:37 PM
To: Sean Owen
Cc: Wang, Ningjun (LNG-NPV); user@spark.apache.org
Subject: Re: How to get rdd count() without double evaluation of the RDD?
You can also always take the more extreme approach of using
SparkContext#runJob (or submitJob) to write a custom Action that does what
you want in one pass. Usually that's not worth the extra effort.
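A minimal sketch of the idea in plain Python, without Spark (the function name `save_and_count_partition` and the simulated partitions are invented for illustration). `runJob` hands each partition's iterator to a function and collects one result per partition, so a custom action can write the records and count them in the same pass:

```python
# Sketch only, no Spark: mimics what a custom action built on
# SparkContext#runJob would do. The per-partition function writes every
# record and returns its count; summing the per-partition results on the
# driver gives the total count from a single traversal of the data.

def save_and_count_partition(partition, write):
    """Write each record in this partition and count it in the same pass."""
    n = 0
    for record in partition:
        write(record)
        n += 1
    return n

# Simulated job: two partitions, a list standing in for the output sink.
partitions = [["a", "b"], ["c"]]
sink = []
per_partition_counts = [
    save_and_count_partition(iter(p), sink.append) for p in partitions
]
total = sum(per_partition_counts)  # 3, obtained in the same pass as the writes
```

In real Spark the shape would be roughly `sc.runJob(rdd, writeAndCountPartition _)` followed by summing the returned per-partition counts; the data is still traversed only once.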
On Thu, Mar 26, 2015 at 9:27 AM, Sean Owen so...@cloudera.com wrote:
To avoid computing twice you need to persist the RDD but that need not be
in memory. You can persist to disk with persist().
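A Spark-free sketch of what persisting buys you (Python stand-in; here `functools.lru_cache` plays the role of `persist(StorageLevel.DISK_ONLY)` by materializing the expensive computation once so a second action does not recompute it):

```python
# Sketch only, no Spark: "persisting" means materializing an expensive
# computation once, so that two separate actions (save, count) both read
# the cached result instead of recomputing.
from functools import lru_cache

evaluations = 0  # tracks how many times the expensive computation runs

@lru_cache(maxsize=None)
def expensive_rdd():
    global evaluations
    evaluations += 1
    return [x * 2 for x in range(5)]

saved = list(expensive_rdd())   # first action: stands in for saveAsObjectFile
count = len(expensive_rdd())    # second action: count(), served from the cache
```

With a real RDD the equivalent is calling `rdd.persist(StorageLevel.DISK_ONLY)` before `rdd.saveAsObjectFile(path)` and `rdd.count()`: only the first action computes the data; the second reads the persisted copy from disk.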
On Mar 26, 2015 4:11 PM, Wang, Ningjun (LNG-NPV)
ningjun.w...@lexisnexis.com wrote:
I have an RDD that is expensive to compute. I want to save it as an object
file and also get its count() without evaluating the RDD twice.