If you are trying to process data as part of the same job (i.e., within the same SparkContext), then all you have to do is cache the output RDD of your processing. Spark will run your processing once and cache the results for future tasks, unless the node caching the RDD goes down. If you need to retain the results for a longer time, you can:
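For example, within one SparkContext the parsed RDD can be cached once and reused by later actions (a minimal sketch; the input path and the split-based parse step are placeholders for your own parsing logic):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-example"))

    // Parse once; the path and the split are hypothetical stand-ins for your parser.
    val parsed = sc.textFile("hdfs:///data/input.txt")
      .map(line => line.split(","))
      .persist(StorageLevel.MEMORY_ONLY) // equivalent to .cache()

    // Both actions below reuse the cached partitions instead of re-parsing,
    // as long as the caching executors stay up.
    println(parsed.count())
    println(parsed.first().mkString("|"))

    sc.stop()
  }
}
```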
- Simplistically, store it on HDFS and load it each time.
- Store it in a table and pull it with Spark SQL every time (experimental).
- Use the Ooyala Spark Jobserver to cache the data and do all processing through it.

Regards,
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Mon, Jun 23, 2014 at 11:14 AM, Daedalus <tushar.nagara...@gmail.com> wrote:
> Will using mapPartitions and creating a new RDD of ParsedData objects avoid
> multiple parsing?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Persistent-Local-Node-variables-tp8104p8107.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
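A rough sketch of the first option above, writing the processed RDD to HDFS and reloading it in a later job so the parsing is not repeated (the paths are hypothetical; saveAsObjectFile/objectFile store the RDD in Java-serialized form):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("persist-example"))

    // Job 1: parse and write out the processed RDD (path is a placeholder).
    val parsed = sc.textFile("hdfs:///data/input.txt").map(_.split(","))
    parsed.saveAsObjectFile("hdfs:///data/parsed")

    // A later job (even a different SparkContext) reloads it without re-parsing.
    val reloaded = sc.objectFile[Array[String]]("hdfs:///data/parsed")
    println(reloaded.count())

    sc.stop()
  }
}
```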