If you are trying to process the data within the same job (i.e. the same
SparkContext), then all you have to do is cache the output RDD of your
processing. The processing will run once, and the results will be cached
for future tasks, unless the node caching the RDD goes down.
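
A minimal sketch of the caching approach (the input path and parseLine
function are assumptions; ParsedData is the type from your question):

    // Parse once, cache, then reuse across actions in the same SparkContext
    val parsed = sc.textFile("hdfs:///data/input.txt")  // hypothetical path
      .map(line => parseLine(line))  // parseLine: String => ParsedData (hypothetical)
    parsed.cache()                   // mark the RDD for in-memory caching

    val count = parsed.count()    // first action runs the parsing & populates the cache
    val sample = parsed.take(10)  // later actions reuse the cached partitions
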
If you want to retain the results for a longer time, you can:

   - Simply store it in HDFS and load it each time (a sketch follows this
   list).
   - Store it in a table and pull it with Spark SQL each time
   (experimental).
   - Use the Ooyala JobServer to cache the data and do all processing
   through it.
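
For the HDFS option, a rough sketch (the output path is an assumption;
ParsedData must be serializable; "parsed" is the RDD from the sketch above):

    // Persist the parsed objects to HDFS so a later job can reload them
    parsed.saveAsObjectFile("hdfs:///cache/parsed")  // hypothetical path

    // Later, possibly in a brand-new SparkContext, skip the re-parsing:
    val reloaded = sc.objectFile[ParsedData]("hdfs:///cache/parsed")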

Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Mon, Jun 23, 2014 at 11:14 AM, Daedalus <tushar.nagara...@gmail.com>
wrote:

> Will using mapPartitions and creating a new RDD of ParsedData objects avoid
> multiple parsing?
