You can try http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Archival_Storage_SSD__Memory . Hive uses this feature for its temporary tables to speed up jobs: https://issues.apache.org/jira/browse/HIVE-7313
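For reference, a minimal sketch of applying one of the storage policies from that doc to a directory. The path is illustrative, and this assumes Hadoop 2.6+ with the default filesystem being HDFS (the same thing can be done from the hdfs storagepolicies CLI described in the linked page):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// FileSystem.get returns the fs for fs.defaultFS; the cast assumes that is HDFS.
val hdfs = FileSystem.get(new Configuration()).asInstanceOf[DistributedFileSystem]

// "ALL_SSD", "ONE_SSD" and "LAZY_PERSIST" are among the policies defined
// in the archival storage doc; "/tmp/hive-scratch" is a hypothetical path.
hdfs.setStoragePolicy(new Path("/tmp/hive-scratch"), "ALL_SSD")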
r7raul1...@163.com

From: Christian
Date: 2015-11-06 13:50
To: Deepak Sharma
CC: user
Subject: Re: Spark RDD cache persistence

I've never had this need and I've never done it, but there are options that allow it. For example, there are web apps out there that work like the Spark REPL; one of these, I think, is called Zeppelin. I've never used them, but I've seen them demoed. There is also Tachyon, which Spark supports. Hopefully that gives you a place to start.

On Thu, Nov 5, 2015 at 9:21 PM Deepak Sharma <deepakmc...@gmail.com> wrote:

Thanks Christian. So is there any built-in mechanism in Spark, or an API integration with other in-memory caches such as Redis, to load the RDD into those systems when the program exits? What's the best approach to a long-lived RDD cache?
Thanks
Deepak

On 6 Nov 2015 8:34 am, "Christian" <engr...@gmail.com> wrote:

The cache gets cleared out when the job finishes; I am not aware of a way to keep it around between jobs. You could save the RDD as an object file to disk and load that object file in your next job for speed (a sketch of this follows at the end of the thread).

On Thu, Nov 5, 2015 at 6:17 PM Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi All
I am confused about RDD persistence in the cache. If I cache an RDD, will it stay in memory even after the Spark program that created it completes execution? If not, how can I keep an RDD available after the program finishes?
Thanks
Deepak
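A minimal sketch of the object-file approach Christian describes. The paths, app name, and element type are illustrative assumptions; the general pattern is saveAsObjectFile in one application and objectFile in the next (in reality each application creates its own SparkContext):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("rdd-handoff"))

// Job 1: compute the expensive RDD once and write it out before the app exits.
val lengths = sc.textFile("hdfs:///data/input").map(_.length)
lengths.saveAsObjectFile("hdfs:///tmp/lengths-rdd")

// Job 2 (a separate application, later): reload instead of recomputing,
// then cache() for fast reuse within this job's own lifetime.
val reloaded = sc.objectFile[Int]("hdfs:///tmp/lengths-rdd").cache()
reloaded.count() // first action materializes the cache

This trades recomputation for a read from durable storage; the cache itself still dies with the application, so the object file (or an external store like Tachyon) is what actually survives between jobs.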