You can try http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Archival_Storage_SSD__Memory . Hive uses this feature for its temporary tables to speed up jobs: https://issues.apache.org/jira/browse/HIVE-7313
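For reference, a minimal sketch of applying one of the storage policies from that doc to a directory. The path is illustrative, and this assumes Hadoop 2.6+ with the default filesystem being HDFS (the same thing can be done from the hdfs storagepolicies CLI described in the linked page):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// FileSystem.get returns the fs for fs.defaultFS; the cast assumes that is HDFS.
val hdfs = FileSystem.get(new Configuration()).asInstanceOf[DistributedFileSystem]

// "ALL_SSD", "ONE_SSD" and "LAZY_PERSIST" are among the policies defined
// in the archival storage doc; "/tmp/hive-scratch" is a hypothetical path.
hdfs.setStoragePolicy(new Path("/tmp/hive-scratch"), "ALL_SSD")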
r7raul1...@163.com

From: Christian
Date: 2015-11-06 13:50
To: Deepak Sharma
CC: user
Subject: Re: Spark RDD cache persistence

I've never had this need and I've never done it, but there are options that allow it. For example, there are web apps out there that work like the Spark REPL; one of these, I think, is called Zeppelin. I've never used them, but I've seen them demoed. There is also Tachyon, which Spark supports. Hopefully that gives you a place to start.

On Thu, Nov 5, 2015 at 9:21 PM Deepak Sharma <deepakmc...@gmail.com> wrote:

Thanks Christian. So is there any built-in mechanism in Spark, or an API integration with other in-memory caches such as Redis, to load the RDD into those systems when the program exits? What's the best approach to a long-lived RDD cache?
Thanks
Deepak

On 6 Nov 2015 8:34 am, "Christian" <engr...@gmail.com> wrote:

The cache gets cleared out when the job finishes; I am not aware of a way to keep it around between jobs. You could save the RDD as an object file to disk and load that object file in your next job for speed (a sketch of this follows at the end of the thread).

On Thu, Nov 5, 2015 at 6:17 PM Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi All
I am confused about RDD persistence in the cache. If I cache an RDD, will it stay in memory even after the Spark program that created it completes execution? If not, how can I keep an RDD available after the program finishes?
Thanks
Deepak
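A minimal sketch of the object-file approach Christian describes. The paths, app name, and element type are illustrative assumptions; the general pattern is saveAsObjectFile in one application and objectFile in the next (in reality each application creates its own SparkContext):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("rdd-handoff"))

// Job 1: compute the expensive RDD once and write it out before the app exits.
val lengths = sc.textFile("hdfs:///data/input").map(_.length)
lengths.saveAsObjectFile("hdfs:///tmp/lengths-rdd")

// Job 2 (a separate application, later): reload instead of recomputing,
// then cache() for fast reuse within this job's own lifetime.
val reloaded = sc.objectFile[Int]("hdfs:///tmp/lengths-rdd").cache()
reloaded.count() // first action materializes the cache

This trades recomputation for a read from durable storage; the cache itself still dies with the application, so the object file (or an external store like Tachyon) is what actually survives between jobs.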