Re: Spark RDD and Memory

2016-09-23 Thread Aditya
Hi Datta, Thanks for the reply. If I haven't cached any RDD and the data being loaded into memory after performing some operations exceeds the available memory, how is that handled by Spark? Are previously loaded RDDs removed from memory to make it free for subsequent steps in the DAG? I am running
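
For reference, a minimal sketch of the scenario being asked about, assuming the spark-shell's built-in sc and hypothetical HDFS paths; none of the RDDs are cached, so anything Spark materialises in memory is only held opportunistically.

    val raw    = sc.textFile("/user/data/large_input.txt")   // hypothetical path
    val parsed = raw.map(_.split(","))
    val keyed  = parsed.map(fields => (fields(0), 1L))
    val counts = keyed.reduceByKey(_ + _)

    // No cache()/persist() calls here: any partitions Spark materialises in
    // memory can be dropped under memory pressure and recomputed later from
    // the lineage (textFile -> map -> map -> reduceByKey) when needed again.
    counts.saveAsTextFile("/user/data/output")                // hypothetical path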

Re: Spark RDD and Memory

2016-09-23 Thread Datta Khot
Hi Aditya, If you cache the RDDs - like textFile.cache(), textFile1.cache() - then Spark will not load the data again from the file system. Once done with the related operations, it is recommended to unpersist the RDDs to manage memory efficiently and avoid its exhaustion. Note caching operation is with
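
A minimal sketch of the pattern described above, assuming the spark-shell's built-in sc, the example file paths from the thread, and a hypothetical filter: cache the RDDs, run the dependent actions, then unpersist once they are no longer needed.

    val textFile  = sc.textFile("/user/emp.txt").cache()
    val textFile1 = sc.textFile("/user/emp1.txt").cache()

    // cache() is lazy: the data is materialised in memory on the first action.
    val total  = textFile.count()
    val total1 = textFile1.count()

    // Later actions reuse the cached partitions instead of re-reading the files.
    val errorLines = textFile.filter(_.contains("ERROR")).count()   // hypothetical filter

    // Once done with the related operations, release the storage memory.
    textFile.unpersist()
    textFile1.unpersist()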

Re: Spark RDD and Memory

2016-09-22 Thread Aditya
Thanks for the reply. One more question. How does Spark handle data if it does not fit in memory? The answer I got is that it flushes the data to disk and handles the memory issue. Plus, in the below example: val textFile = sc.textFile("/user/emp.txt") val textFile1 = sc.textFile("/user/emp1.txt")
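
As an aside, whether data that does not fit in memory is spilled to disk depends on the chosen storage level; a short sketch below, assuming the spark-shell's built-in sc, reusing the example paths from the message and the standard StorageLevel API.

    import org.apache.spark.storage.StorageLevel

    val textFile  = sc.textFile("/user/emp.txt")
    val textFile1 = sc.textFile("/user/emp1.txt")

    // MEMORY_ONLY (what cache() uses): partitions that do not fit in memory
    // are simply not stored and are recomputed from lineage when needed again.
    textFile.persist(StorageLevel.MEMORY_ONLY)

    // MEMORY_AND_DISK: partitions that do not fit in memory are spilled to
    // local disk instead of being recomputed.
    textFile1.persist(StorageLevel.MEMORY_AND_DISK)

    textFile.count()
    textFile1.count()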

Re: Spark RDD and Memory

2016-09-22 Thread Mich Talebzadeh
Hi, unpersist works on storage memory, not execution memory. So I do not think you can flush it out of memory if you have not cached it using cache() or something like the following in the first place: s.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY) s.unpersist I believe the recent versions
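
The persist/unpersist calls quoted above, expanded into a runnable sketch with a hypothetical input path and the spark-shell's built-in sc; unpersist only frees storage memory that an earlier persist or cache claimed.

    import org.apache.spark.storage.StorageLevel

    val s = sc.textFile("/user/emp.txt")        // hypothetical input path

    // Claim storage memory explicitly.
    s.persist(StorageLevel.MEMORY_ONLY)
    s.count()                                   // first action materialises the cache

    // Release the storage memory again. This does not affect execution memory
    // used internally for shuffles, joins and aggregations.
    s.unpersist()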

Re: Spark RDD and Memory

2016-09-22 Thread Hanumath Rao Maduri
Hello Aditya, After an intermediate action has been applied, you might want to call rdd.unpersist() to let Spark know that this RDD is no longer required. Thanks, -Hanu On Thu, Sep 22, 2016 at 7:54 AM, Aditya wrote: > Hi, > > Suppose I have two RDDs > val
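
A short sketch of the pattern suggested here, assuming the spark-shell's built-in sc and a hypothetical intermediate RDD: cache it while it is reused, then unpersist it once the downstream results have been produced.

    val lines        = sc.textFile("/user/emp.txt")             // hypothetical input
    val intermediate = lines.map(_.toUpperCase).cache()

    // The intermediate RDD is reused by two different actions.
    val total     = intermediate.count()
    val adminRows = intermediate.filter(_.contains("ADMIN")).count()

    // Tell Spark the intermediate RDD is no longer required so its storage
    // memory can be reclaimed.
    intermediate.unpersist()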