Gurus, I understand that when we create an RDD in Spark it is immutable. I have a few points, please:

- When an RDD is created, is that just a pointer (lineage)? Since most Spark operations are lazy, is nothing actually computed until a collection operation (an action) is run that touches the RDD?
- When a DataFrame is created from an RDD, does that consume additional memory for the DataFrame? And again, is it only a collection operation that materializes both the RDD and the DataFrame built from it?
- There are some references suggesting that as you chain operations and create new DataFrames, you keep consuming more and more memory without releasing it back. Is that correct?
- What happens if I call df.unpersist? My understanding is that it shifts the DataFrame from memory (cache) to disk. Will that reduce memory overhead? (See the sketch after this list for the kind of pipeline I mean.)
- Is it a good idea to unpersist to reduce memory overhead?
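To make the scenario concrete, here is a minimal sketch of the pipeline I am asking about, assuming a local SparkSession and a made-up column name `value` (none of this is from a real job, just an illustration of the RDD -> DataFrame -> cache -> unpersist flow):

```scala
import org.apache.spark.sql.SparkSession

object CacheQuestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-question")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // 1. Creating the RDD is lazy: only the lineage is recorded, nothing is computed yet.
    val rdd = spark.sparkContext.parallelize(1 to 1000000)

    // 2. Building a DataFrame from the RDD is also lazy; does this add memory on top of the RDD?
    val df = rdd.toDF("value")

    // 3. cache() only marks the DataFrame; it is materialized by the first action (count here),
    //    which is presumably when the memory is actually consumed.
    df.cache()
    println(df.count())

    // 4. unpersist() -- does this free the cached blocks, or move them to disk,
    //    and does it meaningfully reduce memory overhead?
    df.unpersist()

    spark.stop()
  }
}
```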
Thank you.