Gurus, I understand that when we create an RDD in Spark it is immutable. I have a few points, please:

- When an RDD is created, is that just a pointer (lineage)? Since most Spark operations are lazy, is nothing actually computed until a collection operation (an action) is run that touches the RDD?
- When a DataFrame is created from an RDD, does that consume additional memory for the DataFrame? And again, is it only a collection operation that materializes both the RDD and the DataFrame built from it?
- There are some references suggesting that as you chain operations and create new DataFrames, you keep consuming more and more memory without releasing it back. Is that correct?
- What happens if I call df.unpersist? My understanding is that it shifts the DataFrame from memory (cache) to disk. Will that reduce memory overhead? (See the sketch after this list for the kind of pipeline I mean.)
- Is it a good idea to unpersist to reduce memory overhead?
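To make the scenario concrete, here is a minimal sketch of the pipeline I am asking about, assuming a local SparkSession and a made-up column name `value` (none of this is from a real job, just an illustration of the RDD -> DataFrame -> cache -> unpersist flow):

```scala
import org.apache.spark.sql.SparkSession

object CacheQuestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-question")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // 1. Creating the RDD is lazy: only the lineage is recorded, nothing is computed yet.
    val rdd = spark.sparkContext.parallelize(1 to 1000000)

    // 2. Building a DataFrame from the RDD is also lazy; does this add memory on top of the RDD?
    val df = rdd.toDF("value")

    // 3. cache() only marks the DataFrame; it is materialized by the first action (count here),
    //    which is presumably when the memory is actually consumed.
    df.cache()
    println(df.count())

    // 4. unpersist() -- does this free the cached blocks, or move them to disk,
    //    and does it meaningfully reduce memory overhead?
    df.unpersist()

    spark.stop()
  }
}
```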
Thank you.