Suppose I have two RDDs:
val textFile = sc.textFile("/user/emp.txt")
val textFile1 = sc.textFile("/user/emp1.txt")

Later I perform a join operation on the above two RDDs:
val join = textFile.join(textFile1)

After that there are further transformations that no longer reference textFile or textFile1, followed by an action that starts the execution.

When the action is called, textFile and textFile1 will be loaded into memory first. Then the join will be performed and its result kept in memory. My question is: once the join result is in memory and is being used for subsequent execution, what happens to the textFile and textFile1 RDDs? Are they still kept in memory until the full lineage graph is completed, or are they destroyed once they are no longer needed? If they are kept in memory, is there any way I can explicitly remove them to free the memory?
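
To make the question concrete, here is a minimal sketch of the kind of explicit cleanup I have in mind. It is only an illustration under some assumptions: the object name, the helper names, and the idea of keying each line by the text before its first comma are all made up for this example (join needs key/value pairs, and my real files may be laid out differently). As far as I understand, unpersist() only has an effect on RDDs that were explicitly marked with persist() or cache(), so the sketch persists the inputs first.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object JoinThenRelease {
  // Hypothetical helper: read a text file and key each line by the text
  // before its first comma. The key choice is an assumption for this sketch.
  def keyed(sc: SparkContext, path: String): RDD[(String, String)] =
    sc.textFile(path).keyBy(line => line.takeWhile(_ != ','))

  // Joins the two inputs, runs one action, then releases the cached inputs.
  def joinedCount(sc: SparkContext, pathA: String, pathB: String): Long = {
    // Explicitly mark both inputs for in-memory storage.
    val left  = keyed(sc, pathA).persist(StorageLevel.MEMORY_ONLY)
    val right = keyed(sc, pathB).persist(StorageLevel.MEMORY_ONLY)

    val joined = left.join(right)

    // The action triggers the whole lineage: read, key, join, count.
    val n = joined.count()

    // Ask Spark to drop the cached blocks of the inputs once they are
    // no longer needed.
    left.unpersist()
    right.unpersist()
    n
  }
}
```

With the paths from above this would be called as JoinThenRelease.joinedCount(sc, "/user/emp.txt", "/user/emp1.xt"). What I am unsure about is whether anything like this is needed when the inputs were never persisted in the first place.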
