[ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-21478.
-----------------------------
    Resolution: Not A Problem

The current cache design is required for query correctness. If you want to keep the intermediate data even after its parent is unpersisted (even if the data becomes stale), you need to materialize it by saving it as a table. Thanks for reporting it. We might need to clarify this in the documentation.

> Unpersist a DF also unpersists related DFs
> ------------------------------------------
>
>                 Key: SPARK-21478
>                 URL: https://issues.apache.org/jira/browse/SPARK-21478
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Roberto Mirizzi
>
> Starting with Spark 2.1.1 I have observed this bug. Here are the steps to reproduce it:
> # create a DF
> # persist it
> # count the items in it
> # create a new DF as a transformation of the previous one
> # persist it
> # count the items in it
> # unpersist the first DF
> Once you do that, you will see that the 2nd DF is also gone from the cache.
> The code to reproduce it is:
> {code:java}
> val x1 = Seq(1).toDF()
> x1.persist()
> x1.count()
> assert(x1.storageLevel.useMemory)
> val x11 = x1.select($"value" * 2)
> x11.persist()
> x11.count()
> assert(x11.storageLevel.useMemory)
> x1.unpersist()
> assert(!x1.storageLevel.useMemory)
> // the following assertion FAILS
> assert(x11.storageLevel.useMemory)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
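As a sketch of the workaround named in the resolution (materializing the intermediate result instead of stacking a cache on top of the parent DataFrame), the derived data can be saved as a table and read back, so it no longer depends on the parent's cached plan. The session setup and the table name `x11_materialized` are illustrative assumptions, not part of the original report:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-21478-workaround")
  .getOrCreate()
import spark.implicits._

val x1 = Seq(1).toDF()
x1.persist()
x1.count()

// Materialize the derived result rather than persisting it:
// once saved, it is independent of x1's cached plan.
// "x11_materialized" is a hypothetical table name.
x1.select($"value" * 2).write.mode("overwrite").saveAsTable("x11_materialized")
val x11 = spark.table("x11_materialized")

// Unpersisting x1 no longer invalidates x11, because x11 reads
// from the saved table, not from x1's cache.
x1.unpersist()
assert(x11.count() == 1L)
```

The data read back this way is a snapshot: as the resolution notes, it may become stale if the upstream source changes, which is the trade-off for decoupling it from the cache.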