I've noticed that unpersisting an "upstream" Dataset also unpersists the "downstream" Dataset derived from it. I did not expect this, and RDDs do not behave this way.
A simple case reproduces this. There are two Datasets, x and y, where y is created by applying a transformation to x. Both are cached and materialized (I confirmed this in the Storage tab of the UI). Then x is unpersisted, which removes it from the cache as expected. However, y is also removed from the cache, which I did not expect. I tried the same scenario with RDDs and saw that y stayed in the cache, as expected. Is this a bug, or is it the expected behavior for Datasets?
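For anyone who wants to try this, here is a minimal sketch of the scenario described above. It is not the code originally attached to this message. The object name UnpersistRepro, the helper names datasetLevels and rddLevels, the column names, and the local master setting are all assumptions made for illustration. It reads back storage levels programmatically instead of checking the Storage tab, and what it reports for the downstream Dataset y may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// Hypothetical reproduction; all names here are illustrative.
object UnpersistRepro {

  // Dataset case: returns the storage levels of (x, y) after x is unpersisted.
  def datasetLevels(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val x = spark.range(0, 1000).toDF("id").cache()
    val y = x.withColumn("doubled", col("id") * 2).cache()
    // Run an action on each so both are actually materialized in the cache.
    x.count()
    y.count()
    x.unpersist(blocking = true)
    (x.storageLevel, y.storageLevel)
  }

  // RDD case: same shape, returns the storage levels of (x, y) after x is unpersisted.
  def rddLevels(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val x = spark.sparkContext.parallelize(0 until 1000).cache()
    val y = x.map(_ * 2).cache()
    x.count()
    y.count()
    x.unpersist(blocking = true)
    (x.getStorageLevel, y.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("unpersist-repro")
      .master("local[*]")
      .getOrCreate()

    val (dx, dy) = datasetLevels(spark)
    println(s"Dataset after x.unpersist: x=$dx, y=$dy")

    val (rx, ry) = rddLevels(spark)
    println(s"RDD after x.unpersist:     x=$rx, y=$ry")

    spark.stop()
  }
}
```

If y reports StorageLevel.NONE in the Dataset case while it keeps its level in the RDD case, that matches the behavior described in this message.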