I've noticed that unpersisting an "upstream" Dataset also unpersists the
"downstream" Dataset derived from it. I did not expect this, and RDDs do
not behave this way.

Below I've pasted a simple reproducible case. There are two Datasets, x and
y, where y is created by applying a transformation to x. Both are cached and
materialized (this can be confirmed in the Storage tab of the UI). Then x is
unpersisted, which removes it from the cache as expected. However, y is also
removed from the cache, which I didn't expect. I tried the same scenario
with RDDs instead and saw that y was left in the cache, as expected.
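
The pasted code is not included in this copy of the message. What follows
is a minimal sketch of the scenario described above, not the original
code. The object name, the column name id_plus_one, the local master
setting, and the specific transformation are all assumptions made for
illustration. It prints the storage levels instead of relying on the UI,
and the result may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reconstruction of the described scenario (not the poster's code).
object UnpersistRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: run locally
      .appName("unpersist-repro")
      .getOrCreate()
    import spark.implicits._

    // Dataset case: y is derived from x, and both are cached.
    val x = spark.range(0, 100).cache()
    val y = x.select(($"id" + 1).as("id_plus_one")).cache()

    // Run an action on each so both are materialized in the cache.
    x.count()
    y.count()
    println(s"Dataset before: x=${x.storageLevel}, y=${y.storageLevel}")

    // Unpersist only the upstream Dataset, then inspect the downstream one.
    x.unpersist()
    println(s"Dataset after:  x=${x.storageLevel}, y=${y.storageLevel}")

    // RDD case: same shape, for comparison.
    val rx = spark.sparkContext.parallelize(1 to 100).cache()
    val ry = rx.map(_ + 1).cache()
    rx.count()
    ry.count()
    println(s"RDD before: x=${rx.getStorageLevel}, y=${ry.getStorageLevel}")

    rx.unpersist()
    println(s"RDD after:  x=${rx.getStorageLevel}, y=${ry.getStorageLevel}")

    spark.stop()
  }
}
```

If y's storage level after x.unpersist() shows no memory or disk use in
the Dataset case but is unchanged in the RDD case, that matches the
behavior described above.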

Is this a bug, or the expected behavior for Datasets?
