[ 
https://issues.apache.org/jira/browse/SPARK-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-24596.
-----------------------------
          Resolution: Fixed
    Target Version/s: 2.4.0

> Non-cascading Cache Invalidation
> --------------------------------
>
>                 Key: SPARK-24596
>                 URL: https://issues.apache.org/jira/browse/SPARK-24596
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Major
>             Fix For: 2.4.0
>
>
> When invalidating a cache, we also invalidate other caches dependent on this 
> cache to ensure cached data is up to date. For example, when the underlying 
> table has been modified or the table itself has been dropped, all caches that 
> use this table should be invalidated or refreshed.
> However, in other cases, such as when a user simply wants to drop a cache to 
> free up memory, we do not need to invalidate dependent caches since no 
> underlying data has been changed. For this reason, we would like to introduce 
> a new cache invalidation mode: non-cascading cache invalidation. We choose 
> between the existing mode and the new mode for different cache invalidation 
> scenarios as follows:
>  # Drop tables and regular (persistent) views: regular mode
>  # Drop temporary views: non-cascading mode
>  # Modify table contents (INSERT/UPDATE/MERGE/DELETE): regular mode
>  # Call {{Dataset.unpersist()}}: non-cascading mode
>  # Call {{Catalog.uncacheTable()}}: follow the same convention as dropping 
> tables/views, that is, use non-cascading mode for temporary views and regular 
> mode for the rest
> Note that a regular (persistent) view is a database object just like a table, 
> so after dropping a regular view (whether cached or not), any query 
> referring to that view should no longer be valid. Hence, if a cached persistent 
> view is dropped, we need to invalidate all dependent caches so that 
> exceptions will be thrown for any later reference. On the other hand, a 
> temporary view is in fact equivalent to an unnamed Dataset, and dropping a 
> temporary view should have no impact on queries referencing that view. Thus 
> we should do non-cascading uncaching for temporary views, which also 
> guarantees consistent uncaching behavior between temporary views and 
> unnamed Datasets.
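
The difference between the two modes can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is plain Python, not Spark code, and the names ToyCacheManager, mode_for, and the operation strings are hypothetical and exist only for this illustration. Each cache records the tables, views, and caches its plan refers to; regular mode drops the named entry plus every cache that transitively depends on it, while non-cascading mode drops only the named entry.

```python
from enum import Enum


class Mode(Enum):
    REGULAR = "regular"              # cascading: dependent caches are dropped too
    NON_CASCADING = "non-cascading"  # only the named cache is dropped


def mode_for(operation, is_temp_view=False):
    """Map a scenario from the list above to a mode (operation names are hypothetical)."""
    if operation in ("drop_table", "drop_view", "modify_table"):
        return Mode.REGULAR
    if operation in ("drop_temp_view", "unpersist"):
        return Mode.NON_CASCADING
    if operation == "uncache_table":
        # Same convention as dropping: temporary views do not cascade.
        return Mode.NON_CASCADING if is_temp_view else Mode.REGULAR
    raise ValueError(f"unknown operation: {operation}")


class ToyCacheManager:
    """Toy stand-in for a cache registry; not Spark's actual implementation."""

    def __init__(self):
        # cache name -> names of the tables, views, or caches its plan refers to
        self.sources = {}

    def cache(self, name, reads_from=()):
        self.sources[name] = set(reads_from)

    def invalidate(self, name, mode):
        """Drop `name`; in regular mode also drop all transitive dependents.

        Returns the set of cache entries that were actually removed.
        """
        visited = set()
        pending = [name]
        while pending:
            current = pending.pop()
            if current in visited:
                continue
            visited.add(current)
            if mode is Mode.REGULAR:
                # Follow edges to every cache whose plan refers to `current`.
                pending.extend(
                    n for n, srcs in self.sources.items() if current in srcs
                )
        dropped = visited & set(self.sources)
        for n in dropped:
            del self.sources[n]
        return dropped


if __name__ == "__main__":
    mgr = ToyCacheManager()
    mgr.cache("base", reads_from=["t"])         # cache built on table t
    mgr.cache("agg", reads_from=["t", "base"])  # cache built on top of base

    # Freeing memory: only "base" goes away, "agg" stays cached.
    print(mgr.invalidate("base", mode_for("unpersist")))

    # Table contents changed: every cache that reads t is dropped.
    print(mgr.invalidate("t", mode_for("modify_table")))
```

In this model, unpersisting "base" leaves "agg" in place because no underlying data changed, whereas modifying table "t" removes every remaining cache whose plan refers to it.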



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
