GitHub user maryannxue opened a pull request:
https://github.com/apache/spark/pull/21594
[SPARK-24596][SQL] Non-cascading Cache Invalidation
## What changes were proposed in this pull request?
1. Add a 'cascade' parameter to CacheManager.uncacheQuery(). In
'cascade=false' mode, only the current cache is invalidated; for each
dependent cache, the execution plan is rebuilt and the cached buffer is reused.
2. Pass true/false from callers in different uncache scenarios:
- Drop tables and regular (persistent) views: regular mode
- Drop temporary views: non-cascading mode
- Modify table contents (INSERT/UPDATE/MERGE/DELETE): regular mode
- Call Dataset.unpersist(): non-cascading mode
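The difference between the two modes can be sketched with a toy model. This is not Spark's CacheManager: the class `ToyCacheManager`, the `CacheEntry` fields, and the `plan_version` counter are all hypothetical names, and the sketch assumes that "regular mode" corresponds to cascade=true.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    name: str
    depends_on: frozenset   # names of cached plans this plan references
    buffer: list            # stands in for the materialized cached data
    plan_version: int = 0   # bumped each time the execution plan is rebuilt


class ToyCacheManager:
    """Hypothetical stand-in for a cache manager; not Spark's implementation."""

    def __init__(self):
        self.entries = {}

    def cache(self, name, depends_on=(), buffer=()):
        self.entries[name] = CacheEntry(name, frozenset(depends_on), list(buffer))

    def dependents_of(self, name):
        # Collect every cached plan that references `name`, directly or transitively.
        found, frontier = [], [name]
        while frontier:
            current = frontier.pop()
            for entry in self.entries.values():
                if (current in entry.depends_on
                        and entry.name != name
                        and entry.name not in found):
                    found.append(entry.name)
                    frontier.append(entry.name)
        return found

    def uncache_query(self, name, cascade):
        dependents = self.dependents_of(name)
        self.entries.pop(name, None)  # the current cache is always invalidated
        for dep in dependents:
            if cascade:
                # Assumed "regular mode": dependent caches are dropped too.
                del self.entries[dep]
            else:
                # Non-cascading: keep the buffer, only rebuild the plan.
                self.entries[dep].plan_version += 1


# Non-cascading: "derived" survives with its buffer intact.
manager = ToyCacheManager()
manager.cache("base", buffer=[1, 2, 3])
manager.cache("derived", depends_on=["base"], buffer=[2, 4, 6])
manager.uncache_query("base", cascade=False)

# Cascading: both entries are removed.
cascading = ToyCacheManager()
cascading.cache("base", buffer=[1, 2, 3])
cascading.cache("derived", depends_on=["base"], buffer=[2, 4, 6])
cascading.uncache_query("base", cascade=True)
```

In the non-cascading run, only `base` is removed and `derived` keeps its buffer with a rebuilt plan; in the cascading run, nothing remains cached.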
Note that a regular (persistent) view is a database object just like a
table, so after dropping a regular view (whether cached or not), any
query referring to that view should no longer be valid. Hence if a cached
persistent view is dropped, we need to invalidate all dependent caches so
that exceptions will be thrown for any later reference. On the other hand, a
temporary view is in fact equivalent to an unnamed Dataset, and dropping a
temporary view should have no impact on queries referencing that view. Thus we
should do non-cascading uncaching for temporary views, which also guarantees
consistent uncaching behavior between temporary views and unnamed Datasets.
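The caller-side policy described above can be summarized as a lookup table. This is only an illustrative sketch: `UncacheScenario`, `CASCADE_BY_SCENARIO`, and `cascade_flag` are hypothetical names that do not appear in the patch, and the mapping assumes that "regular mode" means cascade=true.

```python
from enum import Enum


class UncacheScenario(Enum):
    # Hypothetical labels for the uncache scenarios listed in this description.
    DROP_TABLE = "drop table"
    DROP_PERSISTENT_VIEW = "drop regular (persistent) view"
    DROP_TEMP_VIEW = "drop temporary view"
    MODIFY_TABLE = "modify table contents"
    DATASET_UNPERSIST = "Dataset.unpersist()"


# Assumption: "regular mode" is cascade=True, "non-cascading mode" is cascade=False.
CASCADE_BY_SCENARIO = {
    UncacheScenario.DROP_TABLE: True,
    UncacheScenario.DROP_PERSISTENT_VIEW: True,
    UncacheScenario.DROP_TEMP_VIEW: False,
    UncacheScenario.MODIFY_TABLE: True,
    UncacheScenario.DATASET_UNPERSIST: False,
}


def cascade_flag(scenario):
    """Return the value a caller would pass as 'cascade' for this scenario."""
    return CASCADE_BY_SCENARIO[scenario]
```

Temporary views and unnamed Datasets land on the same value, which is the consistency the description asks for.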
## How was this patch tested?
New tests in CachedTableSuite and DatasetCacheSuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maryannxue/spark noncascading-cache
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21594.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21594
----
commit 27e484b97ec5f9fdbfdaa5c8c1d9f45233cbbdbe
Author: Maryann Xue <maryannxue@...>
Date: 2018-06-19T04:32:11Z
noncascading cache
commit 483008c577c0ec7335b0a9a1c567f60311bb83a6
Author: Maryann Xue <maryannxue@...>
Date: 2018-06-19T18:18:06Z
code refine
commit a782aacd5d4943b8bbfadde27a9c9e9d30c223fe
Author: Maryann Xue <maryannxue@...>
Date: 2018-06-19T18:24:57Z
Merge remote-tracking branch 'origin/master' into noncascading-cache
commit 0cd8dc10eb85b6df1704e13084f53f9cefe410b3
Author: Maryann Xue <maryannxue@...>
Date: 2018-06-19T21:36:29Z
refine test cases
commit 71b93ed598833d760955e972894685c089af297b
Author: Maryann Xue <maryannxue@...>
Date: 2018-06-19T22:19:05Z
refine test cases
----