cloud-fan commented on a change in pull request #23644: [SPARK-26708][SQL]
Incorrect result caused by inconsistency between a SQL cache's cached RDD and
its physical plan
URL: https://github.com/apache/spark/pull/23644#discussion_r251281049
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
##########
@@ -180,7 +180,26 @@ class CacheManager extends Logging {
val it = cachedData.iterator()
while (it.hasNext) {
val cd = it.next()
- if (condition(cd.plan)) {
+ // If `clearCache` is false (which means the recache request comes from a non-cascading
+ // cache invalidation) and the cache buffer has already been loaded, we do not need to
+ // re-compile a physical plan because the old plan will not be used any more by the
+ // CacheManager although it still lives in compiled `Dataset`s and it could still work.
Review comment:
> the old plan will not be used any more by the CacheManager
How do we guarantee it?