cloud-fan commented on PR #40812: URL: https://github.com/apache/spark/pull/40812#issuecomment-1539764854
I agree that we do need to copy the cached plan when getting it, but I feel the current change is hard to understand and reason about. I think a better place to fix is `CacheManager#useCachedDataInternal`. It does copy `InMemoryRelation` but does not copy the physical plan: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L298-L306 We can add a new method `InMemoryRelation#freshCopy` which takes new output attributes and copy the physical plan, then we change `cached.cachedRepresentation.withOutput(currentFragment.output)` to `cached.cachedRepresentation.freshCopy(currentFragment.output)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
