cloud-fan commented on PR #40812:
URL: https://github.com/apache/spark/pull/40812#issuecomment-1539764854

   I agree that we do need to copy the cached plan when getting it, but I feel 
the current change is hard to understand and reason about.
   
   I think a better place to fix is `CacheManager#useCachedDataInternal`. It 
does copy `InMemoryRelation` but does not copy the physical plan:
   
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L298-L306
   
   We can add a new method `InMemoryRelation#freshCopy` which takes new output 
attributes and copy the physical plan, then we change 
`cached.cachedRepresentation.withOutput(currentFragment.output)` to 
`cached.cachedRepresentation.freshCopy(currentFragment.output)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to