szehon-ho opened a new pull request, #53572: URL: https://github.com/apache/spark/pull/53572
### What changes were proposed in this pull request?

Follow-up of https://github.com/apache/spark/pull/51032. That PR changed `V2WriteCommand` so that it is not executed again on `df.cache()`. However, a number of other commands still are:

```
val df = sql("CREATE TABLE...")
df.cache() // executes again, fails with TableAlreadyExistsException
```

Ideally, we would skip the CacheManager for every `Command`, because commands are already executed eagerly before `resultDf.cache()` is called. The problem is that this may be a behavior change. In some cases we are lucky. For example, `DescribeTableExec` holds an in-memory reference to the `Table` object and keeps returning the old result despite repeated execution. Other commands do not behave this way. V1 commands, for example, keep only the table identifier and query the catalog on every execution:

```
val df = sql("DESCRIBE TABLE....")
sql("ALTER TABLE ... ADD COLUMN...")
df.cache() // executes again and caches the latest schema.
```

To minimize backward-compatibility issues, I added a new interface, `UsesCachedData`, that commands can implement to keep the existing behavior. Going forward, all `Command`s bypass the CacheManager by default.

### Why are the changes needed?

To prevent commands with side effects from being executed again when a user calls `df.cache()` on the command's result.

### Does this PR introduce _any_ user-facing change?

Calling `resultDf.cache()` on the result of a command with side effects used to fail or behave dangerously. It should now be a no-op.

### How was this patch tested?

Existing unit test.

### Was this patch authored or co-authored using generative AI tooling?

No
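The default described above can be illustrated with a small toy model. Under this default, every `Command` skips the CacheManager on `df.cache()` unless it opts in through `UsesCachedData`. This is a sketch in plain Scala and is not Spark's implementation. `LogicalPlan`, `Command`, and `UsesCachedData` are stand-in traits that only borrow those names. `CreateTable`, `DescribeTable`, `Query`, and `shouldRegisterWithCache` are hypothetical names chosen for illustration. Marking `DescribeTable` with `UsesCachedData` is also an assumption, used only to show the opt-in path.

```scala
// Toy stand-ins, NOT Spark's real classes: a plan node, a command marker,
// and an opt-in marker that reuses the interface name from this PR.
trait LogicalPlan
trait Command extends LogicalPlan
trait UsesCachedData

// Hypothetical plans for illustration only.
case class CreateTable(name: String) extends Command
case class DescribeTable(name: String) extends Command with UsesCachedData
case class Query(sqlText: String) extends LogicalPlan

// Hypothetical decision: should df.cache() hand this plan to the cache
// manager (which would execute it again)?
def shouldRegisterWithCache(plan: LogicalPlan): Boolean = plan match {
  case _: UsesCachedData => true  // opted in: keep the legacy behavior
  case _: Command        => false // default: already ran eagerly, skip
  case _                 => true  // ordinary queries are cached as before
}

println(shouldRegisterWithCache(CreateTable("t")))   // false
println(shouldRegisterWithCache(DescribeTable("t"))) // true
println(shouldRegisterWithCache(Query("SELECT 1")))  // true
```

The order of the cases matters in this sketch. A plan that is both a `Command` and `UsesCachedData` must reach the opt-in branch before the generic `Command` branch.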
