szehon-ho opened a new pull request, #53572:
URL: https://github.com/apache/spark/pull/53572

   
   ### What changes were proposed in this pull request?
   Follow-up to https://github.com/apache/spark/pull/51032. That PR changed 
V2WriteCommand so that it no longer executes eagerly on df.cache(). However, a 
number of other commands still do.
   
   ```
   val df = sql("CREATE TABLE...")
   df.cache()  // executes again, fails with TableAlreadyExistsException
   ```
   
   Ideally, we would skip the CacheManager for every Command, because commands 
are already executed eagerly before resultDf.cache() is called. The problem is 
that this may be a behavior change. In some cases we are lucky: a command such 
as DescribeTableExec holds an in-memory reference to the Table object and 
returns the old result despite repeated execution. Others do not, for example 
V1 commands that keep only the table identifier and hit the catalog on every 
execution.
   
   ```
   val df = sql("DESCRIBE TABLE....")
   sql("ALTER TABLE ... ADD COLUMN...")
   df.cache()  // executes again and caches the latest schema.
   ```
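   
   The difference between the two kinds of command can be illustrated with a 
small toy model. This is only a sketch: `ToyCommand`, `SnapshotDescribe`, 
`LookupDescribe`, and the map-based `catalog` are hypothetical names invented 
here and are not Spark classes.
   
   ```scala
   import scala.collection.mutable

   // Toy model only: none of these types exist in Spark.
   object ReexecutionSketch {
     // Stand-in for a catalog: table name -> column names.
     val catalog: mutable.Map[String, Vector[String]] =
       mutable.Map("t" -> Vector("id"))

     trait ToyCommand { def run(): Vector[String] }

     // Resolves the table once, at construction, and keeps that snapshot
     // (analogous to holding an in-memory Table reference).
     final class SnapshotDescribe(name: String) extends ToyCommand {
       private val columns = catalog(name)
       def run(): Vector[String] = columns
     }

     // Keeps only the identifier and reads the catalog on every run
     // (analogous to the V1 commands described above).
     final class LookupDescribe(name: String) extends ToyCommand {
       def run(): Vector[String] = catalog(name)
     }

     def main(args: Array[String]): Unit = {
       val snapshot = new SnapshotDescribe("t")
       val lookup = new LookupDescribe("t")
       // Simulates ALTER TABLE ... ADD COLUMN between creation and re-execution.
       catalog("t") = catalog("t") :+ "name"
       println(snapshot.run()) // Vector(id)
       println(lookup.run())   // Vector(id, name)
     }
   }
   ```
   
   Re-running the snapshot-style command returns the old schema, while the 
lookup-style command picks up the new column, which is why skipping 
re-execution can change what ends up cached.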
   
   To minimize backward-compatibility issues, I add a new interface, 
UsesCachedData, for commands that need to keep the existing behavior. Going 
forward, all Commands bypass the CacheManager by default.
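   
   As a rough illustration of the opt-in idea, the toy model below shows a 
cache manager that skips commands unless they carry a marker trait. It is a 
sketch under assumptions, not the code in this PR: every `Toy*` name is 
hypothetical, the real signatures of CacheManager and UsesCachedData are not 
reproduced, and which commands opt in is invented for the example.
   
   ```scala
   import scala.collection.mutable

   // Toy model only: none of these types exist in Spark.
   object CacheBypassSketch {
     trait ToyPlan { def name: String }
     final case class ToyQuery(name: String) extends ToyPlan

     // Commands are assumed to have run eagerly when the DataFrame was created.
     trait ToyCommand extends ToyPlan

     // Hypothetical marker playing the role of UsesCachedData:
     // a command mixes it in to keep the old "re-execute on cache()" behavior.
     trait ToyUsesCachedData extends ToyCommand

     final case class ToyCreateTable(name: String) extends ToyCommand
     // Assumption for illustration: a describe-style command opts in.
     final case class ToyDescribeTable(name: String)
         extends ToyCommand with ToyUsesCachedData

     final class ToyCacheManager {
       // Records which plans were (re-)executed in order to be cached.
       val executed = mutable.ArrayBuffer.empty[String]

       def cacheQuery(plan: ToyPlan): Unit = plan match {
         case _: ToyUsesCachedData => executed += plan.name // opted in
         case _: ToyCommand        => ()                    // bypass: no re-run
         case _                    => executed += plan.name // ordinary query
       }
     }

     def main(args: Array[String]): Unit = {
       val manager = new ToyCacheManager
       manager.cacheQuery(ToyCreateTable("create"))
       manager.cacheQuery(ToyDescribeTable("describe"))
       manager.cacheQuery(ToyQuery("select"))
       println(manager.executed.mkString(", ")) // describe, select
     }
   }
   ```
   
   The marker case is matched first, so the default for any other command is 
to do nothing, and side effects such as creating a table are not repeated.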
   
   
   ### Why are the changes needed?
   To prevent commands with side effects from being executed again when a user 
runs df.cache() on the result of the command.
   
   ### Does this PR introduce _any_ user-facing change?
   Running resultDf.cache() on the result of a command with side effects (which 
used to fail or behave dangerously) should now be a no-op.
   
   
   ### How was this patch tested?
   Existing unit tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

