Re: [PR] [SPARK-54812][SQL] Make executable commands not execute on resultDf.cache() [spark]

via GitHub Tue, 27 Jan 2026 18:01:55 -0800


szehon-ho commented on PR #53572:
URL: https://github.com/apache/spark/pull/53572#issuecomment-3808531588


   Chatted with @cloud-fan and others offline.  It's not worth the complexity, 
so I simplified the code.
   
   The behavior changes slightly.  Previously, running df.cache() on the result 
of some commands, like df = sql("SHOW TABLES") or df = sql("SHOW NAMESPACES"), 
'snapshotted' the result again; now it is a no-op.  The old behavior was 
incorrect, as df.cache() should not trigger a second run for commands as per the 
contract.  A user who wants the content as of the point where they used to run 
df.cache() can simply run df = sql("") again at that point.
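
   A toy sketch of that contract, in plain Python with no Spark dependency. 
FakeCatalog and CommandResult are hypothetical names made up for illustration 
and are not Spark classes; the sketch only assumes that a command runs once 
when its result DataFrame is created and that cache() on that result is a no-op.

```python
# Toy model only: FakeCatalog and CommandResult are hypothetical names,
# not Spark classes. They mimic eager command execution and a no-op cache().


class FakeCatalog:
    """Stands in for a catalog whose contents can change over time."""

    def __init__(self):
        self.tables = ["t1"]
        self.runs = 0  # number of times the command has executed

    def show_tables(self):
        self.runs += 1
        return list(self.tables)


class CommandResult:
    """Models the DataFrame returned by a command such as SHOW TABLES."""

    def __init__(self, catalog):
        # The command runs once, eagerly, when the result is created.
        self._rows = catalog.show_tables()

    def cache(self):
        # Assumed new behavior: no second run, the rows stay as they are.
        return self

    def collect(self):
        return list(self._rows)


catalog = FakeCatalog()
df = CommandResult(catalog)      # plays the role of df = sql("SHOW TABLES")
catalog.tables.append("t2")      # the catalog changes afterwards
df.cache()                       # no-op: does not re-run the command
fresh = CommandResult(catalog)   # re-issue the command for current content
```

   In this model df keeps the rows from its single run, and only the re-issued 
command sees the table added later.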
   
   @cloud-fan can you take a look?  Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


