cloud-fan commented on code in PR #52920:
URL: https://github.com/apache/spark/pull/52920#discussion_r2513020644


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala:
##########
@@ -203,8 +204,19 @@ class QueryExecution(
     }
   }
 
+  // refresh table versions before cache lookup
+  private val lazyTableVersionsRefreshed = LazyTry {
+    if (QueryExecution.lastExecutionId != id || TableRefreshUtil.shouldRefresh(commandExecuted)) {

Review Comment:
   It kind of makes sense, but I don't fully agree. I think the chance is low that we need to refresh the tables after a new execution: the new execution may be a scan-only query, or it may alter entirely different tables. Checking on every execution hurts performance a lot on a busy cluster serving many short queries at the same time.
   
   I think a simple time-based refresh policy is good enough.
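   
   To make the suggestion concrete, here is a minimal sketch of what a time-based refresh policy could look like. The `TableRefreshPolicy` object and the hard-coded interval are hypothetical, not from this PR; in practice the interval would come from a SQLConf entry.
   
   ```scala
   import java.util.concurrent.atomic.AtomicLong
   
   object TableRefreshPolicy {
     // Hypothetical interval; in practice this would be a SQLConf setting.
     private val refreshIntervalMs = 30000L
     private val lastRefreshTime = new AtomicLong(0L)
   
     /** Returns true at most once per interval: the compare-and-set ensures
      *  only one thread pays the refresh cost, so a busy cluster running
      *  many short queries skips the check most of the time. */
     def shouldRefreshNow(): Boolean = {
       val now = System.currentTimeMillis()
       val last = lastRefreshTime.get()
       now - last >= refreshIntervalMs && lastRefreshTime.compareAndSet(last, now)
     }
   }
   ```
   
   With something like this, the cache lookup path would call `TableRefreshPolicy.shouldRefreshNow()` instead of refreshing on every new execution id.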


