aokolnychyi commented on code in PR #52599:
URL: https://github.com/apache/spark/pull/52599#discussion_r2429424564


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##########
@@ -45,7 +45,8 @@ abstract class DataSourceV2RelationBase(
     output: Seq[AttributeReference],
     catalog: Option[CatalogPlugin],
     identifier: Option[Identifier],
-    options: CaseInsensitiveStringMap)
+    options: CaseInsensitiveStringMap,
+    timeTravelSpec: Option[TimeTravelSpec] = None)

Review Comment:
   One of the use cases that both Iceberg and Delta struggle with today is 
checking that a query uses consistent versions of a table throughout the plan. 
Having `currentVersion` is one step, but we also need to distinguish time 
travel, as it is OK to have different versions in that case. I want Spark to 
handle these checks and also reload tables to consistent versions whenever 
that's needed (will be done in subsequent PRs). Today, both Iceberg and Delta 
try to implement this check/reload on their side, but it is really tricky in 
connectors, and there are still edge cases that are not handled.
   
   Another, even bigger use case is tracking read sets in DELETE, UPDATE, and 
MERGE. I have a proposal/PR for a transactional catalog that allows one to 
capture all operations that happened during an operation, for snapshot and 
serializable isolation. It is also important to track and distinguish time 
travel there.
   
   Does this make sense?
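   To illustrate the kind of check described above, here is a minimal, 
self-contained sketch. It does not use the actual Catalyst classes; `Relation`, 
`TimeTravelSpec`, and `hasConsistentVersions` here are hypothetical stand-ins 
for `DataSourceV2Relation` and the proposed field, showing how relations that 
carry a time travel spec would be exempt from the version-consistency rule:
   
   ```scala
   // Hypothetical stand-in for the spec attached to a time-travelled relation.
   case class TimeTravelSpec(version: String)
   
   // Hypothetical stand-in for a resolved table relation in a plan.
   case class Relation(
       table: String,
       version: String,
       timeTravelSpec: Option[TimeTravelSpec] = None)
   
   // A plan is consistent if, ignoring time-travelled relations, every
   // table is read at exactly one version throughout the plan.
   def hasConsistentVersions(plan: Seq[Relation]): Boolean = {
     plan
       .filter(_.timeTravelSpec.isEmpty)   // time travel may pin any version
       .groupBy(_.table)                   // group remaining reads by table
       .values
       .forall(_.map(_.version).distinct.size == 1)
   }
   ```
   
   With an explicit `timeTravelSpec` on the relation, Spark could run such a 
check (and trigger reloads) centrally, instead of each connector re-implementing 
it.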



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

