aokolnychyi commented on code in PR #52599:
URL: https://github.com/apache/spark/pull/52599#discussion_r2429424564
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##########
@@ -45,7 +45,8 @@ abstract class DataSourceV2RelationBase(
output: Seq[AttributeReference],
catalog: Option[CatalogPlugin],
identifier: Option[Identifier],
- options: CaseInsensitiveStringMap)
+ options: CaseInsensitiveStringMap,
+ timeTravelSpec: Option[TimeTravelSpec] = None)
Review Comment:
One of the use cases that both Iceberg and Delta struggle with today is checking
that a query reads consistent versions of a table throughout the plan. Having
`currentVersion` is one step, but we also need to distinguish time travel, as it
is OK to have different versions in that case. I want Spark to handle these
checks and also reload tables to consistent versions whenever that's needed
(will be done in subsequent PRs). Today both Iceberg and Delta try to implement
this check/reload on their side, but it is really tricky in connectors, and
there are still edge cases that are not handled.
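To make the idea concrete, here is a rough sketch of what such a plan-level check could look like once relations carry a `timeTravelSpec`. This is purely illustrative: the `currentVersion` accessor, the grouping key, and the error are assumptions for the sketch, not the actual API or the planned implementation.
```scala
// Hypothetical sketch only. Assumes a `currentVersion` accessor on the
// resolved table and the `timeTravelSpec` field added in this PR.
object VersionConsistencyCheck {
  def validate(plan: LogicalPlan): Unit = {
    val regularReads = plan.collect {
      // Relations with an explicit time travel spec are skipped:
      // reading different versions is expected and allowed there.
      case r: DataSourceV2Relation if r.timeTravelSpec.isEmpty => r
    }
    regularReads
      .groupBy(r => (r.catalog, r.identifier))
      .foreach { case ((_, ident), relations) =>
        val versions = relations.flatMap(r => Option(r.table.currentVersion())).distinct
        if (versions.size > 1) {
          // In a real implementation, Spark could instead reload the
          // relations to a single consistent version here.
          throw new AnalysisException(
            s"Table $ident is read at inconsistent versions: ${versions.mkString(", ")}")
        }
      }
  }
}
```
The point of the sketch is only that time-travel relations must be distinguishable from regular reads, which is why the spec needs to live on the relation itself.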
Another use case that is even bigger is tracking read sets in DELETE,
UPDATE, and MERGE. I have a proposal/PR for a transactional catalog that
allows capturing all operations that happened during an operation, for
snapshot and serializable isolation. It is also important to track and
distinguish time travel there.
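For intuition, a read-set tracker for that use case might look roughly like the following. All names here are invented for illustration and are not the proposed catalog API; the key detail is that time-travel reads are recorded but excluded from conflict detection.
```scala
// Hypothetical illustration; not the actual transactional catalog proposal.
import scala.collection.mutable

case class ReadEntry(table: String, version: String, timeTravel: Boolean)

class ReadSetTracker {
  private val reads = mutable.Buffer.empty[ReadEntry]

  // Record every table read that happens during DELETE/UPDATE/MERGE.
  def record(table: String, version: String, timeTravel: Boolean): Unit =
    reads += ReadEntry(table, version, timeTravel)

  // At commit time, only non-time-travel reads participate in
  // snapshot/serializable conflict detection: a pinned historical read
  // cannot conflict with concurrent writes.
  def conflictCandidates: Seq[ReadEntry] = reads.filterNot(_.timeTravel).toSeq
}
```
This is why distinguishing time travel at the relation level matters for the read-set use case as well.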
Does this make sense?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]