viirya opened a new pull request, #56103: URL: https://github.com/apache/spark/pull/56103
### What changes were proposed in this pull request?

> **Builds on #56102 (SPARK-57056).** This PR's diff includes the SPARK-57056 commit. After SPARK-57056 merges, this PR will be rebased and shrink to its own commit.

Extend the temporal clause so reads and writes can target a named branch on a `SupportsBranching` data source:

```sql
SELECT * FROM t FOR BRANCH 'dev'
SELECT * FROM t VERSION AS OF BRANCH 'dev'
SELECT * FROM t SYSTEM_VERSION AS OF BRANCH 'dev'
INSERT INTO t FOR BRANCH 'dev' SELECT ...
INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE WHERE | ON ...
INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE USING ...
```

BRANCH is the only temporal variant allowed on writes. `VERSION AS OF <int>` and `TIMESTAMP AS OF <ts>` on writes remain rejected (existing Spark constraint, now caught at parse time with a clearer error).

Also add a new session config:

```
spark.sql.defaultBranch
```

When non-empty, every read and write against a `SupportsBranching` table is routed to the named branch. Tables that do not implement `SupportsBranching` silently ignore the config. An explicit `FOR BRANCH` clause always overrides the config.

**Precedence:**

1. Explicit `FOR BRANCH` / `VERSION AS OF BRANCH` in the query.
2. `spark.sql.defaultBranch`.
3. Today's behavior (no branch targeting).

**Implementation:**

- `SupportsBranching` gains `loadBranch(name): Table`.
- `TimeTravelSpec` gains `AsOfBranch(branch, isExplicit)`. `RelationTimeTravel` carries an optional `branch` field.
- `UnresolvedRelation` carries the branch on writes via a reserved internal option key `BRANCH_AS_OF` (mirrors the existing `REQUIRED_WRITE_PRIVILEGES` pattern). This preserves the `NamedRelation` slot in `InsertIntoStatement` / `OverwriteByExpression` without requiring a structural change to those nodes.
- `CatalogV2Util.getTable` composes `loadTable` + `loadBranch`, lifting the "no time travel on writes" assertion only for the branch case.
- `RelationResolution` applies the default branch only on the persistent-relation path; temp views are unaffected.
- `InMemoryTable.loadBranch` returns an independent `InMemoryTable` instance per branch so reads and writes are isolated end-to-end in tests.

### Why are the changes needed?

SPARK-57056 lets a data source declare named branches and provides DDL to manage them, but offers no way to actually read from or write to a specific branch. Without this PR, branches are effectively write-only-from-other-systems metadata. This PR closes the loop so a Spark user can:

```sql
INSERT INTO sales FOR BRANCH 'experimental' SELECT ...;
SELECT total FROM sales FOR BRANCH 'experimental';
```

and switch entire sessions to a branch via a config setting (useful for staging / CI environments).

### Does this PR introduce _any_ user-facing change?

Yes:

- New temporal-clause variants: `FOR BRANCH 'name'` and `VERSION AS OF BRANCH 'name'` (and the `SYSTEM_VERSION` synonym).
- INSERT statements accept an optional `temporalClause` between the table identifier and the rest of the statement; only the branch variant is allowed (others raise a parse-time error).
- New session config `spark.sql.defaultBranch` (default empty string; no change in behavior unless set).

Data sources that do not implement `SupportsBranching`:

- Silently ignore `spark.sql.defaultBranch`.
- Reject an explicit `FOR BRANCH` clause with `AnalysisException`.

### How was this patch tested?

- `PlanParserSuite`: extended the `as of syntax` test with four new cases covering `FOR BRANCH`, `VERSION AS OF BRANCH`, `SYSTEM_VERSION AS OF BRANCH`, and `FOR VERSION AS OF BRANCH`.
- `SupportsBranchingSuite`: 8 new end-to-end tests covering SELECT `FOR BRANCH`, INSERT `FOR BRANCH`, INSERT OVERWRITE `FOR BRANCH`, the equivalence of `FOR BRANCH` and `VERSION AS OF BRANCH`, `spark.sql.defaultBranch` precedence, the silent-ignore behavior on non-branching tables, and the explicit-clause hard-error behavior.
- Pre-existing time-travel tests in `DataSourceV2SQLSuiteV1Filter` continue to pass, with no regression in the `VERSION AS OF` / `TIMESTAMP AS OF` paths.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)
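
### Example: branch precedence in a session

A sketch of how the three precedence rules listed under "What changes were proposed" might play out in one session. It assumes the syntax and config behave as this description states and that `sales` is a `SupportsBranching` table; `sales`, `dev`, and `experimental` are the placeholder names already used in this description. The comments restate the stated rules and are not captured output.

```sql
-- Route reads and writes on SupportsBranching tables to branch 'dev'.
SET spark.sql.defaultBranch=dev;

-- No explicit clause: per rule 2, this would read from 'dev'.
SELECT total FROM sales;

-- Explicit clause: per rule 1, this would read from 'experimental',
-- overriding the session config.
SELECT total FROM sales FOR BRANCH 'experimental';

-- Restore the default (empty string): per rule 3, no branch targeting.
RESET spark.sql.defaultBranch;
SELECT total FROM sales;
```

Running the same statements against a table that does not implement `SupportsBranching` would, per the description above, ignore the config for the first and last queries and raise `AnalysisException` for the explicit `FOR BRANCH` query.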
