viirya opened a new pull request, #56103:
URL: https://github.com/apache/spark/pull/56103

   ### What changes were proposed in this pull request?
   
   > **Builds on #56102 (SPARK-57056).** This PR's diff includes the SPARK-57056 commit. After SPARK-57056 merges, this PR will be rebased so that it contains only its own commit.
   
   Extend the temporal clause so reads and writes can target a named branch on 
a `SupportsBranching` data source:
   
   ```sql
   SELECT * FROM t FOR BRANCH 'dev'
   SELECT * FROM t VERSION AS OF BRANCH 'dev'
   SELECT * FROM t SYSTEM_VERSION AS OF BRANCH 'dev'
   
   INSERT INTO t FOR BRANCH 'dev' SELECT ...
   INSERT OVERWRITE t FOR BRANCH 'dev' SELECT ...
   INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE WHERE | ON ...
   INSERT INTO t FOR BRANCH 'dev' tableAlias REPLACE USING ...
   ```
   
   BRANCH is the only temporal variant allowed on writes. `VERSION AS OF <int>` 
and `TIMESTAMP AS OF <ts>` on writes remain rejected (existing Spark 
constraint, now caught at parse time with a clearer error).
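The write-side rule amounts to a small parse-time check. Below is a minimal Python model of it. It assumes the parser has already classified the clause as a branch, version, or timestamp variant. `TemporalClause`, `validate_write_clause`, and `ParseError` are hypothetical names for illustration only; they are not Spark's (Scala) parser code.

```python
from dataclasses import dataclass
from typing import Optional


class ParseError(Exception):
    """Hypothetical stand-in for Spark's parse-time error in this sketch."""


@dataclass(frozen=True)
class TemporalClause:
    """Hypothetical parsed temporal clause; at most one field is set."""
    branch: Optional[str] = None     # FOR BRANCH 'dev' / VERSION AS OF BRANCH 'dev'
    version: Optional[int] = None    # VERSION AS OF 3
    timestamp: Optional[str] = None  # TIMESTAMP AS OF '2024-01-01'


def validate_write_clause(clause: Optional[TemporalClause]) -> Optional[str]:
    """Return the branch an INSERT targets, or None when no clause is given.

    Only the branch variant is accepted on writes. Version and timestamp
    clauses are rejected before analysis starts.
    """
    if clause is None:
        return None
    if clause.version is not None or clause.timestamp is not None:
        raise ParseError("Only FOR BRANCH / VERSION AS OF BRANCH is allowed on writes")
    return clause.branch


if __name__ == "__main__":
    # A branch clause passes through unchanged.
    print(validate_write_clause(TemporalClause(branch="dev")))  # prints: dev
```

Rejecting these clauses in the parser, rather than during analysis, means a user sees the error before Spark resolves any relation.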
   
   Also add a new session config:
   
   ```
   spark.sql.defaultBranch
   ```
   
   When non-empty, every read and write against a `SupportsBranching` table is 
routed to the named branch. Tables that do not implement `SupportsBranching` 
silently ignore the config. An explicit `FOR BRANCH` clause always overrides 
the config.
   
   **Precedence:**
   1. Explicit `FOR BRANCH` / `VERSION AS OF BRANCH` in the query.
   2. `spark.sql.defaultBranch`.
   3. Today's behavior (no branch targeting).
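The precedence rules can be modeled as a single resolution function. The same function also captures the silent-ignore and hard-error behavior for tables that do not implement `SupportsBranching`. This is a minimal Python sketch, not Spark's `RelationResolution` code. `TableInfo`, `resolve_branch`, and `AnalysisError` are hypothetical names, and the sketch passes the session config value in as a plain string instead of reading it from the session.

```python
from dataclasses import dataclass
from typing import Optional


class AnalysisError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException in this sketch."""


@dataclass(frozen=True)
class TableInfo:
    name: str
    supports_branching: bool  # models whether the table implements SupportsBranching


def resolve_branch(table: TableInfo,
                   explicit_branch: Optional[str],
                   default_branch: str = "") -> Optional[str]:
    """Return the branch a read or write should target, or None for today's behavior."""
    if explicit_branch is not None:
        # 1. An explicit FOR BRANCH clause always wins, but the table must support branches.
        if not table.supports_branching:
            raise AnalysisError(f"Table {table.name} does not support branches")
        return explicit_branch
    if default_branch and table.supports_branching:
        # 2. The session default (spark.sql.defaultBranch) applies only to branching tables.
        return default_branch
    # 3. No explicit clause and no applicable default: keep today's behavior.
    #    Non-branching tables silently ignore the session default here.
    return None


if __name__ == "__main__":
    sales = TableInfo("sales", supports_branching=True)
    legacy = TableInfo("legacy", supports_branching=False)
    print(resolve_branch(sales, None, default_branch="staging"))   # staging
    print(resolve_branch(legacy, None, default_branch="staging"))  # None
```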
   
   **Implementation:**
   - `SupportsBranching` gains `loadBranch(name): Table`.
   - `TimeTravelSpec` gains `AsOfBranch(branch, isExplicit)`. 
`RelationTimeTravel` carries an optional `branch` field.
   - `UnresolvedRelation` carries the branch on writes via a reserved internal 
option key `BRANCH_AS_OF` (mirrors the existing `REQUIRED_WRITE_PRIVILEGES` 
pattern). This preserves the `NamedRelation` slot in `InsertIntoStatement` / 
`OverwriteByExpression` without requiring a structural change to those nodes.
   - `CatalogV2Util.getTable` composes `loadTable` + `loadBranch`, lifting the 
"no time travel on writes" assertion only for the branch case.
   - `RelationResolution` applies the default branch only on the 
persistent-relation path; temp views are unaffected.
   - `InMemoryTable.loadBranch` returns an independent `InMemoryTable` instance 
per branch so reads and writes are isolated end-to-end in tests.
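A toy Python model shows how the write-path pieces fit together:

- The branch travels as an option on the relation.
- A `getTable`-style helper composes the table load with a branch load.
- Each branch is backed by its own table instance.

Everything here is hypothetical. The string value of `BRANCH_AS_OF` is not given in the PR, so the one below is assumed. The `create_branch` helper assumes a new branch starts as a copy of the main table's rows. The Python classes only mirror the names in the list above and are not Spark's Scala/Java code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed string value; the PR names the reserved key BRANCH_AS_OF but does not show its value.
BRANCH_AS_OF = "__branch_as_of"


@dataclass
class InMemoryTable:
    """Toy stand-in for a test table; each instance owns its own rows."""
    name: str
    rows: List[tuple] = field(default_factory=list)
    branches: Dict[str, "InMemoryTable"] = field(default_factory=dict)

    def create_branch(self, branch: str) -> None:
        # Assumption of this sketch: a new branch starts as a copy of the main rows.
        self.branches[branch] = InMemoryTable(f"{self.name}@{branch}", list(self.rows))

    def load_branch(self, branch: str) -> "InMemoryTable":
        # Each branch is a separate InMemoryTable, so writes never leak across branches.
        if branch not in self.branches:
            raise KeyError(f"branch {branch!r} does not exist on {self.name}")
        return self.branches[branch]


def get_table(catalog: Dict[str, InMemoryTable], ident: str,
              options: Dict[str, str]) -> InMemoryTable:
    """Compose a table load with a branch load, in the spirit of CatalogV2Util.getTable."""
    table = catalog[ident]              # loadTable
    branch = options.get(BRANCH_AS_OF)  # branch carried as a reserved option
    return table.load_branch(branch) if branch else table


def insert(catalog: Dict[str, InMemoryTable], ident: str,
           rows: List[tuple], branch: Optional[str] = None) -> None:
    # The write path stores the branch in the relation's options, not in a new plan field.
    options = {BRANCH_AS_OF: branch} if branch else {}
    get_table(catalog, ident, options).rows.extend(rows)


if __name__ == "__main__":
    catalog = {"sales": InMemoryTable("sales", [(1,)])}
    catalog["sales"].create_branch("experimental")
    insert(catalog, "sales", [(2,)], branch="experimental")
    print(catalog["sales"].rows)                                             # [(1,)]
    print(get_table(catalog, "sales", {BRANCH_AS_OF: "experimental"}).rows)  # [(1,), (2,)]
```

Carrying the branch as an option keeps the relation slot in the write plan unchanged. That is the trade-off the list above describes for `InsertIntoStatement` / `OverwriteByExpression`.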
   
   ### Why are the changes needed?
   
   SPARK-57056 lets a data source declare named branches and provides DDL to manage them, but offers no way to actually read from or write to a specific branch. Without this PR, branches are effectively metadata whose contents only other systems can write. This PR closes the loop so a Spark user can:
   
   ```sql
   INSERT INTO sales FOR BRANCH 'experimental' SELECT ...;
   SELECT total FROM sales FOR BRANCH 'experimental';
   ```
   
   and switch an entire session to a branch with a single config setting (useful for staging and CI environments).
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes:
   - New temporal-clause variants: `FOR BRANCH 'name'` and `VERSION AS OF BRANCH 'name'` (and the `SYSTEM_VERSION` synonym).
   - INSERT statements accept an optional `temporalClause` between the table 
identifier and the rest of the statement; only the branch variant is allowed 
(others raise a parse-time error).
   - New session config `spark.sql.defaultBranch` (defaults to the empty string, so behavior is unchanged unless it is set).
   
   Data sources that do not implement `SupportsBranching`:
   - Silently ignore `spark.sql.defaultBranch`.
   - Reject an explicit `FOR BRANCH` clause with `AnalysisException`.
   
   ### How was this patch tested?
   
   - `PlanParserSuite`: extended the `as of syntax` test with four new cases 
covering `FOR BRANCH`, `VERSION AS OF BRANCH`, `SYSTEM_VERSION AS OF BRANCH`, 
and `FOR VERSION AS OF BRANCH`.
   - `SupportsBranchingSuite`: 8 new end-to-end tests covering SELECT `FOR BRANCH`, INSERT `FOR BRANCH`, INSERT OVERWRITE `FOR BRANCH`, the equivalence of `FOR BRANCH` and `VERSION AS OF BRANCH`, `spark.sql.defaultBranch` precedence, the silent-ignore behavior on non-branching tables, and the explicit-clause hard-error behavior.
   - Pre-existing time-travel tests in `DataSourceV2SQLSuiteV1Filter` continue to pass, so there is no regression in the `VERSION AS OF` / `TIMESTAMP AS OF` paths.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
