rdtr opened a new pull request, #56241:
URL: https://github.com/apache/spark/pull/56241
## What changes were proposed in this pull request?
In `SparkConnectPlanner.handleSqlCommand`, the Spark Declarative Pipelines
(SDP) guard previously received a `queryPlan` built by
`transformRelation(relation)`. For a `WITH_RELATIONS` relation matching
`isValidSQLWithRefs`, that call routes through
`executeSQLWithRefs` → `session.sql(...)` and eagerly executes
the root SQL (and any Commands resolved from references), producing a
`CommandResult` that bypasses the blocklist. The normal execution path then
runs the same query a second time.
This PR introduces a recursive helper `blockUnsupportedSqlCommandIn(rel)`
that walks the relation tree and inspects each component the normal handler
would execute, without triggering execution. For a valid SQL-with-references
`WITH_RELATIONS`, it parses the root SQL with
`transformSql` to get an unresolved plan and recursively inspects each
reference's `SubqueryAlias.input` so nested `WITH_RELATIONS` cannot re-trigger
the executing chain. For other relation types, the behavior is unchanged.
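
The traversal described above can be sketched on a toy model. This is only an illustration of the inspect-without-executing idea: every type and function here (`Sql`, `Alias`, `WithRelations`, `parse`, `findBlocked`) is a hypothetical stand-in and not the Spark Connect planner's actual classes or the PR's code.

```scala
// Toy model only: hypothetical stand-ins, not Spark's classes or the PR's code.
object GuardSketch {
  // Unresolved plans produced by parsing; nothing here is ever executed.
  sealed trait Plan
  case class Select(query: String) extends Plan
  case class CreateTableAsSelect(query: String) extends Plan // a blocked command

  // Relation tree as received by the handler.
  sealed trait Rel
  case class Sql(query: String) extends Rel
  case class Alias(name: String, input: Rel) extends Rel // plays the role of SubqueryAlias
  case class WithRelations(root: Rel, references: Seq[Rel]) extends Rel

  // Hypothetical parser: classifies the statement by its prefix, no execution.
  def parse(query: String): Plan =
    if (query.trim.toUpperCase.startsWith("CREATE TABLE")) CreateTableAsSelect(query)
    else Select(query)

  def isBlocked(plan: Plan): Boolean = plan match {
    case _: CreateTableAsSelect => true
    case _                      => false
  }

  // Walks the relation tree and returns the first blocked SQL text, if any.
  def findBlocked(rel: Rel): Option[String] = rel match {
    case Sql(q)          => if (isBlocked(parse(q))) Some(q) else None
    case Alias(_, input) => findBlocked(input) // inspect the reference's input
    case WithRelations(root, references) =>
      // Root first, then each reference; recursion covers nested WithRelations.
      (root +: references).iterator.map(findBlocked).collectFirst { case Some(q) => q }
  }
}
```

In this sketch a command hidden in a reference, for example `WithRelations(Sql("SELECT * FROM r"), Seq(Alias("r", Sql("CREATE TABLE t AS SELECT 1"))))`, is found by recursion alone, without anything being run.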
## Why are the changes needed?
Two bugs defeat the SDP guard:
1. **Bypassed guard.** The blocklist checks for `Command` subclasses
(`CreateTableAsSelect`, `InsertIntoStatement`, etc.), but after execution the
plan is wrapped in a `CommandResult`, which is not on the list. By the time
the guard runs, the Commands have already mutated state. Commands inside
reference `SubqueryAlias` inputs execute as well, because
`eagerlyExecuteCommands` walks the resolved plan tree.
2. **Double execution.** After the guard, `handleSqlCommand` falls through
to the normal execution path and runs `executeSQLWithRefs` again, causing
duplicate side effects.
The guard should match the runtime's execution surface: inspect both the
root SQL and each reference's input, without triggering execution itself.
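
To make the bypass concrete, here is a minimal sketch on a toy model, assuming a simplified executor that wraps executed commands in a `CommandResult`. All names (`BypassSketch`, `CreateTable`, `catalog`, `guardAfterExecution`, `guardBeforeExecution`) are hypothetical and only mirror the shape of the problem, not Spark's implementation.

```scala
import scala.collection.mutable

// Toy model only: hypothetical stand-ins, not Spark's classes or APIs.
object BypassSketch {
  sealed trait Plan
  case class Select(table: String) extends Plan
  case class CreateTable(name: String) extends Plan // a blocked command
  case class CommandResult(of: Plan) extends Plan   // wrapper left behind by execution

  // Records side effects so they can be observed.
  val catalog: mutable.Set[String] = mutable.Set.empty[String]

  // Eager execution: runs the command, then wraps it as a CommandResult.
  def execute(plan: Plan): Plan = plan match {
    case c @ CreateTable(name) =>
      catalog += name
      CommandResult(c)
    case other => other
  }

  // The blocklist only knows about the command type, not the wrapper.
  def isBlocked(plan: Plan): Boolean = plan.isInstanceOf[CreateTable]

  // Old shape: the guard inspects a plan that has already been executed.
  def guardAfterExecution(plan: Plan): Boolean = isBlocked(execute(plan))

  // New shape: the guard inspects the plan before anything runs.
  def guardBeforeExecution(plan: Plan): Boolean = isBlocked(plan)
}
```

In this model, `guardAfterExecution(CreateTable("t"))` reports the plan as not blocked while `catalog` already contains `t`, whereas `guardBeforeExecution(CreateTable("t"))` reports it as blocked and leaves `catalog` untouched.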
## Does this PR introduce _any_ user-facing change?
Yes. SDP requests containing blocked Commands inside `WITH_RELATIONS` (at
the root or in a reference's input) now fail with
`UNSUPPORTED_PIPELINE_SPARK_SQL_COMMAND` before any side effects occur.
## How was this patch tested?
Added three unit tests in `SparkDeclarativePipelinesServerSuite`:
- Command at the root of `WITH_RELATIONS` is blocked, and the target table
is asserted not to exist (verifying no side effect).
- Command embedded in a `SubqueryAlias` reference input is blocked, and
the target table is asserted not to exist.
- A valid `SELECT` in `WITH_RELATIONS` in a pipeline analysis context is
allowed (positive case).
Without the fix, the first two tests fail because the target table is
created as a side effect of the guard.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]