rdtr opened a new pull request, #56241:
URL: https://github.com/apache/spark/pull/56241

   ## What changes were proposed in this pull request?
   
     In `SparkConnectPlanner.handleSqlCommand`, the SDP guard previously 
received a `queryPlan` built by `transformRelation(relation)`. For a 
`WITH_RELATIONS` relation that satisfies `isValidSQLWithRefs`, that call 
chain routes through `executeSQLWithRefs` → `session.sql(...)` and eagerly 
executes the root SQL (and any Commands resolved from references), producing 
a `CommandResult` that bypasses the blocklist. The normal execution path then 
runs the same query again.
   
     This PR introduces a recursive helper, `blockUnsupportedSqlCommandIn(rel)`, 
that walks the relation tree and inspects each component the normal handler 
would execute, without triggering execution. For a valid SQL-with-references 
`WITH_RELATIONS`, it parses the root SQL with `transformSql` to get an 
unresolved plan and recursively inspects each reference's 
`SubqueryAlias.input`, so nested `WITH_RELATIONS` cannot re-trigger the 
executing chain. For other relation types, the behavior is unchanged.
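
The recursive walk described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in this PR: the `Plan` and `Relation` types, `parseOnly`, and `UnsupportedCommand` are hypothetical stand-ins for Spark's classes, and only the helper name `blockUnsupportedSqlCommandIn` comes from the description. The point it shows is that the guard parses and inspects, but never executes.

```scala
object SdpGuardSketch {
  // Hypothetical stand-ins for logical plans; not Spark's real classes.
  sealed trait Plan
  case class Select(table: String) extends Plan
  case class CreateTableAsSelect(target: String, query: Plan) extends Plan

  // Hypothetical stand-ins for Spark Connect relations.
  sealed trait Relation
  case class Sql(text: String) extends Relation
  // Root SQL plus named references, loosely mirroring WITH_RELATIONS.
  case class WithRelations(root: Sql, refs: Seq[SubqueryAlias]) extends Relation
  case class SubqueryAlias(alias: String, input: Relation)

  final case class UnsupportedCommand(name: String)
    extends Exception(s"UNSUPPORTED_PIPELINE_SPARK_SQL_COMMAND: $name")

  // Assumed parse-only step: returns an unresolved plan and runs nothing.
  def parseOnly(sql: Sql): Plan =
    if (sql.text.trim.toUpperCase.startsWith("CREATE TABLE"))
      CreateTableAsSelect("t", Select("src"))
    else Select("src")

  // Blocklist check on the unresolved plan.
  private def check(plan: Plan): Unit = plan match {
    case c: CreateTableAsSelect => throw UnsupportedCommand(c.productPrefix)
    case _                      => ()
  }

  // Inspect the root SQL, then recurse into each reference's input,
  // so a nested WithRelations is inspected the same way and never executed.
  def blockUnsupportedSqlCommandIn(rel: Relation): Unit = rel match {
    case s: Sql => check(parseOnly(s))
    case WithRelations(root, refs) =>
      check(parseOnly(root))
      refs.foreach(ref => blockUnsupportedSqlCommandIn(ref.input))
  }
}
```

In this sketch, a `CREATE TABLE ... AS SELECT` hidden inside a nested reference raises `UnsupportedCommand` from the recursive call, while a plain `SELECT` tree passes through untouched.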
   
     ## Why are the changes needed?
   
     Two bugs defeat the SDP guard:
   
     1. **Bypassed guard.** The blocklist checks for `Command` subclasses 
(`CreateTableAsSelect`, `InsertIntoStatement`, etc.), but after execution the 
plan is wrapped in a `CommandResult`, which is not on the list. By the time 
the guard runs, the Commands have already mutated state.
     Commands inside reference `SubqueryAlias` inputs also execute, because 
`eagerlyExecuteCommands` walks the resolved plan tree.
   
     2. **Double execution.** After the guard, `handleSqlCommand` falls through 
to the normal execution path and runs `executeSQLWithRefs` again, causing 
duplicate side effects.
   
     The guard should match the runtime's execution surface: inspect both the 
root SQL and each reference's input, without triggering execution itself.
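
To make the two bugs concrete, here is a minimal toy simulation. It is a sketch under assumptions, not Spark code: `executeEagerly`, `isBlocked`, the `tables` set, and the `executions` counter are all hypothetical, and only the class names `CreateTableAsSelect` and `CommandResult` echo the description above.

```scala
object GuardBypassSketch {
  // Hypothetical stand-ins for logical plans; not Spark's real classes.
  sealed trait Plan
  case class CreateTableAsSelect(target: String) extends Plan
  case class CommandResult(executed: Plan) extends Plan

  // Hypothetical session state: created tables and an execution counter.
  val tables = scala.collection.mutable.Set.empty[String]
  var executions = 0

  // Stand-in for eager command execution: mutates state, then wraps the plan.
  def executeEagerly(plan: Plan): Plan = plan match {
    case c @ CreateTableAsSelect(target) =>
      executions += 1
      tables += target
      CommandResult(c)
    case other => other
  }

  // Blocklist that only knows about the command itself, not its wrapper.
  def isBlocked(plan: Plan): Boolean = plan match {
    case _: CreateTableAsSelect => true
    case _                      => false
  }
}
```

Checking the plan returned by `executeEagerly` yields `false` from `isBlocked` even though the table has already been added to `tables`, and a second call to `executeEagerly` raises `executions` to 2. Checking the unexecuted `CreateTableAsSelect` instead yields `true` with no state change, which is the behavior the guard is meant to have.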
   
     ## Does this PR introduce _any_ user-facing change?
   
     Yes. SDP requests containing blocked Commands inside `WITH_RELATIONS` (at 
the root or in a reference's input) now fail with 
`UNSUPPORTED_PIPELINE_SPARK_SQL_COMMAND` before any side effects occur.
   
     ## How was this patch tested?
   
     Added three unit tests in `SparkDeclarativePipelinesServerSuite`:
   
     - Command at the root of `WITH_RELATIONS` is blocked, and the target table 
is asserted not to exist (verifying no side effect).
     - Command embedded in a `SubqueryAlias` reference input is blocked, and 
the target table is asserted not to exist.
     - Valid `SELECT` `WITH_RELATIONS` in pipeline analysis context is allowed 
(positive case).
   
     Without the fix, the first two tests fail because the target table is 
created as a side effect of the guard.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

