peter-toth opened a new pull request, #36440:
URL: https://github.com/apache/spark/pull/36440

   ### What changes were proposed in this pull request?
   Currently CTE queries from Spark are not supported with MSSQL Server via JDBC. This is because MSSQL Server doesn't support the nested CTE syntax (`SELECT * FROM (WITH t AS (...) SELECT ... FROM t) WHERE 1=0`) that Spark builds from the original query (`options.tableOrQuery`) in `JDBCRDD.resolveTable()` and in `JDBCRDD.compute()`.
   Unfortunately, it is non-trivial to split an arbitrary query into "with" and "regular" clauses in `MsSqlServerDialect`. So instead, I'm proposing a new, general JDBC option, `prepareQuery`, that users can set when they have such complex queries:
   ```
   val df = spark.read.format("jdbc")
     .option("url", jdbcUrl)
     .option("prepareQuery", "WITH t AS (SELECT x, y FROM tbl)")
     .option("query", "SELECT * FROM t WHERE x > 10")
     .load()
   ```
   This change also works with MSSQL's temp table syntax:
   ```
   val df = spark.read.format("jdbc")
     .option("url", jdbcUrl)
     .option("prepareQuery", "(SELECT * INTO #TempTable FROM (SELECT * FROM tbl 
WHERE x > 10) t)")
     .option("query", "SELECT * FROM #TempTable")
     .load()
   ```
   
   ### Why are the changes needed?
   To support CTE and temp table queries with MSSQL Server via JDBC.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, CTE and temp table queries are now supported via the new `prepareQuery` JDBC option.
   
   ### How was this patch tested?
   Added new integration tests.
   

