peter-toth commented on PR #34693:
URL: https://github.com/apache/spark/pull/34693#issuecomment-1111779058

   > Why does Spark JDBC source issue `SELECT * FROM (<query>) WHERE 1=0` instead of simply `<query>`?
   
   Because that way the MSSQL (or any other database's) optimizer can kick in and 
return an empty result set, along with the schema, very quickly.
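
A minimal sketch of the idea, using Python's built-in `sqlite3` module as a stand-in for a real JDBC source (an assumption made only so the example runs anywhere; this is not Spark's actual code). The helper name `probe_schema` is hypothetical. The wrapped query returns no rows, yet the cursor still exposes the column metadata:

```python
import sqlite3


def probe_schema(conn, query):
    """Hypothetical helper: get the column names of `query` without reading rows."""
    # Wrap the user query so the always-false predicate filters out every row.
    cur = conn.execute(f"SELECT * FROM ({query}) WHERE 1=0")
    rows = cur.fetchall()
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    columns = [d[0] for d in cur.description]
    return columns, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (x INTEGER, y TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(5, "a"), (20, "b")])

columns, rows = probe_schema(conn, "SELECT x, y FROM tbl WHERE x > 10")
print(columns, rows)
```

Whether the `1=0` predicate is actually short-circuited depends on the database's optimizer; the sketch only shows that the metadata is available without fetching data.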
   
   > Can we use `WITH t AS (SELECT x, y FROM tbl) SELECT * FROM t WHERE x > 10` directly to send to the database and get the schema?
   
   Well, we could do that at the cost of losing the above optimization. But besides 
the "schema query", Spark also wraps the original query in other places, for 
example when the query is actually executed: 
https://github.com/apache/spark/pull/34693/files#diff-ecf5b374060c1222d3a0a1295b4ec2cb5d07603db460273484b1753e1cab9f90L370-L371
 so that JDBC sources can support different pushdowns and partitioning.
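
To illustrate why wrapping matters at execution time, here is a rough sketch, again using `sqlite3` as a stand-in and not Spark's actual implementation. The function `wrap_for_partition`, the alias `q`, and the example predicates are all hypothetical. The point is that treating the original query as a subquery lets the reader add a pruned column list and a per-partition predicate around it:

```python
import sqlite3


def wrap_for_partition(query, columns, predicate):
    """Hypothetical helper: wrap `query` as a subquery with pruning and a filter."""
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM ({query}) AS q WHERE {predicate}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (x INTEGER, y TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)", [(5, "a"), (20, "b"), (30, "c")]
)

user_query = "SELECT x, y FROM tbl"
# One predicate per partition; each partition reads a disjoint slice of the data.
partition_predicates = ["x < 10", "x >= 10"]

partitions = [
    sorted(conn.execute(wrap_for_partition(user_query, ["y"], p)).fetchall())
    for p in partition_predicates
]
print(partitions)
```

A query that starts with `WITH` cannot be dropped into the `FROM (...)` position on every database, which is why sending it unwrapped would affect more than just the schema lookup.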


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

