j1wonpark opened a new pull request, #56688:
URL: https://github.com/apache/spark/pull/56688

   ### What changes were proposed in this pull request?
   
   This PR implements four `DatabaseMetaData` methods in the Spark Connect JDBC 
driver (`SparkConnectDatabaseMetaData`) that previously threw 
`SQLFeatureNotSupportedException`:
   
   - `getPrimaryKeys` — returns an empty `ResultSet` with the JDBC-defined 
schema. Spark Connect does not expose primary keys over JDBC, so "no primary 
keys" is represented as an empty result rather than an error.
   - `getImportedKeys` / `getExportedKeys` — return an empty `ResultSet` with 
the JDBC foreign-key schema, for the same reason. Both share a private 
`emptyForeignKeys` helper.
   - `getTypeInfo` — returns a static catalog of the Spark SQL atomic types (12 
rows), ordered by `DATA_TYPE`, mirroring the type-code/precision mapping 
already used by `JdbcTypeUtils`. `TIME` and `TIMESTAMP_NTZ` are omitted because 
they are either new or would duplicate a JDBC type code already in the catalog.
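
For reviewers who want a picture of the `getTypeInfo` ordering, here is a minimal standalone sketch. It is not the code in this PR: the class `TypeInfoSketch`, the nested `Row` type, and the specific name-to-`java.sql.Types` pairs are assumptions made for illustration and may differ from what `JdbcTypeUtils` actually uses. It only shows 12 rows sorted by their JDBC type code.

```java
import java.sql.Types;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the driver's implementation.
class TypeInfoSketch {
    static final class Row {
        final String typeName; // TYPE_NAME column
        final int dataType;    // DATA_TYPE column (a java.sql.Types constant)

        Row(String typeName, int dataType) {
            this.typeName = typeName;
            this.dataType = dataType;
        }
    }

    // Builds an assumed 12-row catalog and sorts it by DATA_TYPE ascending.
    static List<Row> rows() {
        List<Row> rows = new ArrayList<>();
        // Assumed mapping; TIME and TIMESTAMP_NTZ are deliberately left out.
        rows.add(new Row("BOOLEAN", Types.BOOLEAN));
        rows.add(new Row("TINYINT", Types.TINYINT));
        rows.add(new Row("SMALLINT", Types.SMALLINT));
        rows.add(new Row("INT", Types.INTEGER));
        rows.add(new Row("BIGINT", Types.BIGINT));
        rows.add(new Row("FLOAT", Types.FLOAT));
        rows.add(new Row("DOUBLE", Types.DOUBLE));
        rows.add(new Row("DECIMAL", Types.DECIMAL));
        rows.add(new Row("STRING", Types.VARCHAR));
        rows.add(new Row("BINARY", Types.VARBINARY));
        rows.add(new Row("DATE", Types.DATE));
        rows.add(new Row("TIMESTAMP", Types.TIMESTAMP));
        rows.sort(Comparator.comparingInt((Row r) -> r.dataType));
        return rows;
    }
}
```

Because several `java.sql.Types` constants are negative, sorting by `DATA_TYPE` does not follow the declaration order above; under the assumed mapping `TINYINT` sorts first and `TIMESTAMP` last.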
   
   The result-set schemas (column names, order, and types) match the canonical 
definitions already used by Spark's Thrift server operations 
(`GetPrimaryKeysOperation`, `GetCrossReferenceOperation`, 
`GetTypeInfoOperation`), with one intentional correction: the `KEY_SEQ` column 
uses the JDBC-spec name `KEY_SEQ` rather than the `KEQ_SEQ` typo inherited from 
Hive in the Thrift operations.
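
For reference, the column names in question can be written down as plain constants. The sketch below lists the columns documented in the `java.sql.DatabaseMetaData` Javadoc for `getPrimaryKeys` and for `getImportedKeys`/`getExportedKeys`; the class name `KeySchemaSketch` and its fields are hypothetical and do not appear in this PR, and column types are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the JDBC-documented column names, in order.
class KeySchemaSketch {
    // Columns of DatabaseMetaData.getPrimaryKeys.
    static final List<String> PRIMARY_KEY_COLUMNS = Arrays.asList(
        "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME",
        "COLUMN_NAME", "KEY_SEQ", "PK_NAME");

    // Columns of DatabaseMetaData.getImportedKeys / getExportedKeys.
    static final List<String> FOREIGN_KEY_COLUMNS = Arrays.asList(
        "PKTABLE_CAT", "PKTABLE_SCHEM", "PKTABLE_NAME", "PKCOLUMN_NAME",
        "FKTABLE_CAT", "FKTABLE_SCHEM", "FKTABLE_NAME", "FKCOLUMN_NAME",
        "KEY_SEQ", "UPDATE_RULE", "DELETE_RULE",
        "FK_NAME", "PK_NAME", "DEFERRABILITY");
}
```

Clients that look columns up by label, for example `rs.getShort("KEY_SEQ")`, depend on the spec spelling, which is why the `KEQ_SEQ` spelling is not carried over.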
   
   `getFunctions` is intentionally left throwing and is out of scope for this 
PR.
   
   ### Why are the changes needed?
   
   Returning an empty `ResultSet` (rather than throwing) for metadata that a 
driver does not support is the conventional JDBC behavior, and it is what other 
engines in this ecosystem do: Trino returns an empty result set for 
`getPrimaryKeys`/`getImportedKeys`, and Hive does so for `getImportedKeys`. 
Throwing `SQLFeatureNotSupportedException` breaks otherwise-recoverable client 
introspection — for example, BI tools that probe primary/foreign keys to infer 
table relationships abort the metadata step instead of degrading to "no keys."
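
To make the client-side effect concrete, the following sketch shows the kind of probe a tool might run. Everything in it is illustrative: `KeyProbeSketch`, `primaryKeyColumns`, and the proxy-based stubs are hypothetical names, and the stubs stand in for a real driver so the example runs without a Spark Connect server. With an empty `ResultSet` the probe returns an empty list; with a throwing driver the same call surfaces `SQLFeatureNotSupportedException` to the caller.

```java
import java.lang.reflect.Proxy;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side probe; not part of the driver.
class KeyProbeSketch {
    // Collects primary-key column names; no rows means "no keys".
    static List<String> primaryKeyColumns(DatabaseMetaData md, String table)
            throws SQLException {
        List<String> cols = new ArrayList<>();
        try (ResultSet rs = md.getPrimaryKeys(null, null, table)) {
            while (rs.next()) {
                cols.add(rs.getString("COLUMN_NAME"));
            }
        }
        return cols;
    }

    // Stub ResultSet with zero rows; only next() and close() are supported.
    static ResultSet emptyResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
            KeyProbeSketch.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "next": return false;
                    case "close": return null;
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    // Stub metadata: getPrimaryKeys either throws (old behavior described
    // above) or returns an empty result (behavior after this change).
    static DatabaseMetaData stubMetaData(boolean throwing) {
        return (DatabaseMetaData) Proxy.newProxyInstance(
            KeyProbeSketch.class.getClassLoader(),
            new Class<?>[] {DatabaseMetaData.class},
            (proxy, method, args) -> {
                if (!method.getName().equals("getPrimaryKeys")) {
                    throw new UnsupportedOperationException(method.getName());
                }
                if (throwing) {
                    throw new SQLFeatureNotSupportedException("getPrimaryKeys");
                }
                return emptyResultSet();
            });
    }
}
```

A tool written this way needs no special handling for engines without key metadata, whereas the throwing variant forces every caller to add a catch block or fail the whole introspection pass.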
   
   `getTypeInfo` is the one method here for which Spark can return real data: 
its atomic types are statically known. Hive, Trino, and the Databricks JDBC 
driver all implement `getTypeInfo`; Spark Connect was the outlier in throwing.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Previously these four methods threw `SQLFeatureNotSupportedException`. 
After this change:
   - `getPrimaryKeys`, `getImportedKeys`, and `getExportedKeys` return an empty 
`ResultSet` with the JDBC-defined columns.
   - `getTypeInfo` returns the catalog of Spark SQL atomic types.
   
   This is a change within the unreleased branch only; the Spark Connect JDBC 
driver has not been released.
   
   ### How was this patch tested?
   
   Added in-process tests to `SparkConnectDatabaseMetaDataSuite`:
   - `getPrimaryKeys` and `getImportedKeys`/`getExportedKeys` assert the 
result-set column schema and that the result is empty.
   - `getTypeInfo` asserts the column schema, the rows ordered by `DATA_TYPE`, 
that every type is nullable and searchable, that only `STRING` is 
case-sensitive, and that `DECIMAL` carries the expected precision and scale.
   
   ```
   build/sbt 'connect-client-jdbc/testOnly *SparkConnectDatabaseMetaDataSuite'
   ```
   All 10 tests pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

