j1wonpark opened a new pull request, #56688: URL: https://github.com/apache/spark/pull/56688
### What changes were proposed in this pull request?

This PR implements four `DatabaseMetaData` methods in the Spark Connect JDBC driver (`SparkConnectDatabaseMetaData`) that previously threw `SQLFeatureNotSupportedException`:

- `getPrimaryKeys`: returns an empty `ResultSet` with the JDBC-defined schema. Spark Connect does not expose primary keys over JDBC, so "no primary keys" is represented as an empty result rather than an error.
- `getImportedKeys` / `getExportedKeys`: return an empty `ResultSet` with the JDBC foreign-key schema, for the same reason. Both share a private `emptyForeignKeys` helper.
- `getTypeInfo`: returns a static catalog of the Spark SQL atomic types (12 rows), ordered by `DATA_TYPE`, mirroring the type-code/precision mapping already used by `JdbcTypeUtils`. `TIME` and `TIMESTAMP_NTZ` are omitted (new and duplicate JDBC type codes).

The result-set schemas (column names, order, and types) match the canonical definitions already used by Spark's Thrift server operations (`GetPrimaryKeysOperation`, `GetCrossReferenceOperation`, `GetTypeInfoOperation`), with one intentional correction: the `KEY_SEQ` column uses the JDBC-spec name `KEY_SEQ` rather than the `KEQ_SEQ` typo inherited from Hive in the Thrift operations.

`getFunctions` is intentionally left throwing and is out of scope for this PR.

### Why are the changes needed?

Returning an empty `ResultSet` (rather than throwing) for metadata that a driver does not support is the conventional JDBC behavior, and it is what other engines in this ecosystem do: Trino returns an empty result set for `getPrimaryKeys`/`getImportedKeys`, and Hive does so for `getImportedKeys`. Throwing `SQLFeatureNotSupportedException` breaks otherwise-recoverable client introspection. For example, BI tools that probe primary/foreign keys to infer table relationships abort the metadata step instead of degrading to "no keys."

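To illustrate the client-side difference, the following Java sketch runs a hypothetical introspection helper (`primaryKeyColumns`) against two stubbed `DatabaseMetaData` objects: one that returns an empty `ResultSet` and one that throws. The class name, the helper, the table name, and the proxy-based stubs are assumptions made for illustration; none of them come from this PR or from any particular BI tool.

```java
import java.lang.reflect.Proxy;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.List;

class KeyProbeSketch {

    // Hypothetical client helper: collects the primary-key column names of a table.
    static List<String> primaryKeyColumns(DatabaseMetaData md, String table) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (ResultSet rs = md.getPrimaryKeys(null, null, table)) {
            while (rs.next()) {
                columns.add(rs.getString("COLUMN_NAME"));
            }
        }
        return columns;
    }

    // Stub ResultSet with no rows: next() is always false, every other call is a no-op.
    static ResultSet emptyResultSet() {
        return (ResultSet) Proxy.newProxyInstance(
            KeyProbeSketch.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, args) -> method.getName().equals("next") ? Boolean.FALSE : null);
    }

    // Stub DatabaseMetaData that only answers getPrimaryKeys, either by throwing
    // (the old behavior described above) or by returning an empty ResultSet.
    static DatabaseMetaData stubMetaData(boolean throwsOnKeys) {
        return (DatabaseMetaData) Proxy.newProxyInstance(
            KeyProbeSketch.class.getClassLoader(),
            new Class<?>[] {DatabaseMetaData.class},
            (proxy, method, args) -> {
                if (!method.getName().equals("getPrimaryKeys")) {
                    throw new UnsupportedOperationException(method.getName());
                }
                if (throwsOnKeys) {
                    throw new SQLFeatureNotSupportedException("getPrimaryKeys");
                }
                return emptyResultSet();
            });
    }

    public static void main(String[] args) throws SQLException {
        // Empty result: the helper degrades to "no keys".
        System.out.println(primaryKeyColumns(stubMetaData(false), "orders")); // prints []

        // Throwing driver: the same helper aborts unless the caller adds a fallback.
        try {
            primaryKeyColumns(stubMetaData(true), "orders");
        } catch (SQLFeatureNotSupportedException e) {
            System.out.println("metadata step aborted: " + e.getMessage());
        }
    }
}
```

With the empty result the helper needs no special handling; with the exception, every caller has to add its own fallback or the metadata step fails.
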
`getTypeInfo` is the one method here for which Spark can return real data: its atomic types are statically known. Hive, Trino, and the Databricks JDBC driver all implement `getTypeInfo`; Spark Connect was the outlier in throwing.

### Does this PR introduce _any_ user-facing change?

Yes. Previously these four methods threw `SQLFeatureNotSupportedException`. After this change:

- `getPrimaryKeys`, `getImportedKeys`, and `getExportedKeys` return an empty `ResultSet` with the JDBC-defined columns.
- `getTypeInfo` returns the catalog of Spark SQL atomic types.

This is a change within the unreleased branch only; the Spark Connect JDBC driver has not been released.

### How was this patch tested?

Added in-process tests to `SparkConnectDatabaseMetaDataSuite`:

- `getPrimaryKeys` and `getImportedKeys`/`getExportedKeys` assert the result-set column schema and that the result is empty.
- `getTypeInfo` asserts the column schema, the rows ordered by `DATA_TYPE`, that every type is nullable and searchable, that only `STRING` is case-sensitive, and that `DECIMAL` carries the expected precision and scale.

```
build/sbt 'connect-client-jdbc/testOnly *SparkConnectDatabaseMetaDataSuite'
```

All 10 tests pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)
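
As a rough picture of the `getTypeInfo` catalog described above, here is a minimal Java sketch of a static type table ordered by `DATA_TYPE`. It is not the PR's implementation: `TypeInfoSketch`, `TypeRow`, the four-row subset, the JDBC code chosen for each type, and all precision values other than Spark's 38-digit decimal maximum are assumptions made for illustration.

```java
import java.sql.DatabaseMetaData;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TypeInfoSketch {

    // Hypothetical row holding a few of the getTypeInfo columns.
    static final class TypeRow {
        final String typeName;       // TYPE_NAME
        final int dataType;          // DATA_TYPE (a java.sql.Types code)
        final int precision;         // PRECISION (illustrative values)
        final boolean caseSensitive; // CASE_SENSITIVE
        final int nullable = DatabaseMetaData.typeNullable;     // every type nullable
        final int searchable = DatabaseMetaData.typeSearchable; // every type searchable

        TypeRow(String typeName, int dataType, int precision, boolean caseSensitive) {
            this.typeName = typeName;
            this.dataType = dataType;
            this.precision = precision;
            this.caseSensitive = caseSensitive;
        }
    }

    // Builds the static catalog and sorts it by DATA_TYPE, as the JDBC spec asks.
    static List<TypeRow> catalog() {
        List<TypeRow> rows = new ArrayList<>(List.of(
            new TypeRow("BOOLEAN", Types.BOOLEAN, 1, false),
            new TypeRow("INT", Types.INTEGER, 10, false),
            new TypeRow("DECIMAL", Types.DECIMAL, 38, false),
            new TypeRow("STRING", Types.VARCHAR, Integer.MAX_VALUE, true)));
        rows.sort(Comparator.comparingInt((TypeRow r) -> r.dataType));
        return rows;
    }

    public static void main(String[] args) {
        for (TypeRow r : catalog()) {
            System.out.println(r.dataType + "\t" + r.typeName + "\tcaseSensitive=" + r.caseSensitive);
        }
    }
}
```

Because the rows are fixed at compile time, the ordering and per-type flags can be checked without a running Spark Connect server, which is the property the in-process tests above rely on.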
