juliuszsompolski commented on pull request #30919:
URL: https://github.com/apache/spark/pull/30919#issuecomment-752407993
We might do some future work with Simba on this.
To support three-part catalog.database.table identifiers, the Simba JDBC/ODBC
drivers currently accept a catalog named "SPARK" and drop it when translating
queries to Spark, even with UseNativeQuery=1:
```
scala> val conn = java.sql.DriverManager.getConnection("jdbc:spark://<...>.databricks.com:443/default;transportMode=http;ssl=1;httpPath=...;AuthMech=3;UID=...;PWD=...;UseNativeQuery=1")
conn: java.sql.Connection = com.simba.spark.hivecommon.jdbc42.Hive42Connection@2484dbb7
scala> val stmt = conn.createStatement()
scala> stmt.execute("CREATE TABLE SPARK.default.catalogtest(foo int)")
res0: Boolean = false
scala> stmt.executeQuery("SELECT * FROM SPARK.default.catalogtest")
res2: java.sql.ResultSet = com.simba.spark.jdbc.jdbc42.S42ForwardResultSet@298e002d
```
The actual queries sent to the Thriftserver are `CREATE TABLE
default.catalogtest(foo int)` and `SELECT * FROM default.catalogtest`; Simba
simply drops the catalog name "SPARK" from the queries.
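A rough sketch of what that client-side rewrite might look like. This is purely illustrative: the class, method, and regex below are my assumptions, not the Simba driver's actual implementation.

```java
import java.util.regex.Pattern;

public class CatalogRewrite {
    // Illustrative only: strip a leading "SPARK." catalog qualifier from
    // three-part identifiers before the query is sent to the Thriftserver.
    // The lookahead requires two more dotted name parts, so two-part
    // identifiers and names like "spark_catalog" are left untouched.
    private static final Pattern SPARK_CATALOG =
        Pattern.compile("\\bSPARK\\.(?=\\w+\\.\\w+)", Pattern.CASE_INSENSITIVE);

    public static String stripSparkCatalog(String sql) {
        return SPARK_CATALOG.matcher(sql).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripSparkCatalog(
            "SELECT * FROM SPARK.default.catalogtest"));
        // prints: SELECT * FROM default.catalogtest
    }
}
```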
Simba will also return a canned response with a single catalog "Spark" to a
metadata getCatalogs call.
```
scala> conn.getMetaData.getCatalogs
res3: java.sql.ResultSet = com.simba.spark.jdbc.jdbc42.S42MetaDataProxy@51d9fd30
scala> res3.next()
res4: Boolean = true
scala> res3.getObject(1)
res5: Object = Spark
scala> res3.next()
res8: Boolean = false
```
Thriftserver's SparkGetCatalogsOperation just returns an empty result set; the
Simba drivers ignore it.
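In other words, the driver appears to substitute a hard-coded single-catalog list when the server's GetCatalogs response is empty. A toy model of that fallback, with names and behavior assumed by me rather than taken from the driver:

```java
import java.util.List;

public class CannedCatalogs {
    // Illustrative fallback: if the server's GetCatalogs response is empty,
    // return a single placeholder catalog, mirroring what the Simba driver
    // appears to do with "Spark". Otherwise, pass the server's list through.
    public static List<String> catalogsOrDefault(List<String> fromServer) {
        return fromServer.isEmpty() ? List.of("Spark") : fromServer;
    }
}
```

If SparkGetCatalogsOperation ever started returning real catalogs, a driver built this way would automatically prefer them over the placeholder.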
However, the following already seems to work correctly:
```
scala> stmt.execute("CREATE TABLE spark_catalog.default.catalogtest2(foo int)")
res11: Boolean = false
scala> stmt.executeQuery("SELECT * FROM SPARK.default.catalogtest2")
res12: java.sql.ResultSet = com.simba.spark.jdbc.jdbc42.S42ForwardResultSet@59845a40
scala> stmt.executeQuery("SELECT * FROM spark_catalog.default.catalogtest2")
res13: java.sql.ResultSet = com.simba.spark.jdbc.jdbc42.S42ForwardResultSet@41433530
scala> stmt.execute("CREATE TABLE spark_catalog2.default.catalogtest2(foo int)")
java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: Error running query: org.apache.spark.sql.AnalysisException: The namespace in session catalog must have exactly one name part: spark_catalog2.default.catalogtest2;, Query: CREATE TABLE spark_catalog2.default.catalogtest2(foo int).
  at com.simba.spark.hivecommon.api.HS2Client.pollForOperationCompletion(Unknown Source)
  at com.simba.spark.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
  at com.simba.spark.hivecommon.api.HS2Client.executeStatement(Unknown Source)
  at com.simba.spark.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
  at com.simba.spark.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
  at com.simba.spark.jdbc.common.SStatement.executeNoParams(Unknown Source)
  at com.simba.spark.jdbc.common.SStatement.execute(Unknown Source)
  ... 31 elided
Caused by: com.simba.spark.support.exceptions.GeneralException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: Error running query: org.apache.spark.sql.AnalysisException: The namespace in session catalog must have exactly one name part: spark_catalog2.default.catalogtest2;, Query: CREATE TABLE spark_catalog2.default.catalogtest2(foo int).
  ... 38 more
```
For catalog names other than "SPARK", the Simba drivers forward the queries
verbatim.
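Putting the observations above together, the end-to-end behavior for a three-part identifier can be modeled roughly as follows. This is a simplified model of the observed behavior only, not Spark's or Simba's actual resolution code:

```java
public class CatalogResolution {
    // Simplified model: the driver drops the "SPARK" placeholder, forwards
    // everything else verbatim, and Spark's session catalog then accepts
    // only "spark_catalog" as an explicit catalog name.
    public static String resolve(String catalog, String db, String table) {
        if (catalog.equalsIgnoreCase("SPARK")) {
            // Driver strips the placeholder before sending the query.
            return db + "." + table;
        }
        if (catalog.equals("spark_catalog")) {
            // Forwarded verbatim and accepted by Spark's session catalog.
            return catalog + "." + db + "." + table;
        }
        // Forwarded verbatim and rejected by Spark, as in the trace above.
        throw new IllegalArgumentException(
            "The namespace in session catalog must have exactly one name part: "
            + catalog + "." + db + "." + table);
    }
}
```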
What are Spark's plans for supporting multiple catalogs? Should we start
returning them via SparkGetCatalogsOperation, and should Simba start respecting
the catalogs returned there and drop its own "SPARK" catalog placeholder? I
think some existing downstream connectors (Alation, I believe) depend on
"SPARK" as the catalog name, so Simba might need to keep "SPARK" as a special
default catalog.
cc @wangyum @bogdanghit