cloud-fan commented on a change in pull request #29939:
URL: https://github.com/apache/spark/pull/29939#discussion_r501441101
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
##########
@@ -221,4 +221,21 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession {
       checkAnswer(sql("SELECT name, id FROM h2.test.abc"), Row("bob", 4))
     }
   }
+
+  test("DataFrameReader: jdbc") {
+    withTable("h2.test.abc") {
+      sql("CREATE TABLE h2.test.abc USING _ AS SELECT * FROM h2.test.people")
+      val properties = new Properties()
+      val df1 = spark.read.jdbc(url, "h2.test.abc", properties)
Review comment:
I'm a bit confused about this. There are 3 ways to use the JDBC data source:
1. Use the `DataFrameReader/Writer` API to access JDBC tables/queries directly.
2. Register it as a table, and access the table.
3. Register it as a catalog, and access tables inside the catalog.
`spark.read.jdbc(url, "h2.test.abc", properties)` seems like a mix of 1 and
3. What's the use case you are targeting?
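To make the three options above concrete, here is a minimal Scala sketch. It
assumes a reachable database at `url` that already contains a `test.people`
table, the H2 driver on the classpath, and the `JDBCTableCatalog`
implementation registered under the catalog name `h2`. `JdbcAccessPaths`,
`people_jdbc`, and the method names are hypothetical, and option 2 is shown
with a temporary view for brevity.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcAccessPaths {
  // 1. Direct access: the table argument names a table in the external
  //    database, and the connection details are passed on every call.
  def direct(spark: SparkSession, url: String): DataFrame =
    spark.read.jdbc(url, "test.people", new Properties())

  // 2. Register the external table under a Spark name once, then query it
  //    by that name. (Assumption: a temporary view stands in for a table.)
  def registered(spark: SparkSession, url: String): DataFrame = {
    spark.sql(
      s"""CREATE OR REPLACE TEMPORARY VIEW people_jdbc
         |USING jdbc
         |OPTIONS (url '$url', dbtable 'test.people')""".stripMargin)
    spark.sql("SELECT * FROM people_jdbc")
  }

  // 3. Register a catalog that carries the url, then address tables as
  //    catalog.namespace.table without repeating the connection details.
  def viaCatalog(spark: SparkSession, url: String): DataFrame = {
    spark.conf.set(
      "spark.sql.catalog.h2",
      "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
    spark.conf.set("spark.sql.catalog.h2.url", url)
    spark.conf.set("spark.sql.catalog.h2.driver", "org.h2.Driver")
    spark.sql("SELECT * FROM h2.test.people")
  }
}
```

In option 1 the string is resolved by the remote database, while in option 3
it is resolved by Spark's catalog lookup, which is why passing `h2.test.abc`
to `spark.read.jdbc` looks like a mix of the two.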
Review comment:
We need to distinguish between APIs and shortcuts. `DataFrameWriter` has 3
APIs: `save`, `insertInto` and `saveAsTable`. `parquet`, `json`, `jdbc`, etc.
are shortcuts that eventually call `save()`. `insertInto` and `saveAsTable`
take a Spark table name and should support multiple catalogs. `save` interacts
with the data source directly through options, and thus shouldn't support
multiple catalogs.
For this particular test, it looks confusing: the registered JDBC catalog
should already have the url config, so why do we need to specify it again in
`spark.read.jdbc`?
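A short Scala sketch of the API/shortcut distinction on the write side. It
assumes a JDBC catalog named `h2` that supports writes and a target table
`test.abc` in the external database; `WriterPaths` and its method names are
hypothetical.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object WriterPaths {
  // Shortcut: jdbc(...) fills in the "url" and "dbtable" options and then
  // goes through save(). The table name is an external database name.
  def shortcut(df: DataFrame, url: String): Unit =
    df.write.mode(SaveMode.Append).jdbc(url, "test.abc", new Properties())

  // Roughly what the shortcut expands to: save() driven purely by options.
  def explicitSave(df: DataFrame, url: String): Unit =
    df.write
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "test.abc")
      .mode(SaveMode.Append)
      .save()

  // saveAsTable takes a Spark table name, so a catalog prefix is meaningful.
  def createByName(df: DataFrame): Unit =
    df.write.saveAsTable("h2.test.abc")

  // insertInto also takes a Spark table name; the table must already exist.
  def appendByName(df: DataFrame): Unit =
    df.write.insertInto("h2.test.abc")
}
```

Neither `createByName` nor `appendByName` needs a url, because the catalog
configuration already carries it.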
Review comment:
From the doc of `spark.read.jdbc`: `@param table Name of the table in the
external database.`
This is not a Spark table name, but a table name in the remote JDBC server,
such as MySQL.