maropu commented on a change in pull request #33370:
URL: https://github.com/apache/spark/pull/33370#discussion_r670892697
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
+ Must be one of the providers loaded with the JDBC data source. Used to disambiguate when more than one provider can handle the specified driver and options.
+ </td>
+ <td>read/write</td>
+ </tr>
Review comment:
nit: the indents look incorrect.
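
   For reference, a minimal usage sketch of the option documented in this diff might look like the following; the URL, table name, and provider value are placeholders rather than anything taken from the PR.
   ```scala
   // Illustrative sketch only: selecting a specific JDBC connection provider by
   // name when reading over JDBC. "db2" is one of the example names from the
   // docs entry above; the URL and table are placeholders, and an existing
   // SparkSession named `spark` is assumed.
   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "my_table")
     .option("connectionProvider", "db2")
     .load()
   ```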
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala
##########
@@ -55,21 +56,48 @@ private[jdbc] object ConnectionProvider extends Logging {
providers.filterNot(p => disabledProviders.contains(p.name)).toSeq
}
-  def create(driver: Driver, options: Map[String, String]): Connection = {
+  def create(
+      driver: Driver,
+      options: Map[String, String],
+      connectionProviderName: Option[String]): Connection = {
     val filteredProviders = providers.filter(_.canHandle(driver, options))
-    require(filteredProviders.size == 1,
-      "JDBC connection initiated but not exactly one connection provider found which can handle " +
-        s"it. Found active providers: ${filteredProviders.mkString(", ")}")
+
+    if (filteredProviders.isEmpty) {
+      throw new IllegalArgumentException(
+        "Empty list of JDBC connection providers for the specified driver and options")
+    }
+
+    val selectedProvider = connectionProviderName match {
+      case Some(providerName) =>
+        // It is assumed that no two providers will have the same name
+        filteredProviders.filter(_.name == providerName).headOption.getOrElse {
+          throw new IllegalArgumentException(
+            s"Could not find a JDBC connection provider with name '$providerName' " +
+              "that can handle the specified driver and options. " +
+              s"Available providers are ${providers.mkString("[", ", ", "]")}")
+        }
+      case None =>
+        if (filteredProviders.size != 1) {
+          throw new IllegalArgumentException(
+            "JDBC connection initiated but more than one connection provider was found. Use " +
+              s"'${JDBCOptions.JDBC_CONNECTION_PROVIDER}' option to select a specific provider. " +
+              s"Found active providers ${filteredProviders.mkString("[", ", ", "]")}")
+        }
+        filteredProviders.head
+    }
+
     SecurityConfigurationLock.synchronized {
       // Inside getConnection it's safe to get parent again because SecurityConfigurationLock
       // makes sure it's untouched
       val parent = Configuration.getConfiguration
       try {
-        filteredProviders.head.getConnection(driver, options)
+        selectedProvider.getConnection(driver, options)
       } finally {
         logDebug("Restoring original security configuration")
         Configuration.setConfiguration(parent)
       }
     }
   }
 }
+
+private[jdbc] object ConnectionProvider extends ConnectionProviderBase
Review comment:
   Did you make this class for testing? Instead of creating an unnecessary class hierarchy, how about just using an inner method for testing, like this?
   ```scala
   // For testing
   private[jdbc] def doCreate(
       driver: Driver,
       providers: Seq[JdbcConnectionProvider],
       options: Map[String, String]): Connection = {
     ...
   }

   def create(driver: Driver, options: Map[String, String]): Connection = {
     doCreate(driver, providers, options)
   }
   ```
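
   If that suggestion were combined with the new three-argument signature from this PR, one possible shape is sketched below; this is only an illustration built from the logic in the diff above, not the actual implementation.
   ```scala
   // Sketch only: the provider list becomes an explicit parameter so that tests
   // can pass stub providers; the selection logic mirrors the diff above.
   private[jdbc] def doCreate(
       driver: Driver,
       providers: Seq[JdbcConnectionProvider],
       options: Map[String, String],
       connectionProviderName: Option[String]): Connection = {
     val filteredProviders = providers.filter(_.canHandle(driver, options))
     if (filteredProviders.isEmpty) {
       throw new IllegalArgumentException(
         "Empty list of JDBC connection providers for the specified driver and options")
     }
     val selectedProvider = connectionProviderName match {
       case Some(providerName) =>
         // No two providers are expected to share a name.
         filteredProviders.find(_.name == providerName).getOrElse {
           throw new IllegalArgumentException(
             s"Could not find a JDBC connection provider with name '$providerName' " +
               "that can handle the specified driver and options")
         }
       case None =>
         if (filteredProviders.size != 1) {
           throw new IllegalArgumentException(
             "More than one connection provider was found; use the " +
               s"'${JDBCOptions.JDBC_CONNECTION_PROVIDER}' option to select a specific one")
         }
         filteredProviders.head
     }
     // Same security-configuration handling as in the diff above.
     SecurityConfigurationLock.synchronized {
       val parent = Configuration.getConfiguration
       try {
         selectedProvider.getConnection(driver, options)
       } finally {
         Configuration.setConfiguration(parent)
       }
     }
   }

   // The public entry point keeps the signature added in this PR.
   def create(
       driver: Driver,
       options: Map[String, String],
       connectionProviderName: Option[String]): Connection = {
     doCreate(driver, providers, options, connectionProviderName)
   }
   ```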
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
+ Must be one of the providers loaded with the JDBC data source. Used to disambiguate when more than one provider can handle the specified driver and options.
Review comment:
   How about describing the relationship to `DISABLED_JDBC_CONN_PROVIDER_LIST` here?
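
   To make that relationship concrete, the interaction could be illustrated roughly as below; the conf key comes from the JDBC README quoted further down in this review, and the URL, table, and provider values are placeholders.
   ```scala
   import org.apache.spark.sql.SparkSession

   // Sketch: a provider disabled through the static conf is not loaded, so an
   // explicit connectionProvider naming it can no longer be resolved and the
   // read below would fail when the connection is created.
   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.sql.sources.disabledJdbcConnProviderList", "mssql")
     .getOrCreate()

   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:sqlserver://host:1433;databaseName=sample") // placeholder
     .option("dbtable", "my_table")
     .option("connectionProvider", "mssql") // disabled above, so it cannot be selected
     .load()
   ```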
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
Review comment:
   What does `short` mean here? `The name of a JDBC connection provider to use ...`?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/README.md
##########
@@ -46,6 +46,12 @@ so they can be turned off and can be replaced with custom implementation. All CP
which must be unique. One can set the following configuration entry in `SparkConf` to turn off CPs:
`spark.sql.sources.disabledJdbcConnProviderList=name1,name2`.
+## How to enforce a specific JDBC connection provider?
+
+When more than one JDBC connection provider can handle a specific driver and options, it is possible to
+disambiguate and enforce a particular CP for the JDBC data source. One can set the DataFrameReader
Review comment:
   Is this related to the reader only? How about the writer?
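
   For what it's worth, the docs table in this PR marks the option as read/write; if that is the intent, a sketch covering both paths could look like this (placeholder URLs and tables, and an existing SparkSession named `spark` is assumed):
   ```scala
   // Sketch assuming the option applies to both paths, as the docs table's
   // read/write column suggests; all values are placeholders.
   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "src_table")
     .option("connectionProvider", "db2")
     .load()

   df.write
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "dst_table")
     .option("connectionProvider", "db2")
     .save()
   ```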
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuite.scala
##########
@@ -45,6 +54,84 @@ class ConnectionProviderSuite extends ConnectionProviderSuiteBase with SharedSpa
}
}
+ test("Throw an error selecting from an empty list of providers on create") {
Review comment:
   Could you add some tests for the case where a user sets both `DISABLED_JDBC_CONN_PROVIDER_LIST` and `connectionProvider`?
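
   A rough shape for such a test might be the following; the setup here is an assumption rather than the actual suite code, and the static `spark.sql.sources.disabledJdbcConnProviderList` conf would have to be set for the test's session (e.g. to `basic`) for the scenario to apply.
   ```scala
   // Hypothetical sketch only: with the basic provider disabled for the test
   // session, asking for it explicitly via the new option should fail to resolve.
   test("throw when the provider requested via connectionProvider is disabled") {
     val driver = new org.h2.Driver()
     val options = Map(
       "url" -> "jdbc:h2:mem:testdb",
       "dbtable" -> "test_table")
     intercept[IllegalArgumentException] {
       ConnectionProvider.create(driver, options, Some("basic"))
     }
   }
   ```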