maropu commented on a change in pull request #33370:
URL: https://github.com/apache/spark/pull/33370#discussion_r670892697
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
+ Must be one of the providers loaded with the JDBC data source. Used to disambiguate when more than one provider can handle the specified driver and options.
+ </td>
+ <td>read/write</td>
+ </tr>
Review comment:
nit: the indents look incorrect.
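
   For reference, a minimal usage sketch of the option documented in this diff might look like the following; the URL, table name, and provider value are placeholders rather than anything taken from the PR.
   ```scala
   // Illustrative sketch only: selecting a specific JDBC connection provider by
   // name when reading over JDBC. "db2" is one of the example names from the
   // docs entry above; the URL and table are placeholders, and an existing
   // SparkSession named `spark` is assumed.
   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "my_table")
     .option("connectionProvider", "db2")
     .load()
   ```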
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala
##########
@@ -55,21 +56,48 @@ private[jdbc] object ConnectionProvider extends Logging {
providers.filterNot(p => disabledProviders.contains(p.name)).toSeq
}
-  def create(driver: Driver, options: Map[String, String]): Connection = {
+  def create(
+      driver: Driver,
+      options: Map[String, String],
+      connectionProviderName: Option[String]): Connection = {
     val filteredProviders = providers.filter(_.canHandle(driver, options))
-    require(filteredProviders.size == 1,
-      "JDBC connection initiated but not exactly one connection provider found which can handle " +
-        s"it. Found active providers: ${filteredProviders.mkString(", ")}")
+
+    if (filteredProviders.isEmpty) {
+      throw new IllegalArgumentException(
+        "Empty list of JDBC connection providers for the specified driver and options")
+    }
+
+    val selectedProvider = connectionProviderName match {
+      case Some(providerName) =>
+        // It is assumed that no two providers will have the same name
+        filteredProviders.filter(_.name == providerName).headOption.getOrElse {
+          throw new IllegalArgumentException(
+            s"Could not find a JDBC connection provider with name '$providerName' " +
+              "that can handle the specified driver and options. " +
+              s"Available providers are ${providers.mkString("[", ", ", "]")}")
+        }
+      case None =>
+        if (filteredProviders.size != 1) {
+          throw new IllegalArgumentException(
+            "JDBC connection initiated but more than one connection provider was found. Use " +
+              s"'${JDBCOptions.JDBC_CONNECTION_PROVIDER}' option to select a specific provider. " +
+              s"Found active providers ${filteredProviders.mkString("[", ", ", "]")}")
+        }
+        filteredProviders.head
+    }
+
     SecurityConfigurationLock.synchronized {
       // Inside getConnection it's safe to get parent again because SecurityConfigurationLock
       // makes sure it's untouched
       val parent = Configuration.getConfiguration
       try {
-        filteredProviders.head.getConnection(driver, options)
+        selectedProvider.getConnection(driver, options)
       } finally {
         logDebug("Restoring original security configuration")
         Configuration.setConfiguration(parent)
       }
     }
   }
 }
+
+private[jdbc] object ConnectionProvider extends ConnectionProviderBase
Review comment:
   Did you make this class for testing? Instead of creating an unnecessary class hierarchy, how about just using an inner method for testing, like this?
   ```scala
   // For testing
   private[jdbc] def doCreate(
       driver: Driver,
       providers: Seq[JdbcConnectionProvider],
       options: Map[String, String]): Connection = {
     ...
   }

   def create(driver: Driver, options: Map[String, String]): Connection = {
     doCreate(driver, providers, options)
   }
   ```
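
   If that suggestion were combined with the new three-argument signature from this PR, one possible shape is sketched below; this is only an illustration built from the logic in the diff above, not the actual implementation.
   ```scala
   // Sketch only: the provider list becomes an explicit parameter so that tests
   // can pass stub providers; the selection logic mirrors the diff above.
   private[jdbc] def doCreate(
       driver: Driver,
       providers: Seq[JdbcConnectionProvider],
       options: Map[String, String],
       connectionProviderName: Option[String]): Connection = {
     val filteredProviders = providers.filter(_.canHandle(driver, options))
     if (filteredProviders.isEmpty) {
       throw new IllegalArgumentException(
         "Empty list of JDBC connection providers for the specified driver and options")
     }
     val selectedProvider = connectionProviderName match {
       case Some(providerName) =>
         // No two providers are expected to share a name.
         filteredProviders.find(_.name == providerName).getOrElse {
           throw new IllegalArgumentException(
             s"Could not find a JDBC connection provider with name '$providerName' " +
               "that can handle the specified driver and options")
         }
       case None =>
         if (filteredProviders.size != 1) {
           throw new IllegalArgumentException(
             "More than one connection provider was found; use the " +
               s"'${JDBCOptions.JDBC_CONNECTION_PROVIDER}' option to select a specific one")
         }
         filteredProviders.head
     }
     // Same security-configuration handling as in the diff above.
     SecurityConfigurationLock.synchronized {
       val parent = Configuration.getConfiguration
       try {
         selectedProvider.getConnection(driver, options)
       } finally {
         Configuration.setConfiguration(parent)
       }
     }
   }

   // The public entry point keeps the signature added in this PR.
   def create(
       driver: Driver,
       options: Map[String, String],
       connectionProviderName: Option[String]): Connection = {
     doCreate(driver, providers, options, connectionProviderName)
   }
   ```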
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
+ Must be one of the providers loaded with the JDBC data source. Used to disambiguate when more than one provider can handle the specified driver and options.
Review comment:
   How about describing the relationship to `DISABLED_JDBC_CONN_PROVIDER_LIST` here?
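
   To make that relationship concrete, the interaction could be illustrated roughly as below; the conf key comes from the JDBC README quoted further down in this review, and the URL, table, and provider values are placeholders.
   ```scala
   import org.apache.spark.sql.SparkSession

   // Sketch: a provider disabled through the static conf is not loaded, so an
   // explicit connectionProvider naming it can no longer be resolved and the
   // read below would fail when the connection is created.
   val spark = SparkSession.builder()
     .master("local[*]")
     .config("spark.sql.sources.disabledJdbcConnProviderList", "mssql")
     .getOrCreate()

   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:sqlserver://host:1433;databaseName=sample") // placeholder
     .option("dbtable", "my_table")
     .option("connectionProvider", "mssql") // disabled above, so it cannot be selected
     .load()
   ```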
##########
File path: docs/sql-data-sources-jdbc.md
##########
@@ -275,11 +275,22 @@ logging into the data sources.
</td>
<td>read/write</td>
</tr>
+
+ <tr>
+ <td><code>connectionProvider</code></td>
+ <td>(none)</td>
+ <td>
+ The short name of the JDBC connection provider to use to connect to this URL, e.g. <code>db2</code>, <code>mssql</code>.
Review comment:
   What does `short` mean here? `The name of a JDBC connection provider to use ...`?
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/README.md
##########
@@ -46,6 +46,12 @@ so they can be turned off and can be replaced with custom implementation. All CP
which must be unique. One can set the following configuration entry in `SparkConf` to turn off CPs:
`spark.sql.sources.disabledJdbcConnProviderList=name1,name2`.
+## How to enforce a specific JDBC connection provider?
+
+When more than one JDBC connection provider can handle a specific driver and options, it is possible to
+disambiguate and enforce a particular CP for the JDBC data source. One can set the DataFrameReader
Review comment:
   Is this related to the reader only? How about the writer?
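
   For what it's worth, the docs table in this PR marks the option as read/write; if that is the intent, a sketch covering both paths could look like this (placeholder URLs and tables, and an existing SparkSession named `spark` is assumed):
   ```scala
   // Sketch assuming the option applies to both paths, as the docs table's
   // read/write column suggests; all values are placeholders.
   val df = spark.read
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "src_table")
     .option("connectionProvider", "db2")
     .load()

   df.write
     .format("jdbc")
     .option("url", "jdbc:db2://localhost:50000/sample")
     .option("dbtable", "dst_table")
     .option("connectionProvider", "db2")
     .save()
   ```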
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuite.scala
##########
@@ -45,6 +54,84 @@ class ConnectionProviderSuite extends ConnectionProviderSuiteBase with SharedSpa
}
}
+ test("Throw an error selecting from an empty list of providers on create") {
Review comment:
   Could you add some tests for the case where a user sets both `DISABLED_JDBC_CONN_PROVIDER_LIST` and `connectionProvider`?
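
   A rough shape for such a test might be the following; the setup here is an assumption rather than the actual suite code, and the static `spark.sql.sources.disabledJdbcConnProviderList` conf would have to be set for the test's session (e.g. to `basic`) for the scenario to apply.
   ```scala
   // Hypothetical sketch only: with the basic provider disabled for the test
   // session, asking for it explicitly via the new option should fail to resolve.
   test("throw when the provider requested via connectionProvider is disabled") {
     val driver = new org.h2.Driver()
     val options = Map(
       "url" -> "jdbc:h2:mem:testdb",
       "dbtable" -> "test_table")
     intercept[IllegalArgumentException] {
       ConnectionProvider.create(driver, options, Some("basic"))
     }
   }
   ```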