GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/10519
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC
driver to load (using the `driver` argument), but in the current code it's
possible that the user-specified driver will not be used when it comes time to
actually create a JDBC connection.
In a nutshell, the problem is that you might have multiple JDBC drivers on
the classpath that claim to be able to handle the same subprotocol, so simply
registering the user-provided driver class with our `DriverRegistry` and
JDBC's `DriverManager` is not sufficient to ensure that it's actually used when
creating the JDBC connection.
This patch addresses this issue by first registering the user-specified
driver with the DriverManager, then iterating over the driver manager's loaded
drivers in order to obtain the correct driver and use it to create a connection
(previously, we just called `DriverManager.getConnection()` directly).
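The lookup described above can be sketched as follows. This is an illustrative, minimal version, not Spark's actual code; the `DummyDriver` class and the `findDriver` helper are hypothetical names introduced only so the example is self-contained and runnable.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverLookup {

    /** Hypothetical stand-in driver, present only so the example can run
     *  without a real database on the classpath. */
    public static class DummyDriver implements Driver {
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:dummy:"); }
        public Connection connect(String url, Properties info) { return null; }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // Instead of calling DriverManager.getConnection() directly (which hands
    // the URL to the first registered driver that accepts it), scan the
    // registered drivers for the one whose class name matches the
    // user-specified "driver" option, then connect through that driver.
    public static Driver findDriver(String userSpecifiedClass) throws SQLException {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            if (d.getClass().getName().equals(userSpecifiedClass)) {
                return d;
            }
        }
        throw new SQLException(
            "Did not find registered driver with class " + userSpecifiedClass);
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new DummyDriver());
        Driver d = findDriver(DummyDriver.class.getName());
        System.out.println("Resolved driver: " + d.getClass().getName());
    }
}
```

Once the matching driver is in hand, the connection is created via `driver.connect(url, properties)` rather than `DriverManager.getConnection()`, so no other driver that happens to accept the same subprotocol can be chosen.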
If a user did not specify a JDBC driver to use, then we call
`DriverManager.getDriver` to figure out the class of the driver to use, then
pass that class's name to executors; this guards against corner-case bugs in
situations where the driver and executor JVMs might have different sets of JDBC
drivers on their classpaths (previously, there was a rare potential for
`DriverManager.getConnection()` to pick different drivers on the driver and
executors if the user had not explicitly specified a JDBC driver class and the
classpaths differed).
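That fallback can be sketched as below. Again this is an illustrative version, not Spark's actual code; `resolveDriverClass` and `DummyDriver` are hypothetical names used only to make the example runnable.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverClassResolver {

    /** Hypothetical stand-in driver, present only so the example can run
     *  without a real database on the classpath. */
    public static class DummyDriver implements Driver {
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:dummy:"); }
        public Connection connect(String url, Properties info) { return null; }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // If the user named a driver class, use it as-is. Otherwise, ask
    // DriverManager which registered driver accepts this URL *here* (i.e. on
    // the Spark driver JVM) and ship that concrete class name to executors,
    // so both sides agree even when their classpaths differ.
    public static String resolveDriverClass(String url, String userSpecified)
            throws SQLException {
        if (userSpecified != null) {
            return userSpecified;
        }
        return DriverManager.getDriver(url).getClass().getName();
    }
}
```

The key point is that the URL-to-driver resolution happens exactly once, on the driver JVM, and executors receive a concrete class name rather than repeating the lookup against their own (possibly different) set of registered drivers.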
This patch is inspired by a similar patch that I made to the
`spark-redshift` library
(https://github.com/databricks/spark-redshift/pull/143), which contains its own
modified fork of some of Spark's JDBC data source code (for cross-Spark-version
compatibility reasons).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark jdbc-driver-precedence
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10519.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10519
----
commit 3554d68fd38df399fa863c5c14110cc17a826038
Author: Josh Rosen <[email protected]>
Date: 2015-12-30T02:28:39Z
Force user-specified JDBC driver to take precedence.
----