GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/10519
[SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
Spark SQL's JDBC data source allows users to specify an explicit JDBC
driver to load (using the `driver` argument), but in the current code it's
possible that the user-specified driver will not be used when it comes time to
actually create a JDBC connection.
In a nutshell, the problem is that you might have multiple JDBC drivers on
the classpath that claim to be able to handle the same subprotocol, so simply
registering the user-provided driver class with our `DriverRegistry` and
JDBC's `DriverManager` is not sufficient to ensure that it's actually used when
creating the JDBC connection.
This patch addresses this issue by first registering the user-specified
driver with the DriverManager, then iterating over the driver manager's loaded
drivers in order to obtain the correct driver and use it to create a connection
(previously, we just called `DriverManager.getConnection()` directly).
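The lookup described above can be sketched as follows. This is an illustrative, minimal version, not Spark's actual code; the `DummyDriver` class and the `findDriver` helper are hypothetical names introduced only so the example is self-contained and runnable.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverLookup {

    /** Hypothetical stand-in driver, present only so the example can run
     *  without a real database on the classpath. */
    public static class DummyDriver implements Driver {
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:dummy:"); }
        public Connection connect(String url, Properties info) { return null; }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // Instead of calling DriverManager.getConnection() directly (which hands
    // the URL to the first registered driver that accepts it), scan the
    // registered drivers for the one whose class name matches the
    // user-specified "driver" option, then connect through that driver.
    public static Driver findDriver(String userSpecifiedClass) throws SQLException {
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            if (d.getClass().getName().equals(userSpecifiedClass)) {
                return d;
            }
        }
        throw new SQLException(
            "Did not find registered driver with class " + userSpecifiedClass);
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new DummyDriver());
        Driver d = findDriver(DummyDriver.class.getName());
        System.out.println("Resolved driver: " + d.getClass().getName());
    }
}
```

Once the matching driver is in hand, the connection is created via `driver.connect(url, properties)` rather than `DriverManager.getConnection()`, so no other driver that happens to accept the same subprotocol can be chosen.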
If a user did not specify a JDBC driver to use, then we call
`DriverManager.getDriver` to figure out the class of the driver to use, then
pass that class's name to executors; this guards against corner-case bugs in
situations where the driver and executor JVMs might have different sets of JDBC
drivers on their classpaths (previously, there was a rare potential for
`DriverManager.getConnection()` to pick different drivers on the driver and
executors if the user had not explicitly specified a JDBC driver class and the
classpaths differed).
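That fallback can be sketched as below. Again this is an illustrative version, not Spark's actual code; `resolveDriverClass` and `DummyDriver` are hypothetical names used only to make the example runnable.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverClassResolver {

    /** Hypothetical stand-in driver, present only so the example can run
     *  without a real database on the classpath. */
    public static class DummyDriver implements Driver {
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:dummy:"); }
        public Connection connect(String url, Properties info) { return null; }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    // If the user named a driver class, use it as-is. Otherwise, ask
    // DriverManager which registered driver accepts this URL *here* (i.e. on
    // the Spark driver JVM) and ship that concrete class name to executors,
    // so both sides agree even when their classpaths differ.
    public static String resolveDriverClass(String url, String userSpecified)
            throws SQLException {
        if (userSpecified != null) {
            return userSpecified;
        }
        return DriverManager.getDriver(url).getClass().getName();
    }
}
```

The key point is that the URL-to-driver resolution happens exactly once, on the driver JVM, and executors receive a concrete class name rather than repeating the lookup against their own (possibly different) set of registered drivers.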
This patch is inspired by a similar patch that I made to the
`spark-redshift` library
(https://github.com/databricks/spark-redshift/pull/143), which contains its own
modified fork of some of Spark's JDBC data source code (for cross-Spark-version
compatibility reasons).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark jdbc-driver-precedence
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10519.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10519
----
commit 3554d68fd38df399fa863c5c14110cc17a826038
Author: Josh Rosen <[email protected]>
Date: 2015-12-30T02:28:39Z
Force user-specified JDBC driver to take precedence.
----