[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

dilipbiswal Tue, 19 Jun 2018 00:10:52 -0700

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/21590


    [SPARK-24423][SQL] Add a new option for JDBC sources

    ## What changes were proposed in this pull request?
    Here is the description in the JIRA -
    
    Currently, our JDBC connector provides the option `dbtable` for users to 
specify the to-be-loaded JDBC source table. 
    
     ```scala
     // jdbcCredentials: a Map[String, String] of connection options
     val jdbcDf = spark.read
       .format("jdbc")
       .option("dbtable", "dbName.tableName")
       .options(jdbcCredentials)
       .load()
     ```
    
    Normally, users do not fetch the whole JDBC table because of the poor
    performance/throughput of JDBC; they typically fetch only a small part of
    it. Advanced users can do this by passing a subquery as the `dbtable` option.
    
     ```scala
     // The subquery must be parenthesized and aliased to stand in for a table name
     val query = """ (select * from tableName limit 10) as tmp """
     val jdbcDf = spark.read
       .format("jdbc")
       .option("dbtable", query)
       .options(jdbcCredentials)
       .load()
     ```
    However, this is not straightforward for end users. We should simply allow
    users to specify the query through a new option, `query`, and handle the
    complexity for them.
    
     ```scala
     // With the new option, a plain SELECT statement is enough
     val query = """select * from tableName limit 10"""
     val jdbcDf = spark.read
       .format("jdbc")
       .option("query", query)
       .options(jdbcCredentials)
       .load()
     ```
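    
    To illustrate what "handling the complexity" could involve, here is a
    minimal sketch of turning a plain query into the parenthesized, aliased
    form that `dbtable` expects. The object `JdbcQueryWrapping`, the method
    `asTableExpression`, and the default alias `tmp` are hypothetical names
    chosen for this example; this is not necessarily how the patch implements it.
    
     ```scala
     // Hypothetical helper for illustration only; not Spark's actual implementation.
     object JdbcQueryWrapping {
       // Wrap a plain SELECT statement so it can be used where a table name is expected.
       def asTableExpression(query: String, alias: String = "tmp"): String = {
         // Drop surrounding whitespace and one trailing semicolon, if present.
         val trimmed = query.trim.stripSuffix(";").trim
         require(trimmed.nonEmpty, "query must not be empty")
         // Parenthesize the query and append the alias without the AS keyword.
         s"($trimmed) $alias"
       }
     }
     ```
    
    For example, `JdbcQueryWrapping.asTableExpression("select * from tableName limit 10")`
    returns `(select * from tableName limit 10) tmp`. The sketch omits the `AS`
    keyword because some databases, such as Oracle, do not accept it before a
    table alias.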
    
    ## How was this patch tested?
    Added tests in JDBCSuite and JDBCWriterSuite.
    Also tested against MySQL, PostgreSQL, Oracle, and DB2 (using the Docker
    infrastructure) to make sure there are no syntax issues.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-24423

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21590
    

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
