Xiao Li created SPARK-24423:
-------------------------------

             Summary: Add a new option `query` for JDBC sources
                 Key: SPARK-24423
                 URL: https://issues.apache.org/jira/browse/SPARK-24423
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Xiao Li


Currently, our JDBC connector provides the option `dbtable` for users to 
specify the to-be-loaded JDBC source table. 
 
val jdbcDf = spark.read
  .format("jdbc")
  .option("dbtable", "dbName.tableName")
  .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
  .load()
 
Normally, users do not fetch the whole JDBC table due to the poor 
performance/throughput of JDBC. Instead, they usually fetch only a small 
subset of the table. Advanced users can do this by passing a subquery as 
the `dbtable` option. 
 
val query = """ (select * from tableName limit 10) as tmp """
val jdbcDf = spark.read
  .format("jdbc")
  .option("dbtable", query)
  .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
  .load()
 
However, this is not straightforward for end users. We should simply allow 
users to specify the query through a new option `query`, and handle the 
complexity for them. 
 
val query = """select * from tableName limit 10"""
val jdbcDf = spark.read
  .format("jdbc")
  .option("query", query)
  .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
  .load()
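
The complexity to be hidden is essentially the wrapping step: the plain 
query has to become a parenthesized, aliased subquery so that it can stand 
in wherever a table name is expected. A minimal sketch of that step, 
assuming a hypothetical helper `toTableExpression` and an arbitrary alias 
`tmp`; it is not the actual Spark implementation, and alias syntax can 
differ between databases:

```scala
// Hypothetical helper, not part of Spark: turns a plain query into the
// parenthesized, aliased form that the `dbtable` option already accepts.
object QueryWrapper {
  def toTableExpression(query: String, alias: String = "tmp"): String = {
    // Drop surrounding whitespace and a trailing semicolon, which would
    // be invalid inside a subquery.
    val trimmed = query.trim.stripSuffix(";").trim
    s"($trimmed) as $alias"
  }
}
```

With this helper, `select * from tableName limit 10` becomes 
`(select * from tableName limit 10) as tmp`, which is the string users 
currently have to build by hand.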
 
Users are not allowed to specify query and dbtable at the same time. 
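
One way this restriction could be enforced is a check on the options map 
before any connection is opened. The sketch below is only an illustration: 
the object `JdbcOptionCheck`, its method name, and the error messages are 
assumptions, and it uses a plain case-sensitive `Map` for simplicity.

```scala
// Hypothetical validation, not Spark's actual code: exactly one of
// `dbtable` and `query` must be present in the options map.
object JdbcOptionCheck {
  def tableOrQuery(options: Map[String, String]): String =
    (options.get("dbtable"), options.get("query")) match {
      case (Some(_), Some(_)) =>
        throw new IllegalArgumentException(
          "Specify either `dbtable` or `query`, not both.")
      case (Some(table), None) =>
        table
      case (None, Some(query)) =>
        // Wrap the plain query so it can be used like a table name.
        s"(${query.trim}) as tmp"
      case (None, None) =>
        throw new IllegalArgumentException(
          "One of `dbtable` or `query` is required.")
    }
}
```

Failing fast here gives users a clear error message instead of a 
database-side syntax error.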


