[GitHub] [spark] hvanhovell commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

via GitHub Mon, 06 Mar 2023 04:44:27 -0800


hvanhovell commented on code in PR #40277:
URL: https://github.com/apache/spark/pull/40277#discussion_r1126365135



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##########
@@ -250,6 +250,46 @@ class DataFrameReader private[sql] (sparkSession: 
SparkSession) extends Logging
     jdbc(url, table, connectionProperties)
   }
 
+  /**
+   * Construct a `DataFrame` representing the database table accessible via 
JDBC URL url named
+   * table using connection properties. The `predicates` parameter gives a 
list expressions
+   * suitable for inclusion in WHERE clauses; each one defines one partition 
of the `DataFrame`.
+   *
+   * Don't create too many partitions in parallel on a large cluster; 
otherwise Spark might crash
+   * your external database systems.
+   *
+   * You can find the JDBC-specific option and parameter documentation for 
reading tables via JDBC
+   * in <a
+   * 
href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option";>
+   * Data Source Option</a> in the version you use.
+   *
+   * @param table
+   *   Name of the table in the external database.
+   * @param predicates
+   *   Condition in the where clause for each partition.
+   * @param connectionProperties
+   *   JDBC database connection arguments, a list of arbitrary string 
tag/value. Normally at least
+   *   a "user" and "password" property should be included. "fetchsize" can be 
used to control the
+   *   number of rows per fetch.
+   * @since 3.4.0
+   */
+  def jdbc(
+      url: String,
+      table: String,
+      predicates: Array[String],
+      connectionProperties: Properties): DataFrame = {
+    sparkSession.newDataFrame { builder =>

Review Comment:
   Can you please set the format to JDBC? We are now relying the presence of 
predicates to figure out that something is a JDBC table. That is relying far 
too heavily on the client doing the right thing, for example what would happen 
if you set format = parquet and still define predicates?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] hvanhovell commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

Reply via email to