hvanhovell commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1126365135
########## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ##########
@@ -250,6 +250,46 @@ class DataFrameReader private[sql] (sparkSession: SparkSession) extends Logging
     jdbc(url, table, connectionProperties)
   }
 
+  /**
+   * Construct a `DataFrame` representing the database table accessible via JDBC URL `url` named
+   * `table` using connection properties. The `predicates` parameter gives a list of expressions
+   * suitable for inclusion in WHERE clauses; each one defines one partition of the `DataFrame`.
+   *
+   * Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
+   * your external database systems.
+   *
+   * You can find the JDBC-specific option and parameter documentation for reading tables via JDBC
+   * in <a
+   * href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option">
+   * Data Source Option</a> in the version you use.
+   *
+   * @param table
+   *   Name of the table in the external database.
+   * @param predicates
+   *   Condition in the WHERE clause for each partition.
+   * @param connectionProperties
+   *   JDBC database connection arguments, a list of arbitrary string tag/value pairs. Normally
+   *   at least a "user" and "password" property should be included. "fetchsize" can be used to
+   *   control the number of rows per fetch.
+   * @since 3.4.0
+   */
+  def jdbc(
+      url: String,
+      table: String,
+      predicates: Array[String],
+      connectionProperties: Properties): DataFrame = {
+    sparkSession.newDataFrame { builder =>

Review Comment:
   Can you please set the format to JDBC? We are now relying on the presence of predicates to figure out that something is a JDBC table. That leans far too heavily on the client doing the right thing; for example, what would happen if you set format = parquet and still defined predicates?
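   For illustration, a minimal sketch of what explicitly pinning the format could look like. This assumes the reader keeps an `extraOptions` map and that the generated `Read.DataSource` proto builder exposes `getReadBuilder`, `getDataSourceBuilder`, `setFormat`, `putOptions`, and `addPredicates`; those names are assumptions about the surrounding client code, not taken from the diff above.

   ```scala
   import java.util.Properties
   import scala.collection.JavaConverters._

   // Hypothetical sketch of the jdbc overload inside DataFrameReader.
   def jdbc(
       url: String,
       table: String,
       predicates: Array[String],
       connectionProperties: Properties): DataFrame = {
     // Merge the reader's existing options with the JDBC-specific ones;
     // the explicit url and dbtable always win.
     val options =
       extraOptions ++ connectionProperties.asScala ++ Map("url" -> url, "dbtable" -> table)
     sparkSession.newDataFrame { builder =>
       val dataSourceBuilder = builder.getReadBuilder.getDataSourceBuilder
       // Set the format explicitly so the server never has to infer "jdbc"
       // from the mere presence of predicates.
       dataSourceBuilder.setFormat("jdbc")
       options.foreach { case (k, v) => dataSourceBuilder.putOptions(k, v) }
       predicates.foreach(predicate => dataSourceBuilder.addPredicates(predicate))
     }
   }
   ```

   With the format carried in the plan, a client that sends `format = parquet` together with predicates can be rejected server-side instead of being silently treated as a JDBC read.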