Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21173#discussion_r185384951

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
@@ -515,4 +515,15 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
     }.getMessage
     assert(e.contains("NULL not allowed for column \"NAME\""))
   }
+
+  test("SPARK-23856 Spark jdbc setQueryTimeout option") {
+    val errMsg = intercept[SparkException] {
+      spark.range(10000000L).selectExpr("id AS k", "id AS v").coalesce(1).write
+        .mode(SaveMode.Overwrite)
+        .option("queryTimeout", 1)
+        .option("batchsize", Int.MaxValue)
+        .jdbc(url1, "TEST.TIMEOUTTEST", properties)
+    }.getMessage
+    assert(errMsg.contains("Statement was canceled or the session timed out"))
+  }
--- End diff --

@gatorsmile I added this test for the write path, but it failed, so I checked the H2 JDBC driver implementation. In H2, `setQueryTimeout` simply executes [SET QUERY_TIMEOUT](http://www.h2database.com/html/grammar.html#set_query_timeout):
https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/jdbc/JdbcConnection.java#L763

`executeBatch` in the H2 JDBC driver just runs one `INSERT` query per entry in the batch:
https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/jdbc/JdbcStatement.java#L778

Since H2 checks the timeout for each query individually, no single `INSERT` exceeds it and the test throws no exception. However, this behaviour depends on the JDBC driver implementation; for example, the PostgreSQL JDBC driver checks the timeout against the entire batch:
https://github.com/pgjdbc/pgjdbc/blob/dde8c0200c409a525ef3bfc7a0aa81e7cd458a59/pgjdbc/src/main/java/org/postgresql/jdbc/PgStatement.java#L921

If we want consistent behaviour across drivers for this case, we would need to handle it in `JDBCRDD`, but that feels a little over-engineered to me. WDYT?
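To make the per-statement semantics concrete, here is a minimal standalone sketch outside of Spark (the in-memory H2 URL, table name, and row count are made up for illustration) showing why a batch of fast `INSERT`s never trips the timeout on H2:

```scala
import java.sql.DriverManager

// Minimal sketch against an in-memory H2 database (requires the H2 driver
// on the classpath). The URL, table, and row count are illustrative only.
object H2QueryTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:h2:mem:timeoutTest")
    try {
      conn.createStatement().execute("CREATE TABLE t(k INT, v INT)")
      val stmt = conn.prepareStatement("INSERT INTO t VALUES (?, ?)")
      // On H2 this maps to SET QUERY_TIMEOUT on the session.
      stmt.setQueryTimeout(1)
      (0 until 100000).foreach { i =>
        stmt.setInt(1, i)
        stmt.setInt(2, i)
        stmt.addBatch()
      }
      // H2 times each INSERT individually, so even if the whole batch takes
      // longer than one second, no timeout exception is thrown here.
      stmt.executeBatch()
    } finally {
      conn.close()
    }
  }
}
```

Against PostgreSQL, per the pgjdbc line linked above, the same `executeBatch` call would instead be subject to a single one-second deadline for the whole batch.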
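And if we did decide to normalize this on the Spark side, a driver-independent version could look roughly like the hypothetical helper below (`executeBatchWithTimeout` is a name I made up, not an existing Spark API): it enforces a whole-batch deadline by cancelling the statement from a watchdog timer, similar in spirit to how pgjdbc enforces its timeout internally.

```scala
import java.sql.PreparedStatement
import java.util.{Timer, TimerTask}

object BatchTimeoutSketch {
  // Hypothetical helper: enforce the timeout over the entire batch,
  // regardless of how the driver interprets setQueryTimeout.
  def executeBatchWithTimeout(stmt: PreparedStatement, timeoutMs: Long): Array[Int] = {
    val watchdog = new Timer("batch-timeout", /* isDaemon = */ true)
    watchdog.schedule(new TimerTask {
      // Statement.cancel() makes the in-flight executeBatch() fail with a
      // driver-specific SQLException, mirroring a per-query timeout.
      override def run(): Unit = stmt.cancel()
    }, timeoutMs)
    try stmt.executeBatch() finally watchdog.cancel()
  }
}
```

This is the kind of extra machinery I would rather avoid unless we commit to uniform `queryTimeout` semantics across drivers.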