Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21173#discussion_r185384951

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
@@ -515,4 +515,15 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
     }.getMessage
     assert(e.contains("NULL not allowed for column \"NAME\""))
   }
+
+  test("SPARK-23856 Spark jdbc setQueryTimeout option") {
+    val errMsg = intercept[SparkException] {
+      spark.range(10000000L).selectExpr("id AS k", "id AS v").coalesce(1).write
+        .mode(SaveMode.Overwrite)
+        .option("queryTimeout", 1)
+        .option("batchsize", Int.MaxValue)
+        .jdbc(url1, "TEST.TIMEOUTTEST", properties)
+    }.getMessage
+    assert(errMsg.contains("Statement was canceled or the session timed out"))
+  }
--- End diff --

@gatorsmile I added this test for the write path, but it failed, so I checked the H2 JDBC driver implementation. In H2, `setQueryTimeout` simply executes [SET QUERY_TIMEOUT](http://www.h2database.com/html/grammar.html#set_query_timeout):
https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/jdbc/JdbcConnection.java#L763

`executeBatch` in the H2 JDBC driver just runs one `INSERT` query per entry in the batch:
https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/jdbc/JdbcStatement.java#L778

Since H2 checks the timeout for each query individually, no single `INSERT` exceeds it and the test throws no exception. However, this behaviour depends on the JDBC driver implementation; for example, the PostgreSQL JDBC driver checks the timeout against the entire batch:
https://github.com/pgjdbc/pgjdbc/blob/dde8c0200c409a525ef3bfc7a0aa81e7cd458a59/pgjdbc/src/main/java/org/postgresql/jdbc/PgStatement.java#L921

If we want consistent behaviour across drivers for this case, we would need to handle it in `JDBCRDD`, but that feels a little over-engineered to me. WDYT?
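To make the per-statement semantics concrete, here is a minimal standalone sketch outside of Spark (the in-memory H2 URL, table name, and row count are made up for illustration) showing why a batch of fast `INSERT`s never trips the timeout on H2:

```scala
import java.sql.DriverManager

// Minimal sketch against an in-memory H2 database (requires the H2 driver
// on the classpath). The URL, table, and row count are illustrative only.
object H2QueryTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:h2:mem:timeoutTest")
    try {
      conn.createStatement().execute("CREATE TABLE t(k INT, v INT)")
      val stmt = conn.prepareStatement("INSERT INTO t VALUES (?, ?)")
      // On H2 this maps to SET QUERY_TIMEOUT on the session.
      stmt.setQueryTimeout(1)
      (0 until 100000).foreach { i =>
        stmt.setInt(1, i)
        stmt.setInt(2, i)
        stmt.addBatch()
      }
      // H2 times each INSERT individually, so even if the whole batch takes
      // longer than one second, no timeout exception is thrown here.
      stmt.executeBatch()
    } finally {
      conn.close()
    }
  }
}
```

Against PostgreSQL, per the pgjdbc line linked above, the same `executeBatch` call would instead be subject to a single one-second deadline for the whole batch.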
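And if we did decide to normalize this on the Spark side, a driver-independent version could look roughly like the hypothetical helper below (`executeBatchWithTimeout` is a name I made up, not an existing Spark API): it enforces a whole-batch deadline by cancelling the statement from a watchdog timer, similar in spirit to how pgjdbc enforces its timeout internally.

```scala
import java.sql.PreparedStatement
import java.util.{Timer, TimerTask}

object BatchTimeoutSketch {
  // Hypothetical helper: enforce the timeout over the entire batch,
  // regardless of how the driver interprets setQueryTimeout.
  def executeBatchWithTimeout(stmt: PreparedStatement, timeoutMs: Long): Array[Int] = {
    val watchdog = new Timer("batch-timeout", /* isDaemon = */ true)
    watchdog.schedule(new TimerTask {
      // Statement.cancel() makes the in-flight executeBatch() fail with a
      // driver-specific SQLException, mirroring a per-query timeout.
      override def run(): Unit = stmt.cancel()
    }, timeoutMs)
    try stmt.executeBatch() finally watchdog.cancel()
  }
}
```

This is the kind of extra machinery I would rather avoid unless we commit to uniform `queryTimeout` semantics across drivers.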