Thomas Bünger created DRILL-5216:
------------------------------------

             Summary: Set FetchSize to Speed up Metadata retrieval for JDBC 
storage plugin over high latency connections
                 Key: DRILL-5216
                 URL: https://issues.apache.org/jira/browse/DRILL-5216
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - JDBC
    Affects Versions: 1.9.0
         Environment: drill-embedded on ubuntu client - connected to a remote 
Oracle
            Reporter: Thomas Bünger
            Priority: Minor


The metadata retrieval uses the default fetch size of the underlying JDBC 
driver, which in the case of Oracle is only 10.
In larger scenarios, as in mine, the Oracle cluster hosts thousands of 
schemas and the small fetch size results in hundreds of individual round trips.
In the end, every Drill query against this storage takes at least a minute 
(the server is remote).
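
As a rough illustration of why the fetch size matters here, the sketch below estimates the number of fetch round trips for a metadata result set. The row count (5000 schemas), the latency (100 ms per round trip) and the larger fetch size (1000) are hypothetical numbers chosen for the example, not measurements from my setup, and the model assumes exactly one network round trip per fetch.

```java
// Back-of-the-envelope estimate only; all concrete numbers are assumptions.
final class FetchRoundTrips {

    /** Round trips needed to read rowCount rows when each fetch returns fetchSize rows. */
    static long roundTrips(long rowCount, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        // Ceiling division: a partially filled last batch still costs one trip.
        return (rowCount + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        long schemas = 5_000;      // hypothetical number of schemas
        long latencyMillis = 100;  // hypothetical latency per round trip

        for (int fetchSize : new int[] {10, 1000}) {
            long trips = roundTrips(schemas, fetchSize);
            System.out.println("fetchSize=" + fetchSize
                    + " -> " + trips + " round trips, ~"
                    + (trips * latencyMillis) + " ms waiting on the network");
        }
    }
}
```

Under these assumptions the default of 10 needs 500 round trips, while a fetch size of 1000 needs 5.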

So far, Drill is using the JDBC metadata API 
{{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and 
could set an appropriate fetch size before iterating the result set.
I've tested this locally and it improved latency a lot, but I am not sure how 
this affects other non-Oracle JDBC drivers.
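
A minimal sketch of the idea, using only the standard java.sql API. This is not the actual code in JdbcStoragePlugin.java; the class name, the method name and the way the fetch size is passed in are hypothetical. Since setFetchSize() is only a hint and I don't know how every driver reacts to it, the sketch ignores a failure of that call and falls back to the driver default.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill.
final class SchemaFetchSketch {

    /** Reads all schema names, asking the driver to fetch fetchSize rows per round trip. */
    static List<String> readSchemaNames(Connection conn, int fetchSize) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        List<String> names = new ArrayList<>();
        try (ResultSet rs = meta.getSchemas()) {
            try {
                // Only a hint to the driver; must be set before iterating.
                rs.setFetchSize(fetchSize);
            } catch (SQLException e) {
                // Assumption: a driver that rejects the hint should not break
                // metadata retrieval, so keep the driver's default fetch size.
            }
            while (rs.next()) {
                // TABLE_SCHEM is the schema name column defined by the JDBC spec.
                names.add(rs.getString("TABLE_SCHEM"));
            }
        }
        return names;
    }
}
```

It would be called as, for example, {{SchemaFetchSketch.readSchemaNames(connection, 1000)}}, where 1000 is an arbitrary value; a suitable size probably depends on the driver and could be made configurable per storage plugin.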

The other (potentially long) query is the table enumeration.
From what I've seen, Drill does not call the JDBC driver directly here but goes 
through Apache Calcite, calling {{getTableNames()}}, which under the hood calls 
{{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow 
metadata retrieval due to the small default fetch size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
