Re: RDBMS Storage Plugin Configurations

Paul Rogers Wed, 15 Jan 2020 14:51:52 -0800

Hi Jiang,

Welcome to the Drill mailing list.


I think you may be making some assumptions about how Drill works, perhaps based 
on how other DB-driven applications work.

Drill is not primarily a front-end for an RDBMS. Instead, it is primarily 
designed to scan distributed data as fast as possible to extract records of 
interest. Drill does support JDBC data sources, but this is not the main use 
case.

In Drill, each query is stand-alone: Drill opens connections as needed to 
whatever data source you use, reads data, and releases all resources. Since 
Drill is distributed, this happens on each node. Since Drill is multi-threaded, 
this work also happens for each "minor fragment" (thread of execution) on each 
node. Drill is also multi-user; each user might have their own DB security 
restrictions.

This makes sense: if we want to read at maximum speed across 10 minor fragments 
(say) then all 10 need their own DB connections and all will try to keep those 
connections 100% busy.
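
To make the per-fragment lifecycle concrete, here is a minimal sketch. It is 
not Drill code: SQLite stands in for the RDBMS, Python threads stand in for 
minor fragments, and the names scan_slice and run_query, as well as the split 
by id range, are hypothetical. Each worker opens its own connection, reads its 
slice, and closes the connection, with no pool involved.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def scan_slice(db_path, lo, hi):
    """One hypothetical 'minor fragment': open, read its slice, release."""
    conn = sqlite3.connect(db_path)  # opened on demand, never pooled
    try:
        rows = conn.execute(
            "SELECT id FROM t WHERE id >= ? AND id < ?", (lo, hi)
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        conn.close()  # released as soon as this fragment is done


def run_query(db_path, total_rows, fragments):
    """Split the id range across fragments; scan each in its own thread."""
    step = -(-total_rows // fragments)  # ceiling division
    ranges = [(lo, min(lo + step, total_rows))
              for lo in range(0, total_rows, step)]
    with ThreadPoolExecutor(max_workers=fragments) as pool:
        parts = pool.map(lambda r: scan_slice(db_path, *r), ranges)
    return [row for part in parts for row in part]


def demo():
    """Build a throwaway 100-row table and scan it with 10 workers."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
        conn.executemany("INSERT INTO t VALUES (?)",
                         [(i,) for i in range(100)])
        conn.commit()
        conn.close()
        return run_query(path, 100, 10)
    finally:
        os.remove(path)
```

With 10 workers there are up to 10 connections open at once, and none of them 
outlives the query, which is why pool settings such as an idle timeout have 
nothing to act on.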

As a result, Drill has no DB connection pool: not within a query and not across 
queries. So, there is no idle timeout. The maximum number of connections is set 
by the maximum "slice width" (number of fragments per node) and the total 
number of nodes. Slice width is, by default, 70% of your CPU count. So, if you 
have 10 nodes with 8 cores each, you will have roughly 60 open DB connections 
for the duration of the query. This assumes that the DB storage plugin knows 
how to shard queries across all those minor fragments; I'm not sure that the 
JDBC storage plugin knows how to do this. Can anyone clarify this point?
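
The arithmetic above can be sketched as follows. The function name and the 
choice to round the slice width up are assumptions made for illustration; 
Drill's exact rounding may differ, and the result is only an upper bound if 
the plugin really does use every minor fragment.

```python
import math


def estimate_max_connections(nodes, cores_per_node, slice_fraction=0.70):
    """Upper bound on concurrent DB connections for a single query."""
    # Fragments per node. Rounding up is an assumption of this sketch.
    slice_width = math.ceil(cores_per_node * slice_fraction)
    # One connection per minor fragment, on every node.
    return nodes * slice_width


print(estimate_max_connections(10, 8))  # 10 nodes x 6 fragments = 60
```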

It sounds like you have a particular use-case in mind that might benefit from 
connection caching. Can you share that use case to help us understand? And, of 
course, Drill is open source; if you find you need this ability, it can 
certainly be added.

Drillers: please offer corrections if I've overlooked something; I'm not super 
familiar with the details of the JDBC data source.

Thanks,
- Paul

 

    On Wednesday, January 15, 2020, 01:49:21 PM PST, Jiang Wu 
<jiang...@mulesoft.com.invalid> wrote:  
 
 Question on the RDBMS Storage Plugin: is it possible to set various options
for the database connection pool used for this storage plugin?  For
example, max number of connections, idle timeout, etc?

Thanks.

-- Jiang
