Re: RDBMS Storage Plugin Configurations

Charles Givre Wed, 15 Jan 2020 15:28:58 -0800

Hi Jiang, 
Welcome to Drill!
Just as an FYI, there are several improvements underway for the JDBC plugin:
https://issues.apache.org/jira/browse/DRILL-7467
https://issues.apache.org/jira/projects/DRILL/issues/DRILL-7490


With respect to the non-relational model, I'd echo Ted's question and ask what 
you are looking for specifically. There is work underway to get Drill to 
natively support additional non-relational source systems, as well as the 
ability to natively query REST endpoints. 

Best,
-- C


> On Jan 15, 2020, at 5:51 PM, Paul Rogers <[email protected]> wrote:
> 
> Hi Jiang,
> 
> Welcome to the Drill mailing list.
> 
> I think you may be making some assumptions about how Drill works, perhaps 
> based on how other DB-driven applications work.
> 
> Drill is not primarily a front-end for an RDBMS. Instead, it is primarily 
> designed to scan distributed data as fast as possible to extract records of 
> interest. Drill does support JDBC data sources, but this is not the main use 
> case.
> 
> In Drill, each query is stand-alone: Drill opens connections as needed to 
> whatever data source you use, reads data, and releases all resources. Since 
> Drill is distributed, this happens on each node. Since Drill is 
> multi-threaded, this work also happens for each "minor fragment" (thread of 
> execution) on each node. Drill is also multi-user; each user might have their 
> own DB security restrictions.
> 
> This makes sense: if we want to read at maximum speed across 10 minor 
> fragments (say) then all 10 need their own DB connections and all will try to 
> keep those connections 100% busy.
> 
> As a result, Drill has no DB connection pool: not within a query and not 
> across queries. So, there is no idle timeout. The maximum number of 
> connections is set by the maximum "slice width" (number of fragments per 
> node) and the total number of nodes. Slice width is, by default, 70% of your 
> CPU count. So, if you have 10 nodes with 8 cores each, you will have roughly 
> 60 open DB connections for the duration of the query. This assumes that the 
> DB storage plugin knows how to shard queries across all those minor 
> fragments; I'm not sure that the JDBC storage plugin knows how to do this. 
> Can anyone clarify this point?
> 
> It sounds like you have a particular use-case in mind that might benefit from 
> connection caching. Can you share that use case to help us understand? And, 
> of course, Drill is open source; if you find you need this ability, it can 
> certainly be added.
> 
> Drillers: please offer corrections if I've overlooked something; I'm not 
> super familiar with the details of the JDBC data source.
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Wednesday, January 15, 2020, 01:49:21 PM PST, Jiang Wu 
> <[email protected]> wrote:  
> 
> Question on the RDBMS Storage Plugin: is it possible to set various options
> for the database connection pool used for this storage plugin?  For
> example, max number of connections, idle timeout, etc?
> 
> Thanks.
> 
> -- Jiang
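
As a rough illustration of Paul's estimate in the quoted message above, here is
a minimal Python sketch of the arithmetic. The function name and its parameters
are hypothetical and not part of Drill. The 70% default comes from the thread;
rounding to the nearest integer is my assumption, since the thread does not say
how Drill rounds. It also assumes the storage plugin really does spread work
across every minor fragment, which is the open question for the JDBC plugin.

```python
def estimate_max_connections(nodes: int, cores_per_node: int,
                             slice_width_pct: int = 70) -> int:
    """Estimate the upper bound on open DB connections for one query.

    Hypothetical helper, not a Drill API. Assumes one connection per
    minor fragment and a slice width of slice_width_pct percent of the
    cores on each node.
    """
    if nodes < 1 or cores_per_node < 1:
        raise ValueError("nodes and cores_per_node must be positive")
    # Integer percent keeps the arithmetic exact before the final division.
    total = nodes * cores_per_node * slice_width_pct / 100
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(total)


if __name__ == "__main__":
    # The example from the thread: 10 nodes with 8 cores each.
    print(estimate_max_connections(10, 8))
```

For the cluster in the thread this gives 56, in line with the "roughly 60"
figure. If the plugin runs in a single fragment, the real number would be much
lower.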
