Hi Jiang,

Welcome to Drill! Just as an FYI, there are several improvements underway for the JDBC plugin:
https://issues.apache.org/jira/browse/DRILL-7467
https://issues.apache.org/jira/projects/DRILL/issues/DRILL-7490
With respect to the non-relational model, I'd echo Ted's question and ask what you are looking for specifically. There is work underway to get Drill to natively support additional non-relational source systems, as well as the ability to natively query REST endpoints.

Best,
-- C

> On Jan 15, 2020, at 5:51 PM, Paul Rogers <[email protected]> wrote:
>
> Hi Jiang,
>
> Welcome to the Drill mailing list.
>
> I think you may be making some assumptions about how Drill works, perhaps based on how other DB-driven applications work.
>
> Drill is not primarily a front-end for an RDBMS. Instead, it is primarily designed to scan distributed data as fast as possible to extract records of interest. Drill does support JDBC data sources, but this is not the main use case.
>
> In Drill, each query is stand-alone: Drill opens connections as needed to whatever data source you use, reads data, and releases all resources. Since Drill is distributed, this happens on each node. Since Drill is multi-threaded, this work also happens for each "minor fragment" (thread of execution) on each node. Drill is also multi-user; each user might have their own DB security restrictions.
>
> This makes sense: if we want to read at maximum speed across 10 minor fragments (say), then all 10 need their own DB connections, and all will try to keep those connections 100% busy.
>
> As a result, Drill has no DB connection pool: not within a query and not across queries. So, there is no idle timeout. The maximum number of connections is set by the maximum "slice width" (number of fragments per node) and the total number of nodes. Slice width is, by default, 70% of your CPU count. So, if you have 10 nodes with 8 cores each, you will have roughly 60 open DB connections for the duration of the query (assuming that the DB storage plugin knows how to shard queries across all those minor fragments. I'm not sure that the JDBC storage plugin knows how to do this. Can anyone clarify this point?)
>
> It sounds like you have a particular use case in mind that might benefit from connection caching. Can you share that use case to help us understand? And, of course, Drill is open source; if you find you need this ability, it can certainly be added.
>
> Drillers: please offer corrections if I've overlooked something; I'm not super familiar with the details of the JDBC data source.
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, January 15, 2020, 01:49:21 PM PST, Jiang Wu <[email protected]> wrote:
>
> Question on the RDBMS Storage Plugin: is it possible to set various options for the database connection pool used for this storage plugin? For example, max number of connections, idle timeout, etc.?
>
> Thanks.
>
> -- Jiang
