Re: Looking for advice on integrating with a custom data source

Paul Rogers Sat, 11 Jan 2020 15:21:28 -0800

Hi Andy,

There are likely multiple approaches; here are two. Some piece of code has to
decide what can be pushed to your data source and what must remain in Drill. At
present, there is no declarative way to say, "OK to push such-and-such
expression, but keep this-and-that in Drill."
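To make that decision concrete, here is a minimal sketch in plain Java. It does not use Drill or Calcite classes. The Condition, Split and PushDownDecider names are hypothetical, and it assumes your source's capabilities reduce to a set of supported comparison operators. It splits the AND-ed terms of a WHERE clause into those the source can evaluate and those Drill must keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified condition of the form "column op literal".
record Condition(String column, String op, Object value) {}

// Conditions the data source will evaluate vs. those Drill must still apply.
record Split(List<Condition> pushed, List<Condition> residual) {}

class PushDownDecider {
    // Assumption: the source's capabilities boil down to a set of operators.
    private final Set<String> supportedOps;

    PushDownDecider(Set<String> supportedOps) {
        this.supportedOps = Set.copyOf(supportedOps);
    }

    // Input: the AND-ed terms (conjuncts) of a WHERE clause. Splitting them is
    // safe because applying the residual terms in Drill to rows already
    // filtered by the pushed terms gives the same result as applying all terms.
    Split split(List<Condition> conjuncts) {
        List<Condition> pushed = new ArrayList<>();
        List<Condition> residual = new ArrayList<>();
        for (Condition c : conjuncts) {
            if (supportedOps.contains(c.op())) {
                pushed.add(c);
            } else {
                residual.add(c);
            }
        }
        return new Split(pushed, residual);
    }
}
```

With supported operators "=", "<" and ">", the conjuncts a = 1 AND b LIKE 'x%' split into a pushed a = 1 and a residual b LIKE 'x%'. Note that this only works for AND-ed terms; an OR cannot be split this way and must be pushed or kept as a whole.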


Instead, the current approach is for your plugin to tie into Drill's
Calcite-based query planner. You define Calcite rules that fire to perform the
push-down operations you want to support. The code in this area is somewhat
obscure, but several examples exist in the Kafka plugin and others.
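The general shape of such a rule is: match a pattern in the plan (for example, a filter sitting directly on a scan of your table), then produce an equivalent, cheaper plan. The sketch below imitates that shape with tiny hypothetical plan classes; it is not Calcite's API (real rules work with RelNode, RelOptRule and related classes), just an illustration of the idea under those simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical, minimal plan nodes; Calcite's real RelNode classes are far richer.
interface PlanNode {}
record Scan(String table, List<String> pushedFilters) implements PlanNode {}
record Filter(List<String> conditions, PlanNode input) implements PlanNode {}

// Mimics the shape of a planner rule: match a pattern, return a rewritten plan.
class FilterIntoScanRule {
    // Plugin-specific check: can the data source evaluate this condition?
    private final Predicate<String> canPush;

    FilterIntoScanRule(Predicate<String> canPush) {
        this.canPush = canPush;
    }

    Optional<PlanNode> apply(PlanNode node) {
        // Pattern: a Filter directly on top of a Scan; otherwise the rule does not fire.
        if (!(node instanceof Filter f)) return Optional.empty();
        if (!(f.input() instanceof Scan s)) return Optional.empty();

        List<String> pushed = new ArrayList<>(s.pushedFilters());
        List<String> residual = new ArrayList<>();
        for (String c : f.conditions()) {
            if (canPush.test(c)) pushed.add(c); else residual.add(c);
        }
        // Nothing new to push: report "no change" so the planner does not loop.
        if (pushed.size() == s.pushedFilters().size()) return Optional.empty();

        PlanNode newScan = new Scan(s.table(), List.copyOf(pushed));
        // Keep a Filter (evaluated in Drill) only for conditions the source cannot handle.
        return Optional.of(residual.isEmpty()
                ? newScan
                : new Filter(List.copyOf(residual), newScan));
    }
}
```

Returning "no change" when nothing new can be pushed matters in a rule-based planner: a rule that always rewrites its own output would keep firing.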

Also, at present, storage "plugins" are not really plugins at compile time:
they pretty much need to be built within the Drill source tree. This is
especially true if you want to run unit tests. (We'd like to improve this area
of the project; suggestions welcome.) Generally, folks put their plugin in the
"contrib" directory within Drill. Yes, you must maintain your own branch.
However, as long as you do not modify Drill code (you shouldn't need to), it is
not hard to occasionally rebase your branch on top of a new Drill release.

At runtime, however, plugins are true plugins: you can take the plugin jar you 
create using the above process and drop it into an "official" release 
directory. We talk a bit about this process in the book Learning Apache Drill 
from O'Reilly.


We recently tried to clean up the plugin structure just a bit in PR 1914
(DRILL-7458) [1]. The PR takes just a few baby steps; suggestions are
encouraged. The key new feature in this PR is a standardized way to handle
filter push-downs, which avoids the large amount of copy-and-paste previously
required.
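The idea behind a standardized push-down mechanism is that the generic work (walking the filter expression, splitting it into AND-ed terms, collecting what is left over) is written once, and each plugin supplies only the part that knows its own data source. The sketch below illustrates that division of labor with hypothetical names (Expr, PushDownListener, FilterPushDownHelper); it is not the actual API introduced in PR 1914.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical expression tree: AND nodes and leaf comparisons.
interface Expr {}
record And(Expr left, Expr right) implements Expr {}
record Compare(String column, String op, String literal) implements Expr {}

// The only part each plugin would write: translate one term into the
// source's own form (here a generic T), or decline by returning empty.
interface PushDownListener<T> {
    Optional<T> translate(Compare c);
}

// Written once and shared by all plugins: flatten ANDs, offer each term
// to the listener, and collect the terms Drill must still evaluate.
final class FilterPushDownHelper {
    record Result<T>(List<T> pushed, List<Expr> residual) {}

    static <T> Result<T> apply(Expr filter, PushDownListener<T> listener) {
        List<Expr> conjuncts = new ArrayList<>();
        flatten(filter, conjuncts);
        List<T> pushed = new ArrayList<>();
        List<Expr> residual = new ArrayList<>();
        for (Expr e : conjuncts) {
            Optional<T> t = (e instanceof Compare c) ? listener.translate(c) : Optional.empty();
            if (t.isPresent()) pushed.add(t.get()); else residual.add(e);
        }
        return new Result<>(pushed, residual);
    }

    // Depth-first, left to right, so terms keep their original order.
    private static void flatten(Expr e, List<Expr> out) {
        if (e instanceof And a) {
            flatten(a.left(), out);
            flatten(a.right(), out);
        } else {
            out.add(e);
        }
    }
}
```

A plugin whose source accepts simple comparisons but not LIKE would implement only the listener, for example turning each accepted term into a query-string fragment; the flattening and bookkeeping stay in the shared helper.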


The PR is the result of a recent project to create a storage plugin that 
included filter push-down. Notes on that process are in [2].

You mentioned that your data source is similar to JDBC. So, another approach is
to modify the existing JDBC storage plugin to add storage plugin config options
that control what gets pushed down (assuming the decision is simple enough to
express as a few options). In this case, you could offer your changes as a PR
which the Drill project would maintain as part of the source base, saving you
from creating your own fork.

Thanks,
- Paul


[1] https://github.com/apache/drill/pull/1914
[2] https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin



On Saturday, January 11, 2020, 2:58:08 PM PST, Andy Grove wrote:

Hi,

I'd like to use Apache Drill with a custom data source that supports a
subset of SQL.

My goal is to have Drill push selection and predicates down to my data
source but the rest of the query processing should take place in Drill.

I started out by writing a JDBC driver for the data source and registering
it with Drill using the JDBC storage plugin, but it seems to just pass the
whole query through to my data source, so that approach isn't going to work,
unless I'm missing something?

Is there any way to configure the JDBC storage plugin to only push certain
parts of the query to the data source?

If this isn't a good approach, do I need to write a custom storage plugin?
Can these be added on the classpath or would that require me maintaining a
fork of the project?

I appreciate any pointers anyone can give me.

Thanks,

Andy.
