Hi Andy,

There are likely multiple approaches; here are two.

Some bit of code has to decide what can be pushed to your data source and what must remain in Drill. At present, there is no declarative way to say, "OK to push such-and-so expression, but keep this-and-that." Instead, the current approach is for your plugin to tie into Drill's Calcite-based query planner. You define Calcite rules that fire to perform the push operations you want to support. The code in this area is somewhat obscure, but multiple examples exist in the Kafka and other plugins.

Also, at present, storage "plugins" are not really plugins at compile time: they pretty much need to be built within the Drill source tree. This is especially true if you want to run unit tests. (We'd like to improve this area of the project; suggestions welcome.) Generally, folks put their plugin in the "contrib" directory within Drill. Yes, you must maintain your own branch. However, as long as you do not modify Drill code (you shouldn't need to), it is not too hard to occasionally rebase your branch on top of a new Drill release.

At runtime, however, plugins are true plugins: you can take the plugin jar you create using the above process and drop it into an "official" release directory. We talk a bit about this process in the book Learning Apache Drill from O'Reilly.

We recently tried to clean up the plugin structure just a bit in PR 1914 (DRILL-7458) [1]. The PR provides just a few baby steps, and suggestions are encouraged. The key new feature in this PR is a standardized way to handle filter push-downs, which avoids the large amount of copy-and-paste previously required. The PR is the result of a recent project to create a storage plugin that included filter push-down. Notes on that process are in [2].

You mentioned that your data source is similar to JDBC. So, another approach is to modify the existing JDBC storage plugin to provide storage plugin config options that control what gets pushed down (assuming that the decision is simple enough to express as a few options). In this case, you could offer your changes as a PR, which the Drill project would then maintain as part of the source base, saving you from creating your own fork.

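
To make the "what to push, what to keep" decision a bit more concrete, here is a minimal sketch in plain Java. It is not Drill or Calcite code: the class PushDownSketch, the Predicate and Split types, and the idea of a "pushable operators" config option are all hypothetical names made up for illustration. It assumes the filter has already been broken into AND-ed terms, which is what makes it safe to send some terms to the data source and leave the rest for Drill to evaluate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are Drill or Calcite classes.
public class PushDownSketch {

  // Operators a simple filter term might use.
  enum Op { EQ, NE, LT, GT, LIKE }

  // One AND-ed term of a WHERE clause, such as "a = 10".
  static final class Predicate {
    final String column;
    final Op op;
    final String value;

    Predicate(String column, Op op, String value) {
      this.column = column;
      this.op = op;
      this.value = value;
    }

    @Override
    public String toString() {
      return column + " " + op + " " + value;
    }
  }

  // Outcome of the decision: what goes to the source, what Drill keeps.
  static final class Split {
    final List<Predicate> pushed = new ArrayList<>();
    final List<Predicate> retained = new ArrayList<>();
  }

  // Splits AND-ed terms by whether the (assumed) config option
  // lists the term's operator as supported by the data source.
  static Split split(List<Predicate> conjuncts, Set<Op> pushableOps) {
    Split result = new Split();
    for (Predicate p : conjuncts) {
      if (pushableOps.contains(p.op)) {
        result.pushed.add(p);
      } else {
        result.retained.add(p);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed config: the source handles =, < and > but not LIKE.
    Set<Op> pushableOps = EnumSet.of(Op.EQ, Op.LT, Op.GT);
    List<Predicate> where = Arrays.asList(
        new Predicate("a", Op.EQ, "10"),
        new Predicate("b", Op.LIKE, "'x%'"),
        new Predicate("c", Op.GT, "5"));
    Split s = split(where, pushableOps);
    System.out.println("Push to source: " + s.pushed);
    System.out.println("Keep in Drill:  " + s.retained);
  }
}
```

In a real plugin this decision would live inside a planner rule and operate on Calcite expressions rather than on a toy Predicate class; the sketch only shows the shape of the split.
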
Thanks,
- Paul

[1] https://github.com/apache/drill/pull/1914
[2] https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin

On Saturday, January 11, 2020, 2:58:08 PM PST, Andy Grove <[email protected]> wrote:

Hi,

I'd like to use Apache Drill with a custom data source that supports a subset of SQL. My goal is to have Drill push selection and predicates down to my data source but the rest of the query processing should take place in Drill.

I started out by writing a JDBC driver for the data source and registering that with Drill using the Jdbc Storage Plugin but it seems to just pass the whole query through to my data source, so that approach isn't going to work unless I'm missing something? Is there any way to configure the JDBC storage plugin to only push certain parts of the query to the data source?

If this isn't a good approach, do I need to write a custom storage plugin? Can these be added on the classpath or would that require me maintaining a fork of the project?

I appreciate any pointers anyone can give me.

Thanks,
Andy.
