Re: Looking for advice on integrating with a custom data source

Andy Grove Tue, 14 Jan 2020 19:08:27 -0800

With some extra debugging I can see that the getNewWithChildren call is
made to an earlier instance of GroupScan and not the instance created by
the filter push-down rule. I'm wondering if this is some kind of
hashCode/equals/toString/getDigest issue?
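
To make that suspicion concrete, here is a minimal, self-contained sketch. It
is an assumption about the cause, not a diagnosis: if a planner identifies
scan nodes by their digest string, a digest that omits the pushed-down
predicates could make the new scan look identical to the earlier one. The
names SketchGroupScan, copy(), digest() and predicates() are hypothetical
stand-ins and do not extend or reproduce any Drill class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a plugin's GroupScan; not a Drill class.
final class SketchGroupScan {
    private final String table;
    private final List<String> predicates;

    SketchGroupScan(String table, List<String> predicates) {
        this.table = table;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.predicates = Collections.unmodifiableList(new ArrayList<>(predicates));
    }

    // Copy constructor: carries the pushed-down predicates forward.
    SketchGroupScan(SketchGroupScan that) {
        this(that.table, that.predicates);
    }

    // Plays the role of getNewWithChildren for a leaf scan: copy *this* instance.
    SketchGroupScan copy() {
        return new SketchGroupScan(this);
    }

    List<String> predicates() {
        return predicates;
    }

    // The digest includes the predicates, so scans with different filters differ.
    String digest() {
        return toString();
    }

    @Override
    public String toString() {
        return "SketchGroupScan [table=" + table + ", predicates=" + predicates + "]";
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchGroupScan)) return false;
        SketchGroupScan that = (SketchGroupScan) o;
        return table.equals(that.table) && predicates.equals(that.predicates);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, predicates);
    }
}

final class DigestDemo {
    public static void main(String[] args) {
        SketchGroupScan original =
            new SketchGroupScan("orders", Collections.<String>emptyList());
        SketchGroupScan pushed =
            new SketchGroupScan("orders", Arrays.asList("amount > 100"));
        // The two digests differ only because predicates are part of toString().
        System.out.println(original.digest());
        System.out.println(pushed.digest());
        System.out.println(pushed.copy().predicates());
    }
}
```

The two things to check in a real plugin, under this assumption, are that the
copy path keeps the predicates and that the digest, equals and hashCode all
take them into account.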


On Tue, Jan 14, 2020 at 7:52 PM Andy Grove <[email protected]> wrote:

> I'm now working on predicate push down ... I have a filter rule that is
> correctly extracting the predicates that the backend database supports and
> I am creating a new GroupScan containing these predicates, using the Kafka
> plugin as a reference. I see the GroupScan constructor being called after
> this, with the predicates populated. So far so good ... but then I see calls
> to getDigest, getScanStats, and getNewWithChildren, and then I see calls to
> the GroupScan constructor with the predicates missing.
>
> Any pointers on what I might be missing? Is there more magic I need to
> know?
>
> Thanks!
>
> On Sun, Jan 12, 2020 at 5:34 PM Paul Rogers <[email protected]> wrote:
>
>> Hi Andy,
>>
>> Congrats! You are making good progress. Yes, the BatchCreator is a bit of
>> magic: Drill looks for a subclass that has your SubScan subclass as the
>> second parameter. Looks like you figured that out.
>>
>> Thanks,
>> - Paul
>>
>>
>>
>>     On Sunday, January 12, 2020, 1:45:16 PM PST, Andy Grove <[email protected]> wrote:
>>
>>  Actually I managed to get past that error with an educated guess that if I
>> created a BatchCreator class, it would automagically be picked up somehow.
>> I'm now at the point where my RecordReader is being invoked!
>>
>> On Sun, Jan 12, 2020 at 2:03 PM Andy Grove <[email protected]> wrote:
>>
>> > Between reading the tutorial and copying and pasting code from the Kudu
>> > storage plugin, I've been making reasonable progress with this, but I am
>> > confused by one error I'm now hitting:
>> > ExecutionSetupException: Failure finding OperatorCreator constructor for
>> > config com.mydb.MyDbSubScan
>> > Prior to this, Drill had called getSpecificScan and then called a few of
>> > the methods on my subscan object. I wasn't sure what to return for
>> > getOperatorType so just returned the kudu subscan operator type and I'm
>> > wondering if the issue is related to that somehow?
>> >
>> > Thanks.
>> >
>> >
>> > On Sat, Jan 11, 2020 at 10:13 PM Andy Grove <[email protected]> wrote:
>> >
>> >> Thank you both for those responses. This is very helpful. I have
>> >> ordered a copy of the book too. I'm using Drill 1.17.0.
>> >>
>> >> I'll take a look at the Jdbc Storage Plugin code and see if it would be
>> >> feasible to add the logic I need there. In parallel, I've started
>> >> implementing a new storage plugin. I'll be working on this more tomorrow
>> >> and I'm sure I'll be back with more questions soon.
>> >>
>> >> Thanks again for your help!
>> >>
>> >> Andy.
>> >>
>> >> On Sat, Jan 11, 2020 at 6:03 PM Charles Givre <[email protected]> wrote:
>> >>
>> >>> Hi Andy,
>> >>> Thanks for your interest in Drill.  I'm glad to see that Paul wrote you
>> >>> back as well.  I was going to say I thought the JDBC storage plugin did
>> >>> in fact push down columns and filters to the source system.
>> >>>
>> >>> Also, what version of Drill are you using?
>> >>>
>> >>> Writing a storage plugin for Drill is not trivial and I'd definitely
>> >>> recommend using the code from Paul's PR as that greatly simplifies things.
>> >>> Here is a tutorial as well:
>> >>> https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin
>> >>>
>> >>> If you need additional help, please let us know.
>> >>> -- C
>> >>>
>> >>>
>> >>> On Jan 11, 2020, at 5:57 PM, Andy Grove <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I'd like to use Apache Drill with a custom data source that supports a
>> >>> subset of SQL.
>> >>>
>> >>> My goal is to have Drill push selection and predicates down to my data
>> >>> source but the rest of the query processing should take place in Drill.
>> >>>
>> >>> I started out by writing a JDBC driver for the data source and registering
>> >>> that with Drill using the Jdbc Storage Plugin but it seems to just pass the
>> >>> whole query through to my data source, so that approach isn't going to work
>> >>> unless I'm missing something?
>> >>>
>> >>> Is there any way to configure the JDBC storage plugin to only push certain
>> >>> parts of the query to the data source?
>> >>>
>> >>> If this isn't a good approach, do I need to write a custom storage plugin?
>> >>> Can these be added on the classpath or would that require me maintaining a
>> >>> fork of the project?
>> >>>
>> >>>
>> >>>
>> >>> I appreciate any pointers anyone can give me.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Andy.
>> >>>
>> >>>
>> >>>
>>
>
>
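
For readers following the BatchCreator exchange in the quoted thread: Paul
describes Drill locating a creator class by the SubScan type used as its
second parameter. The sketch below illustrates that general idea with plain
Java reflection. It is a simplified illustration, not Drill's code: every
type name, the two-argument getBatch signature, and the error message are
hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// All types below are hypothetical stand-ins, not Drill classes.
interface SketchSubScan {}

final class MyDbSketchSubScan implements SketchSubScan {}

final class OtherSketchSubScan implements SketchSubScan {}

final class UnregisteredSketchSubScan implements SketchSubScan {}

interface SketchBatchCreator<T extends SketchSubScan> {
    String getBatch(String context, T config);
}

final class MyDbSketchBatchCreator implements SketchBatchCreator<MyDbSketchSubScan> {
    @Override
    public String getBatch(String context, MyDbSketchSubScan config) {
        return "mydb batch for " + context;
    }
}

final class OtherSketchBatchCreator implements SketchBatchCreator<OtherSketchSubScan> {
    @Override
    public String getBatch(String context, OtherSketchSubScan config) {
        return "other batch for " + context;
    }
}

final class CreatorLookup {
    // Returns the candidate whose getBatch method takes configType as its
    // second parameter. Compiler-generated bridge methods are skipped because
    // their second parameter is the erased bound, not the concrete type.
    static Class<?> findCreator(List<Class<?>> candidates, Class<?> configType) {
        for (Class<?> candidate : candidates) {
            for (Method m : candidate.getDeclaredMethods()) {
                if (m.isBridge() || !m.getName().equals("getBatch")) {
                    continue;
                }
                Class<?>[] params = m.getParameterTypes();
                if (params.length == 2 && params[1] == configType) {
                    return candidate;
                }
            }
        }
        throw new IllegalStateException(
            "No creator found for config " + configType.getName());
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = Arrays.<Class<?>>asList(
            OtherSketchBatchCreator.class, MyDbSketchBatchCreator.class);
        System.out.println(findCreator(candidates, MyDbSketchSubScan.class).getSimpleName());
    }
}
```

Under this model, a sub-scan class with no matching creator fails at lookup
time, which is consistent with the kind of error reported earlier in the
thread, and adding a creator whose second parameter is the sub-scan type
resolves it.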
