Re: Drill for Data Virtualization

Sarnath K Wed, 10 Apr 2019 21:03:10 -0700

Hi Kunal,

Thank you for your response. However, the page at this URL suggests that it
can be done (though my own interpretation may be muddled):
https://drill.apache.org/docs/rdbms-storage-plugin/

There is a statement in the documentation that says:

As with any source, Drill supports joins within and between all systems.
Drill additionally has powerful pushdown capabilities with RDBMS sources.
This includes support to push down join, where, group by, intersect and
other SQL operations into a particular RDBMS source (as appropriate).

>> That said, even if the feature existed, by design, only one fragment can
read from a JDBC storage plugin, as it uses a single connection to stream
out the resultset.

I did not understand this. Say I GROUP BY a particular column and compute
MAX, MIN and SUM aggregates. These are all associative summary operations,
so Drill could send one MAX query to A and another MAX query to B, bring both
results into the Drill cluster, and then compute a final MAX over the
partially reduced results. This would be cheaper than loading all the data
from A and B into Drill and then performing the GROUP BY there.
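To make the two-phase idea concrete, here is a minimal Python sketch of
partial-then-final aggregation. It only illustrates the arithmetic; it is not
how Drill plans or executes queries, and the function names and sample rows
are hypothetical. Each source computes MIN, MAX, SUM and COUNT per group
locally, and the merge step combines them (MIN of MINs, MAX of MAXes, SUM of
SUMs and COUNTs):

```python
from typing import Dict, Iterable, Tuple

# Hypothetical partial aggregate per group: (min, max, sum, count)
Partial = Tuple[float, float, float, int]


def partial_aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, Partial]:
    """Stand-in for what one source (A or B) would return for
    SELECT key, MIN(v), MAX(v), SUM(v), COUNT(v) ... GROUP BY key."""
    acc: Dict[str, Partial] = {}
    for key, value in rows:
        if key in acc:
            lo, hi, total, n = acc[key]
            acc[key] = (min(lo, value), max(hi, value), total + value, n + 1)
        else:
            acc[key] = (value, value, value, 1)
    return acc


def merge_partials(*partials: Dict[str, Partial]) -> Dict[str, Partial]:
    """Final reduce step over the small per-source results."""
    merged: Dict[str, Partial] = {}
    for part in partials:
        for key, (lo, hi, total, n) in part.items():
            if key in merged:
                mlo, mhi, mtotal, mn = merged[key]
                merged[key] = (min(mlo, lo), max(mhi, hi), mtotal + total, mn + n)
            else:
                merged[key] = (lo, hi, total, n)
    return merged


# Hypothetical sample data: hot rows live in A, cold rows live in B.
hot = [("x", 5.0), ("y", 1.0), ("x", 7.0)]
cold = [("x", 2.0), ("y", 9.0)]

merged = merge_partials(partial_aggregate(hot), partial_aggregate(cold))
# Same answer as aggregating all rows in one place:
assert merged == partial_aggregate(hot + cold)
```

Only one small row per group crosses from each source into the engine.
Carrying COUNT alongside SUM also covers AVG, which is not associative on its
own: the final average is the merged SUM divided by the merged COUNT, not an
average of per-source averages.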

Can Drill perform these smart GROUP BY operations as of today? The
documentation I quoted above is encouraging (it's fairly recent: Dec 2018).

Thanks for your time,
Best,
Sarnath

On Thu, Apr 11, 2019 at 1:54 AM Kunal Khatua <ku...@apache.org> wrote:

> Hi Sarnath
>
> From what I understand from your description, you are looking to see if
> Drill can push down the GROUP BY clause to the underlying JDBC sources A
> and B.
>
> Unfortunately, Drill does not yet support pushdown for the JDBC storage
> plugin. That said, even if the feature existed, by design, only one
> fragment can read from a JDBC storage plugin, as it uses a single
> connection to stream out the result set.
>
> ~ Kunal
>
> On 4/9/2019 8:59:49 AM, Sarnath K <stell...@gmail.com> wrote:
> Hi,
>
> I have a requirement where I need to split data between a fast RDBMS system
> (A) that will hold HOT data and a slower cold storage system (B).
>
> Both A and B provide JDBC drivers.
>
> I am looking to see if Drill can help me come up with a JDBC URL (C)
> that hides the fact that the data is split between A and B. In other
> words, can Drill be used to implement Data Virtualization?
>
> From what I have read about Drill, I can definitely create two tables in
> Drill, one pointing to A and another to B.
> However, when I run GROUP BY or filter queries, does Drill take
> advantage of the existing JDBC systems by sending part of the GROUP BY
> to A and part to B and then reducing the results again? In other words,
> some kind of smart predicate pushdown for analytical queries?
>
> I hope this is clear. I would much appreciate your response.
>
> Thank you,
>
> Best,
> Sarnath
>