Hi Kunal,

I tried examining the plan for a simple GROUP BY. I see that the GROUP BY is pushed down to the JDBC step, whose output goes to the project step, so it seems pushdown is working fine. We are trying other cases. I will keep you posted.

Thank you for your help and time. I understand Calcite is a great community effort. I have been following it for quite some time.

Thanks!!

Best,
Sarnath

On Fri, Apr 12, 2019, 02:28 Kunal Khatua <[email protected]> wrote:

> On 4/11/2019 12:39:24 PM, Sarnath K <[email protected]> wrote:
> Thank you Kunal.
>
> >>>You could try creating views for each source and then doing a group by
> on the union of those views... that *might* get you the results you want
>
> When you mention views, do you mean that each view will be a GROUP BY
> statement for that particular source, and we then union them and group
> again? This way, the query itself explicitly does the pushdown. That's
> the idea you are referring to, right?
> Kunal Khatua: That is correct. Worth a try. Start with querying the views
> individually to see if the pushdown occurs in the first place.
>
> Btw, Calcite (possibly) not recognizing the pushdown opportunity would be
> a letdown, especially for flexible frameworks like Drill, in my opinion.
> Kunal Khatua: Yes, but again... Calcite is an independent open-source
> project in use by many other OSS and commercial vendors. Considering many
> such projects are driven by volunteer contributions, it's a miracle in my
> opinion that the open-source software is able to achieve so much (and,
> sometimes, putting commercial offerings to shame) without charging a penny
> to the end users.
>
> Developers behind Drill have made contributions to Calcite in their
> limited capacity, as have developers from other projects, so Drill has
> actually benefited from Calcite in more ways than it could have by
> implementing its own Calcite substitute. Hopefully, someone in the
> community can take a look at enhancing this feature as well.
>
> Thanks for your time. Much appreciated. I will keep you posted.
>
> Best,
> Sarnath
>
> On Thu, Apr 11, 2019, 23:25 Kunal Khatua wrote:
>
> > Hi Sarnath
> >
> > I haven't tried your specific requirement, and it is possible that if
> > you are querying only A or only B, Drill would be able to push it down
> > to the source.
> >
> > However, it gets tricky when you are querying 2 or more sources in the
> > same query, because (from my limited knowledge of Calcite) the Calcite
> > parser needs to be aware that it can push filters down to both sources.
> > With GROUP BY, multiple groupings across a single source versus across
> > multiple sources are not semantically the same.
> >
> > You could try creating views for each source and then doing a group by
> > on the union of those views... that *might* get you the results you
> > want.
> >
> > You can give it a shot, but I suspect it won't be as performant. Let us
> > know if you find it otherwise.
> >
> > ~ Kunal
> >
> > On 4/10/2019 9:02:24 PM, Sarnath K wrote:
> > Hi Kunal,
> >
> > Thank you for your response. But what I read at this URL says it can be
> > done (though my own interpretation is muddled):
> > https://drill.apache.org/docs/rdbms-storage-plugin/
> >
> > There is a statement in the documentation that says:
> >
> > As with any source, Drill supports joins within and between all systems.
> > Drill additionally has powerful pushdown capabilities with RDBMS sources.
> > This includes support to push down join, where, group by, intersect and
> > other SQL operations into a particular RDBMS source (as appropriate).
> >
> > >> That said, even if the feature existed, by design, only one fragment
> > can read from a JDBC storage plugin, as it uses a single connection to
> > stream out the resultset.
> >
> > I did not understand this. Say I GROUP BY a particular column and
> > perform "max", "min" and "sum" aggregations. These are all associative
> > group summary operations. So I would send a MAX query to A and then a
> > MAX query to B, get the results from both into the Drill cluster, and
> > then perform a MAX on the partially reduced result. This will be cheaper
> > than loading all data from A and B into Drill and then performing the
> > GROUP BY operation.
> >
> > Can Drill do these smart group-by operations as of today? The
> > documentation I read above is encouraging (it's pretty recent, Dec
> > 2018).
> >
> > Thanks for your time,
> > Best,
> > Sarnath
> >
> > On Thu, Apr 11, 2019 at 1:54 AM Kunal Khatua wrote:
> >
> > > Hi Sarnath
> > >
> > > From what I understand by your description, you are looking to see if
> > > Drill can push down the GROUP BY clause to the underlying JDBC sources
> > > A and B.
> > >
> > > Unfortunately, Drill does not support pushdown for the JDBC storage
> > > plugin as yet. That said, even if the feature existed, by design, only
> > > one fragment can read from a JDBC storage plugin, as it uses a single
> > > connection to stream out the resultset.
> > >
> > > ~ Kunal
> > >
> > > On 4/9/2019 8:59:49 AM, Sarnath K wrote:
> > > Hi,
> > >
> > > I have a requirement where I need to split data between a fast RDBMS
> > > system (A) that will have HOT data and a slower cold storage (B).
> > >
> > > Both A and B provide JDBC drivers.
> > >
> > > I am looking to see if Drill will help me in coming up with a JDBC URL
> > > (C) which will hide the fact that data is split between A and B, i.e.
> > > can Drill be used to implement Data Virtualization?
> > >
> > > As far as I can tell from reading about Drill, I can definitely create
> > > 2 tables in Drill, one pointing to A and another to B.
> > > However, when I do GROUP BY queries or FILTER queries, does Drill take
> > > advantage of the existing JDBC systems by actually sending a part of
> > > the GROUP BY to A and another to B and then reducing the result again?
> > > i.e. some kind of smart predicate push-down for analytical queries?
> > >
> > > Hope I sound clear to you. I would much appreciate your response.
> > >
> > > Thank you,
> > >
> > > Best,
> > > Sarnath
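
The two-phase aggregation discussed in this thread (run the GROUP BY on A and on B separately, then reduce the partial results) can be sketched in Python as below. This is only an illustration of the idea, not how Drill or Calcite plan queries: two in-memory SQLite databases stand in for the JDBC sources A and B, and the `sales` table, its `region` and `amount` columns, and the helper names are all hypothetical. It covers only MAX, MIN and SUM, which combine directly; an average would need each source to return SUM and COUNT instead.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for one JDBC source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


# The GROUP BY that each source executes locally (the "pushed down" part).
PARTIAL_SQL = """
    SELECT region, MAX(amount), MIN(amount), SUM(amount)
    FROM sales
    GROUP BY region
"""


def partial_aggregate(conn):
    """Return one (region, max, min, sum) row per group from a single source."""
    return conn.execute(PARTIAL_SQL).fetchall()


def combine(partials):
    """Reduce partial rows: max of maxes, min of mins, sum of sums per group."""
    merged = {}
    for region, mx, mn, total in partials:
        if region in merged:
            cur_mx, cur_mn, cur_total = merged[region]
            merged[region] = (max(cur_mx, mx), min(cur_mn, mn), cur_total + total)
        else:
            merged[region] = (mx, mn, total)
    return merged


# Hypothetical sample data: A holds hot rows, B holds cold rows.
hot_rows = [("east", 10), ("east", 40), ("west", 5)]
cold_rows = [("east", 25), ("west", 70), ("north", 3)]

source_a = make_source(hot_rows)
source_b = make_source(cold_rows)

# Only the small per-group summaries leave each source, not the raw rows.
result = combine(partial_aggregate(source_a) + partial_aggregate(source_b))
print(result)
```

Only one row per group travels from each source to the combining step, which is why this is expected to be cheaper than pulling every raw row from A and B before grouping.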
