That sounds great! Hopefully we can find a good way to share rules between the Calcite adapter and the Druid-native interface without too much overhead.
--
Michael Mior
michael.m...@gmail.com

2016-10-12 14:01 GMT-04:00 Julian Hyde <jh...@apache.org>:

> FYI - Druid devs are proposing to embed Calcite in Druid, so that Druid
> has a native SQL interface.
>
> I think Gian's plan makes sense; its architecture is similar in a lot of
> ways to how we are integrating with Phoenix. Calcite's Druid adapter will
> still exist, still be useful, and in fact I expect that Druid devs will
> end up building on it.
>
> I replied on the thread:
> https://groups.google.com/forum/?pli=1#!topic/druid-development/3npt9Qxpjr0
>
> Julian
>
> > Begin forwarded message:
> >
> > From: Gian Merlino <g...@imply.io>
> > Subject: [druid-dev] [Proposal] Built-in SQL for Druid
> > Date: October 12, 2016 at 9:50:48 AM PDT
> > To: druid-developm...@googlegroups.com
> > Reply-To: druid-developm...@googlegroups.com
> >
> > Inspired by the Calcite Druid adapter
> > (https://groups.google.com/d/topic/druid-development/FK5D162ao74/discussion),
> > I've been playing around with something similar that lives inside the
> > Druid Broker. It seems promising, so in this proposal I'm suggesting we
> > include an official SQL server inside Druid itself.
> >
> > I am hoping that we can:
> >
> > 1) Use Calcite for SQL parsing and optimizing, and use Avatica
> > (https://calcite.apache.org/docs/avatica_overview.html) for the server
> > and the JDBC client.
> > 2) Like the official Calcite Druid adapter, have a set of rules that
> > push down filters, projections, aggregations, sorts, etc. into normal
> > Druid queries.
> > 3) Unlike the official Calcite Druid adapter, use Druid objects (like
> > DimFilter, ExtractionFn, etc.) as model classes, since that avoids
> > extra code, helps with type safety, and speeds up development.
> > 4) Have this all run on the Broker, which would then make normal Druid
> > queries to data nodes.
> > 5) Work towards being able to push down more and more SQL into normal
> > Druid queries over time.
> >
> > Current status
> >
> > If people are interested in this proposal then I'll clean up the code a
> > bit and do a PR. Currently it's a rough prototype. Some things that do
> > work:
> >
> > 1) Avatica handler running at /druid/v2/sql/ + Avatica JDBC driver
> > 2) Determining column types with segment metadata queries
> > 3) Pushing down operator sequences that look like filter -> project ->
> > aggregate -> project -> sort into groupBy, timeseries, and select
> > queries as appropriate
> > 4) Using "intervals" to filter on time when appropriate
> > 5) LIKE, range, equality, and boolean filters
> > 6) SUM, MIN, MAX, AVG, COUNT, COUNT DISTINCT
> > 7) Some extraction fns like SUBSTRING, CHAR_LENGTH
> > 8) GROUP BY FLOOR(__time TO gran) for time series
> > 9) Arithmetic post-aggregations
> > 10) Filtered aggregations using CASE or using FILTER(WHERE ...)
> > 11) Semi-joins like SELECT ... WHERE xxx IN (SELECT ...) can run by
> > materializing the inner result on the Broker and applying it to the
> > outer query as a filter. Obviously this doesn't always work, but it
> > works sometimes (and it works more often than pulling the left-hand
> > side into the Broker…).
> >
> > Non-exhaustive list of things that don't work:
> >
> > 1) Pushing down filters after aggregation (HAVING)
> > 2) Push-down of anything without a native Druid analog, like
> > multi-column extraction fns, aggregation of expressions, window
> > functions, etc.
> > 3) Any extraction fns other than SUBSTRING and CHAR_LENGTH
> > 4) A lot of time handling, like x + INTERVAL, FLOOR(__time TO MONTH) = x,
> > etc.
> > 5) Query-time lookups
> > 6) SELECT with pagination – only the first 1000 results are used
> > 7) Any sort of memory usage controls on the Broker side
> >
> > FAQ
> >
> > 1) Why another SQL-on-Druid thing?
> > There's already, like, 7 of them.
> >
> > I think the fact that there are 7 of them means there's clearly some
> > value in having a built-in implementation. Partially this is so we can
> > hopefully share some work between the projects. Partially this is
> > because Druid doesn't support some things that are needed for
> > well-rounded SQL support (like multi-column extraction fns,
> > aggregations of expressions, etc.), and having the SQL layer inside the
> > Druid repo will make it possible to develop those sorts of features
> > hand in hand with the SQL planner rules.
> >
> > Btw, the 7 that I counted are, in alphabetical order: Calcite
> > (https://calcite.apache.org/docs/druid_adapter.html), Drill
> > (https://groups.google.com/d/msg/druid-development/FK5D162ao74/EnYDjASWCQAJ),
> > Druid's own simple grammar (added in 2013, removed in
> > https://github.com/druid-io/druid/pull/2090), Hive
> > (https://cwiki.apache.org/confluence/display/Hive/Druid+Integration),
> > PlyQL (http://plywood.imply.io/plyql), Sparkline
> > (https://github.com/SparklineData/spark-druid-olap), and Sql4D
> > (https://github.com/srikalyc/Sql4D).
> >
> > 2) Is the proposed SQL language actually SQL, or is it "SQL-like"?
> >
> > In terms of what can be efficiently pushed down to Druid queries, it's
> > "SQL-like". A lot of common SQL features aren't supported – although I
> > think it makes sense to add more over time. Technically Calcite does
> > speak full SQL, but a lot of it at the start would get planned as
> > pulling all the raw data into the Broker and processing it in Calcite's
> > interpreter.
> >
> > 3) Why not use the Druid adapter in Calcite?
> > Calcite's Druid adapter doesn't depend on any Druid jars; it implements
> > the query language and protocol using its own set of model and client
> > classes. For a built-in approach I wanted to be able to use Druid's own
> > Query, DimFilter, ExtractionFn, etc. in a type-safe way, and wanted to
> > use the query code that already exists in Druid for discovering and
> > querying data nodes. I think this will also help speed up development
> > of Druid features that allow more SQL to be pushed down.
> >
> > 4) Can we share work between a built-in Druid SQL and the other
> > SQL-on-Druid adapters that people are working on?
> >
> > Hopefully! I think it would make sense if a built-in Druid SQL could be
> > used for whatever Druid supports natively, and external SQL-on-Druid
> > adapters could be used when users want to do something that Druid
> > doesn't support. Sharing the work needed to translate "whatever Druid
> > supports natively" into Druid queries would help everyone.
> >
> > Hive and Drill already use Calcite internally, and I hope it's workable
> > to stuff Druid's own rules into their planners without changing too
> > much. If those projects are comfortable embedding druid-server then
> > that should work straight away. If they aren't comfortable embedding
> > druid-server (perhaps understandably), then we could bite the bullet
> > and work on a light(er)-weight druid-client jar that has just enough to
> > give us the benefit of type checking, and does not include all the
> > heavy Druid functionality.
> >
> > If you're working on one of those projects, feedback is greatly
> > appreciated.
> >
> > 5) What happens when parts of the SQL query can't be converted to a
> > native Druid query?
> >
> > Calcite is rad and runs the parts that can't be pushed down through an
> > interpreter on the Druid Broker.
> > Of course this means that if you use constructs that are close to the
> > data and can't be pushed down, like grouping on CONCAT(foo, bar) or
> > aggregating SUM(3 * bar), a surprisingly large amount of data can
> > potentially be pulled out into the Broker. This is not great behavior
> > and something should be done about that…
> >
> > 6) What about JOINs?
> >
> > I don't know; maybe it makes sense for Druid to have query types usable
> > for joins in the future. But it doesn't now; the closest thing is
> > query-time lookups, which are like a broadcast join. Without native
> > join support in Druid, it makes more sense to pull data out of Druid
> > into another system (like Drill or Hive or Spark) and do the join
> > there. Even if Druid did support native joins, there's still some value
> > in using an external execution engine to join Druid data with data from
> > some other system. Filters and aggregations can still potentially be
> > pushed down, depending on the query.
> >
> > 7) JDBC works, but what about ODBC?
> >
> > Avatica's home page says work on an ODBC client has not yet started.
> > The page at https://hortonworks.com/hadoop-tutorial/bi-apache-phoenix-odbc/
> > is interesting, since the Phoenix Query Server is also Avatica-based,
> > so maybe that work could be useful? However, it doesn't seem to be open
> > source, and when I tried to get the binary to work, the Windows ODBC
> > setup tool crashed after calling getTableTypes. Maybe someone at
> > Hortonworks can comment :)
> >
> > Gian
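To make the semi-join strategy from the proposal concrete (running SELECT ... WHERE x IN (SELECT ...) by materializing the inner result on the Broker and applying it to the outer query as a filter), here is a minimal Python sketch of the idea. This is illustrative only, under the assumption of simple in-memory rows; the function names, row dicts, and column names are made up for the example and are not Druid's actual classes or API.

```python
# Illustrative sketch (not Druid code) of executing
#   SELECT page, SUM(hits) FROM outer_table
#   WHERE user IN (SELECT user FROM inner_table WHERE ...)
# by materializing the inner SELECT into a value set, then
# applying that set as a filter on the outer query.

def run_inner_query(rows, predicate, column):
    # Broker-side materialization: evaluate the inner SELECT
    # and collect the distinct values of the selected column.
    return {row[column] for row in rows if predicate(row)}

def run_outer_query(rows, in_column, allowed_values):
    # The materialized set acts as an "in" filter pushed into the
    # outer query; here we aggregate SUM(hits) grouped by page.
    totals = {}
    for row in rows:
        if row[in_column] in allowed_values:
            totals[row["page"]] = totals.get(row["page"], 0) + row["hits"]
    return totals

if __name__ == "__main__":
    inner_rows = [{"user": "alice", "country": "CA"},
                  {"user": "bob", "country": "US"}]
    outer_rows = [{"user": "alice", "page": "Home", "hits": 3},
                  {"user": "bob", "page": "Home", "hits": 2},
                  {"user": "carol", "page": "Home", "hits": 7}]
    users = run_inner_query(inner_rows, lambda r: r["country"] == "CA", "user")
    print(run_outer_query(outer_rows, "user", users))  # {'Home': 3}
```

As the proposal notes, this only helps when the inner result is small enough to materialize; it avoids pulling the (usually much larger) left-hand side into the Broker.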