That sounds great! Hopefully we can find a good way to share rules between
the Calcite adapter and the Druid-native interface without too much
overhead.

--
Michael Mior
michael.m...@gmail.com

2016-10-12 14:01 GMT-04:00 Julian Hyde <jh...@apache.org>:

> FYI - Druid devs are proposing to embed Calcite in Druid, so that Druid
> has a native SQL interface.
>
> I think Gian’s plan makes sense; its architecture is similar in a lot of
> ways to how we are integrating with Phoenix. Calcite’s Druid adapter will
> still exist, still be useful, and in fact I expect that Druid devs will end
> up building on it.
>
> I replied on the thread:
> https://groups.google.com/forum/?pli=1#!topic/druid-development/3npt9Qxpjr0
>
> Julian
>
> > Begin forwarded message:
> >
> > From: Gian Merlino <g...@imply.io>
> > Subject: [druid-dev] [Proposal] Built-in SQL for Druid
> > Date: October 12, 2016 at 9:50:48 AM PDT
> > To: druid-developm...@googlegroups.com
> > Reply-To: druid-developm...@googlegroups.com
> >
> > Inspired by the Calcite Druid adapter (
> https://groups.google.com/d/topic/druid-development/FK5D162ao74/discussion)
> I've been playing around with something similar that lives inside the
> Druid Broker. It seems promising, so in this proposal I'm suggesting we
> include an official SQL server inside Druid itself.
> >
> > I am hoping that we can:
> >
> > 1) Use Calcite for SQL parsing and optimizing, and use Avatica (
> https://calcite.apache.org/docs/avatica_overview.html) for the server
> and the JDBC client.
> > 2) Like the official Calcite Druid adapter, have a set of rules that
> push down filters, projections, aggregations, sorts, etc into normal Druid
> queries.
> > 3) Unlike the official Calcite Druid adapter, use Druid objects (like
> DimFilter, ExtractionFn, etc) as model classes, since it avoids extra code,
> helps with type safety, and speeds up development.
> > 4) Have this all run on the Broker, which would then make normal Druid
> queries to data nodes.
> > 5) Work towards being able to push down more and more SQL into normal
> Druid queries over time.
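To make point 2 concrete, here is a rough sketch (in Python, purely illustrative, not the actual Java planner code) of the kind of translation those rules would perform: a simple filtered SQL aggregation mapped onto a Druid native groupBy query. The helper function and its names are hypothetical; the query JSON shape follows Druid's native query format.

```python
# Illustrative sketch of the push-down in point 2: a filtered SQL
# aggregation translated into a Druid native groupBy query. Hypothetical
# helper; the JSON shape is standard Druid native-query format.

def to_druid_group_by(datasource, dimension, agg_column,
                      filter_dim, filter_value, intervals):
    """Roughly: SELECT <dimension>, SUM(<agg_column>) FROM <datasource>
       WHERE <filter_dim> = <filter_value> GROUP BY <dimension>"""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": "all",
        "dimensions": [dimension],
        # The WHERE clause becomes a Druid "selector" (equality) filter.
        "filter": {"type": "selector",
                   "dimension": filter_dim,
                   "value": filter_value},
        # SUM(agg_column) becomes a longSum aggregator.
        "aggregations": [{"type": "longSum",
                          "name": "sum_" + agg_column,
                          "fieldName": agg_column}],
        # Time predicates become "intervals" rather than ordinary filters.
        "intervals": intervals,
    }

query = to_druid_group_by("wikipedia", "page", "added",
                          "country", "US", ["2016-01-01/2016-02-01"])
print(query["queryType"])  # groupBy
```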
> >
> > Current status
> >
> > If people are interested in this proposal then I'll clean up the code a
> bit and do a PR. Currently it's a rough prototype. Some things that do work:
> >
> > 1) Avatica handler running at /druid/v2/sql/ + Avatica JDBC driver
> > 2) Determining column types with segment metadata queries
> > 3) Pushing down operator sequences that look like filter -> project ->
> aggregate -> project -> sort into groupBy, timeseries, and select queries
> as appropriate
> > 4) Using "intervals" to filter on time when appropriate
> > 5) LIKE, range, equality, and boolean filters
> > 6) SUM, MIN, MAX, AVG, COUNT, COUNT DISTINCT
> > 7) Some extraction fns like SUBSTRING, CHAR_LENGTH
> > 8) GROUP BY FLOOR(__time TO gran) for time-series
> > 9) Arithmetic post-aggregations
> > 10) Filtered aggregations using CASE or using FILTER(WHERE ...)
> > 11) Semi-joins like SELECT ... WHERE xxx IN (SELECT ...) can run by
> materializing the inner result on the broker and applying it to the outer
> query as a filter. Obviously doesn't always work, but it works sometimes
> (and it works more often than pulling the left-hand side into the Broker…).
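The semi-join strategy in point 11 can be sketched roughly like this (Python, purely illustrative): the inner query is materialized on the Broker, and its result is rewritten into a Druid "in" filter on the outer query.

```python
# Rough sketch of the semi-join handling described in point 11: the inner
# SELECT is materialized on the Broker, then its values become an "in"
# filter on the outer query. Illustrative only; not actual Druid code.

def semi_join_filter(dimension, inner_values):
    """Turn the materialized inner result into a Druid 'in' filter."""
    # Deduplicate; Druid's "in" filter matches any of the listed values.
    return {"type": "in",
            "dimension": dimension,
            "values": sorted(set(inner_values))}

# Pretend the inner query  SELECT user FROM edits WHERE robot = 'true'
# returned these rows when run on the Broker:
inner_result = ["bot1", "bot2", "bot1"]

outer_filter = semi_join_filter("user", inner_result)
print(outer_filter)
# {'type': 'in', 'dimension': 'user', 'values': ['bot1', 'bot2']}
```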
> >
> > Non-exhaustive list of things that don't work:
> >
> > 1) Pushing down filter after aggregate (HAVING)
> > 2) Push down of anything without a native Druid analog, like
> multi-column extraction fns, aggregation of expressions, window functions,
> etc.
> > 3) Any extraction fns other than SUBSTRING, CHAR_LENGTH
> > 4) A lot of time stuff, like x + INTERVAL, FLOOR(__time TO MONTH) = x,
> etc.
> > 5) Query time lookups
> > 6) Select with pagination – only the first 1000 results are used
> > 7) Any sort of memory usage controls on the Broker side
> >
> > FAQ
> >
> > 1) Why another SQL on Druid thing? There's already, like, 7 of them.
> >
> > I think the fact that there are 7 of them means there's clearly some
> value in having a built-in implementation. Partially this is so we can
> hopefully share some work between the projects. Partially this is because
> Druid doesn't support some things that are needed for well-rounded SQL
> support (like multi-column extraction fns, aggregations of expressions,
> etc) and having the SQL layer inside the Druid repo will make it possible
> to develop those sorts of features hand in hand with the SQL planner rules.
> >
> > Btw, the 7 that I counted are, in alphabetical order, Calcite (
> https://calcite.apache.org/docs/druid_adapter.html), Drill (
> https://groups.google.com/d/msg/druid-development/FK5D162ao74/EnYDjASWCQAJ),
> Druid's own simple grammar (added in 2013, removed in
> https://github.com/druid-io/druid/pull/2090), Hive (
> https://cwiki.apache.org/confluence/display/Hive/Druid+Integration),
> PlyQL (http://plywood.imply.io/plyql), Sparkline (
> https://github.com/SparklineData/spark-druid-olap), and Sql4D (
> https://github.com/srikalyc/Sql4D).
> >
> > 2) Is the proposed SQL language actually SQL or is it "SQL-like"?
> >
> > In terms of what can be efficiently pushed down to Druid queries, it's
> "SQL-like". A lot of common SQL features aren't supported – although I
> think it makes sense to add more over time. Technically Calcite does speak
> full SQL, but a lot of it at the start would get planned as pulling all the
> raw data into the Broker and processing it in Calcite's interpreter.
> >
> > 3) Why not use the Druid adapter in Calcite?
> >
> > Calcite's Druid adapter doesn't depend on any Druid jars; it implements
> the query language and protocol using its own set of model and client
> classes. For a builtin approach I wanted to be able to use Druid's own
> Query, DimFilter, ExtractionFn, etc in a type-safe way, and wanted to use
> the query code that already exists in Druid for discovering and querying
> data nodes. I think this will also help speed up development of Druid
> features that allow more SQL to be pushed down.
> >
> > 4) Can we share work between a builtin Druid SQL and the other SQL on
> Druid adapters that people are working on?
> >
> > Hopefully! I think it would make sense if a builtin Druid SQL could be
> used for whatever Druid supports natively, and external SQL on Druid
> adapters could be used when users want to do something that Druid doesn't
> support. Sharing the work needed to translate "whatever Druid supports
> natively" into Druid queries would help everyone.
> >
> > Hive and Drill already use Calcite internally, and I hope it's workable
> to stuff Druid's own rules into their planners without changing too much.
> If those projects are comfortable embedding druid-server then that should
> work straight away. If they aren't comfortable embedding druid-server
> (perhaps understandably) then we could bite the bullet and work on a
> light(er) weight druid-client jar that has just enough to give us the
> benefit of type checking, and does not include all the heavy Druid
> functionality.
> >
> > If you're working on one of those projects, feedback is greatly
> appreciated.
> >
> > 5) What happens when parts of the SQL query can't be converted to a
> native Druid query?
> >
> > Calcite is rad and runs the parts that can't be pushed down through an
> interpreter on the Druid Broker. Of course this means that if you use
> constructs that are close to the data and can't be pushed down, like
> grouping on CONCAT(foo, bar) or aggregating SUM(3 * bar), potentially a
> surprisingly large amount of data will be pulled out into the Broker. This
> is not great behavior and something should be done about that…
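As a toy illustration (Python, purely illustrative) of why that fallback is expensive: grouping on something like CONCAT(foo, bar) can't be pushed down, so conceptually every raw row has to be pulled into the Broker and grouped there.

```python
# Toy illustration of the fallback path: GROUP BY CONCAT(foo, bar) has no
# native Druid analog, so (conceptually) every raw row is pulled into the
# Broker and grouped in an interpreter. Purely illustrative.
from collections import Counter

raw_rows = [  # imagine this is the entire datasource, streamed out of Druid
    {"foo": "a", "bar": "x"},
    {"foo": "a", "bar": "x"},
    {"foo": "b", "bar": "y"},
]

# The non-pushable expression is evaluated row by row on the Broker...
groups = Counter(row["foo"] + row["bar"] for row in raw_rows)
print(dict(groups))  # {'ax': 2, 'by': 1}
# ...so the work scales with the raw row count, not the result size.
```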
> >
> > 6) What about JOINs?
> >
> > I don't know, maybe it makes sense for Druid to have query types usable
> for joins in the future. But it doesn't now; the closest thing is
> query-time lookups, which is like a broadcast join. Without native join
> support in Druid, it makes more sense to pull data out of Druid into
> another system (like Drill or Hive or Spark) and do the join there. Even if
> Druid did support native joins, there's still some value in using an
> external execution engine to join Druid data with data from some other
> system. Filters and aggregations can still potentially be pushed down,
> depending on the query.
> >
> > 7) JDBC works, but what about ODBC?
> >
> > Avatica's home page says work on an ODBC client has not yet started. The
> page at https://hortonworks.com/hadoop-tutorial/bi-apache-phoenix-odbc/ is
> interesting, since the Phoenix Query Server is also Avatica-based, so maybe
> that work could be useful? However, it doesn't seem to be open source, and
> when I tried to get the binary to work, the Windows ODBC setup thing
> crashed after calling getTableTypes. Maybe someone at Hortonworks can
> comment :)
> >
> > Gian
> >
>
>
