Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Gian Merlino

So, it sounds like the first thing to look at would be seeing if the Hive
folks are open to using druid-sql instead of calcite-druid. What'd be the
best way to go about that? Nishant, do you think you could help?

Gian

On Wed, Feb 7, 2018 at 3:46 PM, Julian Hyde  wrote:

> Long term there doesn’t seem to be any point keeping Calcite’s druid
> adapter around. The code would be an inferior duplicate of druid-sql, so we
> would want to
>
> But shorter term there will be quite a few things that Hive needs that
> will only exist in Calcite’s druid adapter. The challenge will be the
> transition. You will need to convince the Hive developers that the move is
> worthwhile. (It will help if you can point to some quick benefits to making
> the transition.)
>
> Julian
>
>
> > On Feb 7, 2018, at 2:59 PM, Gian Merlino  wrote:
> >
> > In the world where druid-sql is where Druid's Calcite API lives, what do
> > you think would make the most sense for the current calcite-druid module?
> > Would it make sense to remove it (and merge anything it does, that
> > druid-sql doesn't already do, into druid-sql) or to keep it in the Calcite
> > project but have it be a thin wrapper over druid-sql?
> >
> > I guess this should be informed by who the users of calcite-druid are. At
> > this point, I don't know much beyond the fact that Hive uses it.
> >
> > Gian
> >
> > On Wed, Feb 7, 2018 at 10:29 AM, Julian Hyde  wrote:
> >
> >> I agree with you both.
> >>
> >> For a particular engine, such as Druid, there are often 3 options:
> >>
> >> 1. build a Calcite adapter to the engine's native query language;
> >>
> >> 2. if the engine supports SQL, connect to the engine via Calcite's JDBC
> >> adapter;
> >>
> >> 3. if the engine exposes an API based on Calcite algebra, connect to that
> >> API.
> >>
> >> All of those options are valid for Druid right now, and 3 (Gian's
> >> proposal) is likely to yield the best plans. As Gian correctly notes,
> >> that is likely to increase the coupling, but we can live with that.
> >> (If people want loose coupling they can talk to Druid via the JDBC
> >> adapter, and we just need to make sure that the Druid JDBC dialect
> >> knows that Druid cannot do joins.)
> >>
> >> Nishant's core point seems to be that we need some kind of bulk
> >> API/protocol to talk to Druid, to consume partial query results in
> >> parallel. This is desirable because Hive is  -- how to put it
> >> politely?! -- a "bigger" query engine. I'm sure that Spark, Presto and
> >> Drill would want a similar API/protocol. When it exists, we can
> >> generate a hybrid plan: Druid physical algebra that generates partial
> >> results in parallel underneath Hive physical algebra that consumes
> >> those results in parallel.
> >>
> >> The same pattern occurred in Phoenix. Phoenix does not have
> >> shuffle/exchange capabilities, so for big analytic queries we would
> >> want to couple it with Hive/Spark/Presto/Drill. We talked about
> >> Drillix (Drill + Phoenix) for a while but never completed it.
> >>
> >> Julian
> >>
> >>
> >> On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa
> >>  wrote:
> >>> Having a focused effort into a single project would be great and would
> >>> definitely help us in evolving druid sql capabilities faster.
> >>>
> >>> 1) One more thing that we need to consider here is that calcite
> >>> druid-adapter is also used in Apache Hive where we use the druid rules to
> >>> generate an optimized plan and then the druid query is executed from druid
> >>> containers. In druid-sql I believe the query execution logic is tied to the
> >>> fact that execution node is a druid-broker where native queries can be run
> >>> to generate a Sequence of results. We might need some rework there to
> >>> ensure that things work fine with hive too after proposed changes.
> >>>
> >>> 2) druid-sql dependencies can probably be reduced by separating the
> >>> planning and execution logic in druid-sql, the planning logic need not
> >>> depend on lots of druid code and can have light-weight dependencies while
> >>> the execution part and result serde which pulls in lots of druid
> >>> dependencies can reside in separate module and calcite druid-adapter need
> >>> not depend on that module.
> >>>
> >>> I think, the hypothetical case you mentioned is also worth considering, to
> >>> ease up the development process, we can consider moving calcite-druid as a
> >>> module in druid, so that we make release of both druid-sql and
> >>> calcite-adapter together.
> >>>
> >>> On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:
> >>>
> >>>> Hi Calcites,
> >>>>
> >>>> I would like to raise the idea of adding druid-sql (
> >>>>
> >>>> http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
> >>>> )
> >>>> as a dependency in Calcite's Druid adapter. It should reduce the size of
> >>>> calcite-druid

Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Julian Hyde

Long term there doesn’t seem to be any point keeping Calcite’s druid adapter 
around. The code would be an inferior duplicate of druid-sql, so we would want 
to 

But shorter term there will be quite a few things that Hive needs that will 
only exist in Calcite’s druid adapter. The challenge will be the transition. 
You will need to convince the Hive developers that the move is worthwhile. (It 
will help if you can point to some quick benefits to making the transition.)

Julian


> On Feb 7, 2018, at 2:59 PM, Gian Merlino  wrote:
> 
> In the world where druid-sql is where Druid's Calcite API lives, what do
> you think would make the most sense for the current calcite-druid module?
> Would it make sense to remove it (and merge anything it does, that
> druid-sql doesn't already do, into druid-sql) or to keep it in the Calcite
> project but have it be a thin wrapper over druid-sql?
> 
> I guess this should be informed by who the users of calcite-druid are. At
> this point, I don't know much beyond the fact that Hive uses it.
> 
> Gian
> 
> On Wed, Feb 7, 2018 at 10:29 AM, Julian Hyde  wrote:
> 
>> I agree with you both.
>> 
>> For a particular engine, such as Druid, there are often 3 options:
>> 
>> 1. build a Calcite adapter to the engine's native query language;
>> 
>> 2. if the engine supports SQL, connect to the engine via Calcite's JDBC
>> adapter;
>> 
>> 3. if the engine exposes an API based on Calcite algebra, connect to that
>> API.
>> 
>> All of those options are valid for Druid right now, and 3 (Gian's
>> proposal) is likely to yield the best plans. As Gian correctly notes,
>> that is likely to increase the coupling, but we can live with that.
>> (If people want loose coupling they can talk to Druid via the JDBC
>> adapter, and we just need to make sure that the Druid JDBC dialect
>> knows that Druid cannot do joins.)
>> 
>> Nishant's core point seems to be that we need some kind of bulk
>> API/protocol to talk to Druid, to consume partial query results in
>> parallel. This is desirable because Hive is  -- how to put it
>> politely?! -- a "bigger" query engine. I'm sure that Spark, Presto and
>> Drill would want a similar API/protocol. When it exists, we can
>> generate a hybrid plan: Druid physical algebra that generates partial
>> results in parallel underneath Hive physical algebra that consumes
>> those results in parallel.
>> 
>> The same pattern occurred in Phoenix. Phoenix does not have
>> shuffle/exchange capabilities, so for big analytic queries we would
>> want to couple it with Hive/Spark/Presto/Drill. We talked about
>> Drillix (Drill + Phoenix) for a while but never completed it.
>> 
>> Julian
>> 
>> 
>> On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa
>>  wrote:
>>> Having a focused effort into a single project would be great and would
>>> definitely help us in evolving druid sql capabilities faster.
>>> 
>>> 1) One more thing that we need to consider here is that calcite
>>> druid-adapter is also used in Apache Hive where we use the druid rules to
>>> generate an optimized plan and then the druid query is executed from druid
>>> containers. In druid-sql I believe the query execution logic is tied to the
>>> fact that execution node is a druid-broker where native queries can be run
>>> to generate a Sequence of results. We might need some rework there to
>>> ensure that things work fine with hive too after proposed changes.
>>>
>>> 2) druid-sql dependencies can probably be reduced by separating the
>>> planning and execution logic in druid-sql, the planning logic need not
>>> depend on lots of druid code and can have light-weight dependencies while
>>> the execution part and result serde which pulls in lots of druid
>>> dependencies can reside in separate module and calcite druid-adapter need
>>> not depend on that module.
>>>
>>> I think, the hypothetical case you mentioned is also worth considering, to
>>> ease up the development process, we can consider moving calcite-druid as a
>>> module in druid, so that we make release of both druid-sql and
>>> calcite-adapter together.
>>> 
>>> On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:
>>> 
>>>> Hi Calcites,
>>>>
>>>> I would like to raise the idea of adding druid-sql (
>>>>
>>>> http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
>>>> )
>>>> as a dependency in Calcite's Druid adapter. It should reduce the size of
>>>> calcite-druid substantially, since it would mostly just be calling into
>>>> druid-sql.
>>>>
>>>> This has some advantages for both projects.
>>>>
>>>> 1) Support for new Druid features often appears in Druid SQL first. By
>>>> embedding druid-sql, Calcite gets these new features too, without extra
>>>> work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an
>>>> outstanding jira to add support for Druid expressions to Calcite, but
>>>> druid-sql already supports these. In

Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Gian Merlino

In the world where druid-sql is where Druid's Calcite API lives, what do
you think would make the most sense for the current calcite-druid module?
Would it make sense to remove it (and merge anything it does, that
druid-sql doesn't already do, into druid-sql) or to keep it in the Calcite
project but have it be a thin wrapper over druid-sql?

I guess this should be informed by who the users of calcite-druid are. At
this point, I don't know much beyond the fact that Hive uses it.

Gian

On Wed, Feb 7, 2018 at 10:29 AM, Julian Hyde  wrote:

> I agree with you both.
>
> For a particular engine, such as Druid, there are often 3 options:
>
> 1. build a Calcite adapter to the engine's native query language;
>
> 2. if the engine supports SQL, connect to the engine via Calcite's JDBC
> adapter;
>
> 3. if the engine exposes an API based on Calcite algebra, connect to that
> API.
>
> All of those options are valid for Druid right now, and 3 (Gian's
> proposal) is likely to yield the best plans. As Gian correctly notes,
> that is likely to increase the coupling, but we can live with that.
> (If people want loose coupling they can talk to Druid via the JDBC
> adapter, and we just need to make sure that the Druid JDBC dialect
> knows that Druid cannot do joins.)
>
> Nishant's core point seems to be that we need some kind of bulk
> API/protocol to talk to Druid, to consume partial query results in
> parallel. This is desirable because Hive is  -- how to put it
> politely?! -- a "bigger" query engine. I'm sure that Spark, Presto and
> Drill would want a similar API/protocol. When it exists, we can
> generate a hybrid plan: Druid physical algebra that generates partial
> results in parallel underneath Hive physical algebra that consumes
> those results in parallel.
>
> The same pattern occurred in Phoenix. Phoenix does not have
> shuffle/exchange capabilities, so for big analytic queries we would
> want to couple it with Hive/Spark/Presto/Drill. We talked about
> Drillix (Drill + Phoenix) for a while but never completed it.
>
> Julian
>
>
> On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa
>  wrote:
> > Having a focused effort into a single project would be great and would
> > definitely help us in evolving druid sql capabilities faster.
> >
> > 1) One more thing that we need to consider here is that calcite
> > druid-adapter is also used in Apache Hive where we use the druid rules to
> > generate an optimized plan and then the druid query is executed from druid
> > containers. In druid-sql I believe the query execution logic is tied to the
> > fact that execution node is a druid-broker where native queries can be run
> > to generate a Sequence of results. We might need some rework there to
> > ensure that things work fine with hive too after proposed changes.
> >
> > 2) druid-sql dependencies can probably be reduced by separating the
> > planning and execution logic in druid-sql, the planning logic need not
> > depend on lots of druid code and can have light-weight dependencies while
> > the execution part and result serde which pulls in lots of druid
> > dependencies can reside in separate module and calcite druid-adapter need
> > not depend on that module.
> >
> > I think, the hypothetical case you mentioned is also worth considering, to
> > ease up the development process, we can consider moving calcite-druid as a
> > module in druid, so that we make release of both druid-sql and
> > calcite-adapter together.
> >
> > On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:
> >
> >> Hi Calcites,
> >>
> >> I would like to raise the idea of adding druid-sql (
> >>
> >> http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
> >> )
> >> as a dependency in Calcite's Druid adapter. It should reduce the size of
> >> calcite-druid substantially, since it would mostly just be calling into
> >> druid-sql.
> >>
> >> This has some advantages for both projects.
> >>
> >> 1) Support for new Druid features often appears in Druid SQL first. By
> >> embedding druid-sql, Calcite gets these new features too, without extra
> >> work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an
> >> outstanding jira to add support for Druid expressions to Calcite, but
> >> druid-sql already supports these. In fact it looks like some of the code in
> >> the proposed patch is copied from druid-sql. As another example,
> >> https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
> >> from "select" to "scan", which had been previously done in Druid SQL in
> >> https://github.com/druid-io/druid/pull/4751.
> >>
> >> 2) Depending on druid-sql means Calcite doesn't need to implement its own
> >> Druid query and result serde code. Druid already has it.
> >>
> >> 3) Focused effort on a single module rather than the split effort that we
> >> have today, where some developers are contributing to druid-sql and some
> >> are

Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Julian Hyde

I agree with you both.

For a particular engine, such as Druid, there are often 3 options:

1. build a Calcite adapter to the engine's native query language;

2. if the engine supports SQL, connect to the engine via Calcite's JDBC adapter;

3. if the engine exposes an API based on Calcite algebra, connect to that API.

All of those options are valid for Druid right now, and 3 (Gian's
proposal) is likely to yield the best plans. As Gian correctly notes,
that is likely to increase the coupling, but we can live with that.
(If people want loose coupling they can talk to Druid via the JDBC
adapter, and we just need to make sure that the Druid JDBC dialect
knows that Druid cannot do joins.)
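
To make the JDBC-dialect remark concrete, here is a minimal sketch of a dialect that declares which operations the remote engine can run, so that a planner never pushes a join down to Druid. Every name in it (SqlFeature, DialectCapabilities, druidLike, canPushDown) is hypothetical and chosen for illustration; it is not Calcite's SqlDialect API, only one way such a capability check could look.

```java
import java.util.EnumSet;

// Hypothetical feature list; not taken from Calcite or Druid.
enum SqlFeature { FILTER, AGGREGATE, SORT, JOIN }

// Hypothetical capability descriptor for a remote SQL engine.
final class DialectCapabilities {
    private final EnumSet<SqlFeature> supported;

    DialectCapabilities(EnumSet<SqlFeature> supported) {
        // Defensive copy so later changes by the caller have no effect.
        this.supported = EnumSet.copyOf(supported);
    }

    // True only if every feature the sub-plan needs can run remotely.
    boolean canPushDown(EnumSet<SqlFeature> needed) {
        return supported.containsAll(needed);
    }

    // Assumption taken from the thread: the engine supports everything but joins.
    static DialectCapabilities druidLike() {
        return new DialectCapabilities(
            EnumSet.complementOf(EnumSet.of(SqlFeature.JOIN)));
    }
}
```

With this shape, a sub-plan needing only FILTER and AGGREGATE would be pushed down, while one that also needs JOIN would stay in the calling engine.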

Nishant's core point seems to be that we need some kind of bulk
API/protocol to talk to Druid, to consume partial query results in
parallel. This is desirable because Hive is -- how to put it
politely?! -- a "bigger" query engine. I'm sure that Spark, Presto and
Drill would want a similar API/protocol. When it exists, we can
generate a hybrid plan: Druid physical algebra that generates partial
results in parallel underneath Hive physical algebra that consumes
those results in parallel.

The same pattern occurred in Phoenix. Phoenix does not have
shuffle/exchange capabilities, so for big analytic queries we would
want to couple it with Hive/Spark/Presto/Drill. We talked about
Drillix (Drill + Phoenix) for a while but never completed it.

Julian


On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa
 wrote:
> Having a focused effort into a single project would be great and would
> definitely help us in evolving druid sql capabilities faster.
>
> 1) One more thing that we need to consider here is that calcite
> druid-adapter is also used in Apache Hive where we use the druid rules to
> generate an optimized plan and then the druid query is executed from druid
> containers. In druid-sql I believe the query execution logic is tied to the
> fact that execution node is a druid-broker where native queries can be run
> to generate a Sequence of results. We might need some rework there to
> ensure that things work fine with hive too after proposed changes.
>
> 2) druid-sql dependencies can probably be reduced by separating the
> planning and execution logic in druid-sql, the planning logic need not
> depend on lots of druid code and can have light-weight dependencies while
> the execution part and result serde which pulls in lots of druid
> dependencies can reside in separate module and calcite druid-adapter need
> not depend on that module.
>
> I think, the hypothetical case you mentioned is also worth considering, to
> ease up the development process, we can consider moving calcite-druid as a
> module in druid, so that we make release of both druid-sql and
> calcite-adapter together.
>
> On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:
>
>> Hi Calcites,
>>
>> I would like to raise the idea of adding druid-sql (
>>
>> http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
>> )
>> as a dependency in Calcite's Druid adapter. It should reduce the size of
>> calcite-druid substantially, since it would mostly just be calling into
>> druid-sql.
>>
>> This has some advantages for both projects.
>>
>> 1) Support for new Druid features often appears in Druid SQL first. By
>> embedding druid-sql, Calcite gets these new features too, without extra
>> work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an
>> outstanding jira to add support for Druid expressions to Calcite, but
>> druid-sql already supports these. In fact it looks like some of the code in
>> the proposed patch is copied from druid-sql. As another example,
>> https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
>> from "select" to "scan", which had been previously done in Druid SQL in
>> https://github.com/druid-io/druid/pull/4751.
>>
>> 2) Depending on druid-sql means Calcite doesn't need to implement its own
>> Druid query and result serde code. Druid already has it.
>>
>> 3) Focused effort on a single module rather than the split effort that we
>> have today, where some developers are contributing to druid-sql and some
>> are contributing to calcite-druid.
>>
>> 4) More test coverage for both projects, presumably.
>>
>> I think (3) and (4) especially would give us the opportunity to improve
>> both projects much more rapidly.
>>
>> However, there are also some possible disadvantages.
>>
>> 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
>> Druid code. Calcite users may prefer a lighter weight module.
>>
>> 2) druid-sql's APIs are not intended to be stable, and probably never will
>> be. They may break on minor releases. So updating the version of druid-sql
>> in Calcite may involve tweaking how functions are called, etc. I think this
>> effort should be minimal if calcite-druid is mostly just delegating to
>> druid-sql.
>>
>> 3) druid-sql depends on

Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Gian Merlino

I think druid-sql could support the Hive use case without too much
reworking. It has a method that returns a Sequence:

  public abstract Sequence runQuery();

But it also has another method that returns the Druid query, and Hive would
probably call that one:

  public DruidQuery toDruidQuery()

Additionally, I guess Hive doesn't want to push "HAVING" and "ORDER BY"
down to Druid, so it should avoid adding those rules. There is enough
flexibility in druid-sql for that (push down of where, group by, having,
and order by all implemented as separate rules).
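
As a rough illustration of that flexibility, the sketch below splits a query's clauses into those pushed to Druid and those left for the caller (Hive, in this scenario), depending on which rules are enabled. All names here (PushDownRule, SplitPlan, SketchPlanner) are hypothetical stand-ins, not druid-sql classes, and the clauses are plain strings rather than real relational expressions.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rule identifiers, one per clause that could be pushed down.
enum PushDownRule { WHERE, GROUP_BY, HAVING, ORDER_BY }

// Result of planning: what goes to Druid and what the caller must do itself.
final class SplitPlan {
    final Map<PushDownRule, String> pushedToDruid = new LinkedHashMap<>();
    final Map<PushDownRule, String> leftForCaller = new LinkedHashMap<>();
}

// Hypothetical planner that only applies the rules it was constructed with.
final class SketchPlanner {
    private final EnumSet<PushDownRule> enabled;

    SketchPlanner(EnumSet<PushDownRule> enabled) {
        this.enabled = EnumSet.copyOf(enabled);
    }

    SplitPlan plan(Map<PushDownRule, String> clauses) {
        SplitPlan plan = new SplitPlan();
        for (Map.Entry<PushDownRule, String> e : clauses.entrySet()) {
            // A clause is pushed down only when its rule is enabled.
            if (enabled.contains(e.getKey())) {
                plan.pushedToDruid.put(e.getKey(), e.getValue());
            } else {
                plan.leftForCaller.put(e.getKey(), e.getValue());
            }
        }
        return plan;
    }
}
```

A Hive-style caller would construct the planner with only WHERE and GROUP_BY enabled, so HAVING and ORDER BY clauses end up in leftForCaller and are evaluated on the Hive side.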

About reducing dependencies -- it would be tough, since druid-sql's
planning logic also uses Druid model classes (like ExtractionFn, Query,
etc) as part of its rules, and so it depends on druid-processing pretty
deeply. Hopefully that would be acceptable to current users of
calcite-druid. I think it does have a big advantage: by using Druid's own
model classes, there is no need to implement serde and query validation
twice.

> I think, the hypothetical case you mentioned is also worth considering, to
> ease up the development process, we can consider moving calcite-druid as a
> module in druid, so that we make release of both druid-sql and
> calcite-adapter together.

By this, do you mean you're considering removing calcite-druid altogether?
So, if someone wants to use Calcite with Druid, they should depend on
druid-sql (or druid-calcite or whatever) rather than calcite-druid?

Gian

On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa 
wrote:

> Having a focused effort into a single project would be great and would
> definitely help us in evolving druid sql capabilities faster.
>
> 1) One more thing that we need to consider here is that calcite
> druid-adapter is also used in Apache Hive where we use the druid rules to
> generate an optimized plan and then the druid query is executed from druid
> containers. In druid-sql I believe the query execution logic is tied to the
> fact that execution node is a druid-broker where native queries can be run
> to generate a Sequence of results. We might need some rework there to
> ensure that things work fine with hive too after proposed changes.
>
> 2) druid-sql dependencies can probably be reduced by separating the
> planning and execution logic in druid-sql, the planning logic need not
> depend on lots of druid code and can have light-weight dependencies while
> the execution part and result serde which pulls in lots of druid
> dependencies can reside in separate module and calcite druid-adapter need
> not depend on that module.
>
> I think, the hypothetical case you mentioned is also worth considering, to
> ease up the development process, we can consider moving calcite-druid as a
> module in druid, so that we make release of both druid-sql and
> calcite-adapter together.
>
> On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:
>
> > Hi Calcites,
> >
> > I would like to raise the idea of adding druid-sql (
> >
> > http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
> > )
> > as a dependency in Calcite's Druid adapter. It should reduce the size of
> > calcite-druid substantially, since it would mostly just be calling into
> > druid-sql.
> >
> > This has some advantages for both projects.
> >
> > 1) Support for new Druid features often appears in Druid SQL first. By
> > embedding druid-sql, Calcite gets these new features too, without extra
> > work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an
> > outstanding jira to add support for Druid expressions to Calcite, but
> > druid-sql already supports these. In fact it looks like some of the code in
> > the proposed patch is copied from druid-sql. As another example,
> > https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
> > from "select" to "scan", which had been previously done in Druid SQL in
> > https://github.com/druid-io/druid/pull/4751.
> >
> > 2) Depending on druid-sql means Calcite doesn't need to implement its own
> > Druid query and result serde code. Druid already has it.
> >
> > 3) Focused effort on a single module rather than the split effort that we
> > have today, where some developers are contributing to druid-sql and some
> > are contributing to calcite-druid.
> >
> > 4) More test coverage for both projects, presumably.
> >
> > I think (3) and (4) especially would give us the opportunity to improve
> > both projects much more rapidly.
> >
> > However, there are also some possible disadvantages.
> >
> > 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
> > Druid code. Calcite users may prefer a lighter weight module.
> >
> > 2) druid-sql's APIs are not intended to be stable, and probably never will
> > be. They may break on minor releases. So updating the version of druid-sql
> > in Calcite may involve tweaking how functions are called, etc. I think this
> > effort should be minimal if calcite-druid

Re: Embed druid-sql inside Calcite?

2018-02-07 Thread Nishant Bangarwa

Having a focused effort on a single project would be great and would
definitely help us evolve Druid SQL capabilities faster.

1) One more thing that we need to consider here is that the Calcite
druid-adapter is also used in Apache Hive, where we use the Druid rules to
generate an optimized plan and then the Druid query is executed from druid
containers. In druid-sql, I believe the query execution logic is tied to the
fact that the execution node is a druid-broker, where native queries can be
run to generate a Sequence of results. We might need some rework there to
ensure that things work fine with Hive too after the proposed changes.

2) druid-sql dependencies can probably be reduced by separating the
planning and execution logic in druid-sql. The planning logic need not
depend on lots of Druid code and can have lightweight dependencies, while
the execution part and result serde, which pull in lots of Druid
dependencies, can reside in a separate module that the Calcite
druid-adapter need not depend on.

I think the hypothetical case you mentioned is also worth considering. To
ease the development process, we can consider moving calcite-druid into
Druid as a module, so that we release druid-sql and the Calcite adapter
together.

On Wed, 7 Feb 2018 at 09:02 Gian Merlino  wrote:

> Hi Calcites,
>
> I would like to raise the idea of adding druid-sql (
>
> http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar
> )
> as a dependency in Calcite's Druid adapter. It should reduce the size of
> calcite-druid substantially, since it would mostly just be calling into
> druid-sql.
>
> This has some advantages for both projects.
>
> 1) Support for new Druid features often appears in Druid SQL first. By
> embedding druid-sql, Calcite gets these new features too, without extra
> work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is an
> outstanding jira to add support for Druid expressions to Calcite, but
> druid-sql already supports these. In fact it looks like some of the code in
> the proposed patch is copied from druid-sql. As another example,
> https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
> from "select" to "scan", which had been previously done in Druid SQL in
> https://github.com/druid-io/druid/pull/4751.
>
> 2) Depending on druid-sql means Calcite doesn't need to implement its own
> Druid query and result serde code. Druid already has it.
>
> 3) Focused effort on a single module rather than the split effort that we
> have today, where some developers are contributing to druid-sql and some
> are contributing to calcite-druid.
>
> 4) More test coverage for both projects, presumably.
>
> I think (3) and (4) especially would give us the opportunity to improve
> both projects much more rapidly.
>
> However, there are also some possible disadvantages.
>
> 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
> Druid code. Calcite users may prefer a lighter weight module.
>
> 2) druid-sql's APIs are not intended to be stable, and probably never will
> be. They may break on minor releases. So updating the version of druid-sql
> in Calcite may involve tweaking how functions are called, etc. I think this
> effort should be minimal if calcite-druid is mostly just delegating to
> druid-sql.
>
> 3) druid-sql depends on calcite-core. This should usually be fine, but it
> means that if calcite-core has a breaking change, then calcite-druid cannot
> update its version of druid-sql until druid-sql first updates its version
> of calcite-core.
>
> Despite these potential difficulties, I think the potential benefit means
> this is worth exploring.
>
> Finally: a hypothetical. Why not do the other way around -- have Druid add
> calcite-druid as a dependency? The main reason is that this makes the Druid
> development process awkward when a new Druid SQL feature also requires a
> new native query feature. Today, we develop the native query and SQL sides
> together. If Druid depended on calcite-druid, then we would need to develop
> the native query side first, then release it, then update Calcite's Druid
> adapter, then pull that back into Druid. Generally, just adding an extra
> rule in druid-sql wouldn't be enough, since the sorts of changes we are
> making at this point are typically more extensive than just adjusting
> rules.
>
> Gian
>
