Stamatis Zampetakis created CALCITE-4202:
--------------------------------------------

             Summary: Refine Druid cost-model to capture differences in 
intermediate projections 
                 Key: CALCITE-4202
                 URL: https://issues.apache.org/jira/browse/CALCITE-4202
             Project: Calcite
          Issue Type: Improvement
          Components: druid-adapter
            Reporter: Stamatis Zampetakis


The planner generates equivalent DruidQuery expressions with exactly the same 
cost. Most of the time the expressions differ only in the number of 
intermediate projections.

For example, running the following query

{code:sql}
select distinct "countryName"
from "wiki"
where "page" = 'Jeremy Corbyn'
{code}

via {{DruidAdapterIT#testSelectDistinctWiki}} generates, among others, the 
following alternatives during optimization:

+Choice 1+  
{noformat}
rel#184:DruidQuery.BINDABLE.[](table=[wiki, 
wiki],intervals=[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z],filter==($13,
 'Jeremy Corbyn'),projects=[$5, $13],groups={0},aggs=[])
{noformat}

+Choice 2+
{noformat}
rel#108:DruidQuery.BINDABLE.[](table=[wiki, 
wiki],intervals=[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z],filter==($13,
 'Jeremy Corbyn'),projects=[$5],groups={0},aggs=[])
{noformat}

Using a debugger, we can see that the two plans have exactly the same cost 
even though they are different, which means that whichever one is generated 
first will dominate the other. Clearly, in this case the second choice is the 
better plan, since it projects only $5 and does not carry $13 along.
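
The tie described above can be illustrated with a minimal, self-contained 
sketch. It does not use Calcite's planner classes; {{Plan}} and {{pickBest}} 
are hypothetical names, the costs are made up, and the sketch assumes that a 
candidate replaces the current best only when it is strictly cheaper.

```java
import java.util.Arrays;
import java.util.List;

public class TieBreakSketch {
    /** Hypothetical stand-in for a plan alternative: a label plus an estimated cost. */
    static final class Plan {
        final String label;
        final double cost;

        Plan(String label, double cost) {
            this.label = label;
            this.cost = cost;
        }
    }

    /**
     * Returns the cheapest plan. A candidate replaces the current best only
     * when it is strictly cheaper, so on a tie the earliest plan is kept.
     */
    static Plan pickBest(List<Plan> plansInGenerationOrder) {
        Plan best = null;
        for (Plan candidate : plansInGenerationOrder) {
            if (best == null || candidate.cost < best.cost) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up, equal costs; the labels mirror the two choices above.
        Plan choice1 = new Plan("choice 1", 10.0);
        Plan choice2 = new Plan("choice 2", 10.0);

        // The winner depends only on which plan was generated first.
        System.out.println(pickBest(Arrays.asList(choice1, choice2)).label);
        System.out.println(pickBest(Arrays.asList(choice2, choice1)).label);
    }
}
```

With equal costs, the result changes when the input order changes, which is 
the order dependence reported here.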

Performance-wise the difference may not be that big, but refining the cost is 
beneficial at least for plan stability. Currently the final plan depends on 
the order in which the rules are applied.

The goal of this issue is to refine Druid's cost model so that choice 2 
becomes cheaper than choice 1 outlined above.
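
One possible shape for such a refinement is sketched below, purely as an 
illustration. It is not the adapter's actual cost code: {{refinedCost}}, 
{{PER_PROJECT_PENALTY}}, and the base cost of 10.0 are assumptions. The idea is 
to scale a shared base cost by a small factor that grows with the number of 
projected fields, so that plans identical in everything else stop tying.

```java
public class ProjectionCostSketch {
    /** Hypothetical weight: each projected field adds 1% to the base cost. */
    static final double PER_PROJECT_PENALTY = 0.01;

    /**
     * Scales a base cost by the number of projected fields. Plans that share
     * the same base cost but project fewer fields come out strictly cheaper.
     */
    static double refinedCost(double baseCost, int projectCount) {
        return baseCost * (1.0 + PER_PROJECT_PENALTY * projectCount);
    }

    public static void main(String[] args) {
        double baseCost = 10.0; // made-up value shared by both alternatives

        double choice1 = refinedCost(baseCost, 2); // projects=[$5, $13]
        double choice2 = refinedCost(baseCost, 1); // projects=[$5]

        // Choice 2 is now strictly cheaper, whatever the generation order.
        System.out.println(choice2 < choice1); // prints "true"
    }
}
```

Keeping the penalty small is a deliberate choice in this sketch: it breaks 
ties between otherwise equivalent plans without outweighing larger cost 
differences such as pushed-down filters or aggregations.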



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
