[jira] [Commented] (CALCITE-1933) Cost estimation for expression evaluate

2017-09-07 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157839#comment-16157839
 ] 

Julian Hyde commented on CALCITE-1933:
--

Sorry for the delayed reply; somehow I mislaid this message.

Yes, I think this would be great to add.

Can you explain the purpose of the {{predicate}} argument? I think I understand 
the rest of it.

There is a related issue that I will need to solve that crops up in 
CALCITE-1861 (spatial indexes). I wonder if you have any advice, or whether it 
affects this proposal. Given an expensive filter such as {code}WHERE 
ST_DWithin(x, y, ST_Point(10, 20), 3){code} (let's call it "f1"), we would like 
to add an approximate pre-filter (call it "f2"), converting the condition into 
{code}WHERE (h BETWEEN 10 AND 13 OR h BETWEEN 17 AND 19)
AND ST_DWithin(x, y, ST_Point(10, 20), 3){code}

The pre-filter f2 is cheaper than the regular filter f1 because ({{BETWEEN}} 
uses less CPU than {{ST_DWithin}}, and exploits sorting so will inspect many 
fewer rows) but it is approximate. Also it affects the selectivity of the 
regular filter; or, put another way, the selectivity of (f2 AND f1) is the same 
as the selectivity of f1. To model the cost effectively, we need to represent 
the fact that f1 is being applied to many fewer rows after f2 has been applied. 
Will your proposal handle this?

> Cost estimation for expression evaluate
> ---
>
> Key: CALCITE-1933
> URL: https://issues.apache.org/jira/browse/CALCITE-1933
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Ted Xu
>Assignee: Julian Hyde
>
> Expression evaluate performance is different one to another based on its 
> operator type and underlying execution engine. For example, a regex pattern 
> matching (e.g., col like 'foo%bar%') may requires 100 times more cost 
> compares to a simple comparison (e.g., col > 3). Currently we don't have this 
> kind of differentiation.
> Expression cost will grant following applications:
> 1. Improved predicate push down: predicates can be pushed down to the right 
> position based on row count and its own cost.
> 2. Predicate ordering rule: predicate with different order, e.g.,
> {code:sql}
> col1 like 'foo%bar%' and col2 > 3
> col2 > 3 and col1 like 'foo%bar%'
> {code}
>  
> may have different performance, introducing a predicate ordering rule will 
> help find a better one.
> The cost of each RexCall / RexAggCall is provided by Schema, stored in an 
> external metadata service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CALCITE-1933) Cost estimation for expression evaluate

2017-08-12 Thread Ted Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124507#comment-16124507
 ] 

Ted Xu commented on CALCITE-1933:
-

Thanks [~julianhyde] for the reply.

You're right, expression cost can be provided as metadata. In fact, I was 
proposing to use Schema because we treat expression cost as SqlOperator 
attribute _in our own implementation_. This makes users' hint (settings in 
metadata service or annotations on UDF code) injection more convenient. Here we 
say expression cost we mean _expression complexity_, the underlying concept 
includes:

1. Certain cost of a call to given SqlOperator is a constant
2. The cost is irrelevant to the input data

On second thought, I found above assumptions was too simple. The cost or 
complexity of expression may depend on:

1. Data types (e.g., cost of cast from string to double and integer to double 
is different)
2. Input data value, distribution, ordering, etc. (e.g., like '%foobar%' and 
like '%foo%bar%' is different)
3. Execution engine implementation details, like caching\[1\], code generation 
and vectorized execution.

If we treat expression cost as metadata, the interface could be simplified as 
follows:

{code:java}
interface Handler extends MetadataHandler {
  Double getExpressionComplexity(RelNode r, RelMetadataQuery mq,
  RexNode expression, RexNode predicate);
}
{code}

It looks not necessarily like a 'core' functionality and makes no much 
difference to NonCummulativeCost itself. But still it would help estimating 
filter cost.

I'm not sure if I explained my idea clearly. What do you think? Shall we 
introduce such a metadata?

\[1\] Stonebraker, M., & Hellerstein, J. M. (n.d.). Optimizing Predicate 
Migration : Queries with Expensive Predicates, 267–276.


> Cost estimation for expression evaluate
> ---
>
> Key: CALCITE-1933
> URL: https://issues.apache.org/jira/browse/CALCITE-1933
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Ted Xu
>Assignee: Julian Hyde
>
> Expression evaluate performance is different one to another based on its 
> operator type and underlying execution engine. For example, a regex pattern 
> matching (e.g., col like 'foo%bar%') may requires 100 times more cost 
> compares to a simple comparison (e.g., col > 3). Currently we don't have this 
> kind of differentiation.
> Expression cost will grant following applications:
> 1. Improved predicate push down: predicates can be pushed down to the right 
> position based on row count and its own cost.
> 2. Predicate ordering rule: predicate with different order, e.g.,
> {code:sql}
> col1 like 'foo%bar%' and col2 > 3
> col2 > 3 and col1 like 'foo%bar%'
> {code}
>  
> may have different performance, introducing a predicate ordering rule will 
> help find a better one.
> The cost of each RexCall / RexAggCall is provided by Schema, stored in an 
> external metadata service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CALCITE-1933) Cost estimation for expression evaluate

2017-08-09 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120061#comment-16120061
 ] 

Julian Hyde commented on CALCITE-1933:
--

Why are you providing it via Schema? I think this should be a new kind of 
planner metadata, via RelMetadataQuery. There is already a mechanism to add 
providers. 

> Cost estimation for expression evaluate
> ---
>
> Key: CALCITE-1933
> URL: https://issues.apache.org/jira/browse/CALCITE-1933
> Project: Calcite
>  Issue Type: New Feature
>Reporter: Ted Xu
>Assignee: Julian Hyde
>
> Expression evaluate performance is different one to another based on its 
> operator type and underlying execution engine. For example, a regex pattern 
> matching (e.g., col like 'foo%bar%') may requires 100 times more cost 
> compares to a simple comparison (e.g., col > 3). Currently we don't have this 
> kind of differentiation.
> Expression cost will grant following applications:
> 1. Improved predicate push down: predicates can be pushed down to the right 
> position based on row count and its own cost.
> 2. Predicate ordering rule: predicate with different order, e.g.,
> {code:sql}
> col1 like 'foo%bar%' and col2 > 3
> col2 > 3 and col1 like 'foo%bar%'
> {code}
>  
> may have different performance, introducing a predicate ordering rule will 
> help find a better one.
> The cost of each RexCall / RexAggCall is provided by Schema, stored in an 
> external metadata service.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)