Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-15 Thread Jean-Baptiste Gardette

Thank you Keith for the detail explanation.
I misunderstood the 2 replies were opposite but this is not the case.

Thank you again
Jean-bapstiste
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-15 Thread Keith Medcalf

On Wednesday, 15 January, 2020 02:06, Jean-Baptiste Gardette 
 wrote:

> Just to be sure, is it unsafe to write a non agregate SELECT with GROUP
> BY and HAVING clauses (without sub-SELECT) for the sole prupose 
> explained before (even if the approache is discutable) ?

Presently, yes it is.

>I understand 2 different answers here :

>- "No, this kind of query can't be rewritten by the optimizer for the
>technical reasons (VDBE, index etc)"

This is presently the case.  Current versions of the query planner will not 
optimize these queries but will instead execute them as written.  That is 
statements of the form "select ... from ... where ... group by ... having ... 
order by ... limit ... offset ..." will not convert the "group by" into an 
"order by" or the "having" into "where".  The provers required to allow this 
sort of transformation do not exist at present.

>- "Yes it is unsafe, a future version of SQLite may optimize differently
>this kind of query"

Yes.  A future version of SQLite may indeed process the query by turning the 
"group by" into an "order by" (if there isn't one, or just ignoring it if there 
is an order by, or perhaps merging them) and moving the "having" to a "where" 
condition.  Doing this would require that the optimizer recognize that the 
group by expression can only result in single row groups and that neither the 
select list nor the having expression contain aggregate functions.  There is 
almost nothing to be gained from this optimization, however, so it is highly 
unlikely that such provers would be written in order to implement this 
particular optimization.

Contrast this with recent optimizations that have been added, for example, the 
LEFT JOIN optimization which downgrades an outer join into an inner join if it 
can be proved that the overall result will be the same (the extra candidates 
generated by the outer join will be removed from the result set by a where 
clause, so going to all the bother of adding and processing them in the first 
place serves no purpose), or that leaf tables which are not referenced in the 
select list are removed from the query since generating those results merely 
incurs a cost to no effect.  These optimizations can have a significant impact 
on performance.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume. 



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-15 Thread Jean-Baptiste Gardette
Just to be sure, is it unsafe to write a non agregate SELECT with GROUP 
BY and HAVING clauses (without sub-SELECT)
for the sole prupose explained before (even if the approache is 
discutable) ?


I understand 2 different answers here :

- "No, this kind of query can't be rewritten by the optimizer for the 
technical reasons (VDBE, index etc)"


- "Yes it is unsafe, a future version of SQLite may optimize differently 
this kind of query"


I can't see something related to my problem in the doc 
https://sqlite.org/optoverview.html

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Richard Hipp
On 1/14/20, Keith Medcalf  wrote:
>
> I seem to recall something about "expensive" conditions that will be forced
> to be run on only as few surviving candidate rows as possible, but my
> recollection is vague (they say the memory is the second thing to go --
> strange I can't remember the first).
>
> Anyway, Richard may be able to help here.

Maybe you are thinking of SQLITE_ENABLE_SORTER_REFERENCES.
https://www.sqlite.org/compile.html#enable_sorter_references


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Keith Medcalf

On Tuesday, 14 January, 2020 09:23, Simon Slavin  wrote:

>Would it be possible to phrase your SELECT as a SELECT with a sub-SELECT
>?  Have the sub-SELECT figure out which rows you want in which order,
>then use a SELECT to apply your UDF to them ?  It is guaranteed that the
>sub-SELECT is processed before the SELECT.

you mean if you do something like this:

select *
  from (
 select *
   from ...
  where ...
   )
 where ...

?

If the query flattener is operating the optimizer will push the outer where 
condition into the inner query after which there is no guarantee that the 
conditions in the outer query will be processed *after* the conditions in the 
inner query, especially not if the pushed terms are estimated to reduce the 
number of candidates "quicker" by executing it sooner.

I seem to recall something about "expensive" conditions that will be forced to 
be run on only as few surviving candidate rows as possible, but my recollection 
is vague (they say the memory is the second thing to go -- strange I can't 
remember the first).

Anyway, Richard may be able to help here.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Simon Slavin
On 14 Jan 2020, at 4:14pm, Jean-Baptiste Gardette  wrote:

> The reason i asked this is that i have a query in wich one condition 
> filtering the recordset involves
> an UDF and this UDF needs to be processed after all table filters have been 
> applied

You cannot guarantee this.  And even if you find a solution that works now, the 
optimizer in a future version of SQLite might work differently.

Would it be possible to phrase your SELECT as a SELECT with a sub-SELECT ?  
Have the sub-SELECT figure out which rows you want in which order, then use a 
SELECT to apply your UDF to them ?  It is guaranteed that the sub-SELECT is 
processed before the SELECT.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Jean-Baptiste Gardette

Thank you Dominic and Keith for your replies

The reason i asked this is that i have a query in wich one condition 
filtering the recordset involves
an UDF and this UDF needs to be processed after all table filters have 
been applied


Illustration :

additionnal table :

CREATE TABLE t2 (
a TEXT PRIMARY KEY,
b INTEGER,
c REAL);

UDF :
proc CallUDF {key} {
   if{[dict exists $myDict $key]} {
  dict get $myDict $key  /* returns a value stored in myDict */
   } else {error}
}
db function CallUDF -deterministic CallUDF

If i do :

SELECT *
FROM t1 CROSS JOIN t2
WHERE t1.b > 10
  AND t2.a = t1.a and t2.c < 0
  AND CallUDF(t2.a) <> 0

Even though "CROSS JOIN" garanties that t2 is traversed after t1 has 
been filtered on "t1.b > 10",
it seems there is no garanty when t2 is traversed, that the filter 
CallUDF(t2.a) <> 0 is evaluated *after*

filters "t2.b = t1.b and t2.c < 0".

In this case the sole solution we found is :

SELECT *
FROM t1, t2
WHERE t1.b > 10
  AND t2.a = t1.a and t2.c < 0
GROUP BY t2.a
HAVING CallUDF(t2.a) <> 0

Thank again
Jean-Baptiste
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Keith Medcalf

On Tuesday, 14 January, 2020 06:58, Jean-Baptiste Gardette 
 wrote:

>Consider the following exemple :

>CREATE TABLE t1 (
>a TEXT PRIMARY KEY,
>b INTEGER);

>SELECT *
>FROM t1
>GROUP BY a
>HAVING b > 1;

>Will the GROUP BY clause be supressed and HAVING clause be rewritten in
>WHERE clause by the optimizer ?

No.  The VDBE code will be generated as you have specified.  The "group by a" 
will be used to select the index to be used to access the table data, and at 
the end of the processing of each group of records, the "having b > 1" will be 
applied to the group to determine whether the group is output.

The PRIMARY KEY constraint on a does not make "a" unique since a is not 
constrained not null, so there may be multiple records in the same group.

Even if "a" is constrained not null the query is still processed as a group 
by/having even though each group can only consist of one record even though 
(and only in that case) the query equivalent to "select * from t1 where b > 1 
order by a".

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner GROUP BY and HAVING clauses optimization ?

2020-01-14 Thread Dominique Devienne
On Tue, Jan 14, 2020 at 2:57 PM Jean-Baptiste Gardette
 wrote:
> SELECT * FROM t1 GROUP BY a HAVING b > 1;
>
> Will the GROUP BY clause be supressed and HAVING clause be rewritten in WHERE 
> clause by the optimizer ?

My question would be why you wouldn't write it as a WHERE clause in
the first place :)
Sorry, OT, I know. --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users