Re: [sqlite] Order of columns in group by statement affects query performance

Hick Gunter Fri, 26 Apr 2013 02:12:37 -0700

Do you have any experience with SQLite virtual tables? I guess not.

There are 20 issues here:

1) The abstract problem of choosing an Index for optimizing GROUP BY

2) the SQLite implementation (which I was referring to)

Ad 1)

Any index that covers all the GROUP BY fields is a "good" index because it 
allows aggregates to be computed "on the fly" as opposed to in a temporary 
table.

ORDER BY clause and multiple indices may complicate the matter

Ad 2)

SQLite attempts to handle virtual tables the same as native tables (which is 
one of the main reasons we chose SQLite). The VT interface does not allow 
publication of indexes nor creation of native indexes on virtual tables. The 
aOrderBy table of the interface implies an ordered list of fields, therefore 
SQLite would have to call xBestIndex n! times to discover the least costly 
index to use.

It is not unreasonable to assume that in a well designed SQL Statement the 
GROUP BY clause will be backed up by the necessary index and an identical ORDER 
BY clause (at least unintentionally by the programmer virtue of laziness 
resulting in copy-paste of the field list).

Thus the aOrderBy array being used for ORDER BY and GROUP BY in the VT 
interface.

http://www.sqlite.org/vtab.html

2.3 The xBestIndex Method
SQLite uses the xBestIndex method of a virtual table module to determine the 
best way to access the virtual table. The xBestIndex method has a prototype 
like this:

  int (*xBestIndex)(sqlite3_vtab *pVTab, sqlite3_index_info*);

...

Before calling this method, the SQLite core initializes an instance of the 
sqlite3_index_info structure with information about the query that it is 
currently trying to process. This information derives mainly from the WHERE 
clause and **ORDER BY or GROUP BY** clauses of the query, but also from any ON 
or USING clauses if the query is a join.

-----Ursprüngliche Nachricht-----
Von: James K. Lowden [mailto:jklow...@schemamania.org]
Gesendet: Donnerstag, 25. April 2013 16:34
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Order of columns in group by statement affects query 
performance

On Thu, 25 Apr 2013 10:29:34 +0200
Hick Gunter <h...@scigames.at> wrote:

> AFAIK SQLite treats GROUP BY the same way as ORDER BY (taken from
> hints in the virtual table description).

That might be so, in some limited sense.  It's obviously false in general 
because they mean different things and have different effects.

> If you have an index that covers the GROUP BY clause in field order,
> then aggregate functions need store only the current value; if not,
> then you need an ephemeral table to hold the aggregate values.

Nonsense.  The query parser sees GROUP BY A,B.  The optimizer sees an index 
ordered B,A.  By permuting the order of the columns in the GROUP BY clause, it 
finds a match for the index and uses it.

Yes, the problem is O(n^2), where n is the number of columns in the GROUP BY, 
but n is always small; even 7 columns could be checked in less than 50 
iterations.

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--------------------------------------------------------------------------
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then delete this message from 
your system. Please do not copy it or use it for any purposes, or disclose its 
contents to any person as to do so could be a breach of confidence. Thank you 
for your cooperation.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Order of columns in group by statement affects query performance

Reply via email to