I'm following up on my "Optimizing `SELECT a, max(b) GROUP BY a`" thread from a 
few weeks ago, rephrasing it as a clearer enhancement request.

ACTUAL BEHAVIOR: A query of the form `SELECT a, max(b) GROUP BY a` runs slowly 
(O(n) with the number of table rows), even if there is an index on (a, b DESC). 
The query plan explanation says "SCAN TABLE ... USING INDEX". This is in SQLite 
3.28.

EXPECTED BEHAVIOR: Query runs faster :-) My big-O fu is not strong enough to 
express it that way, but I'd imagine it to be proportional to the number of 
distinct `a` values, not the number of rows in the table.

DIAGNOSIS: According to Keith Medcalf, "it appears that the optimizer will not 
utilize a skip-scan *AND* apply the max optimization concurrently."

According to Keith, a workaround is to rewrite the query as
        select name, 
      (
       select max(timestamp) 
         from table 
        where name=outer.name
      ) 
 from (
       select distinct name
         from table
      );

This is of course a lot more complex. And unfortunately in my case the query 
generator my program uses does not (yet) have the capability to generate nested 
SELECTs, so the optimization is unavailable to me until/unless we implement 
that.

—Jens
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to