It appears that the optimizer will not use a skip-scan *AND* apply the max() optimization within the same query plan.
select name,
       ( select max(timestamp) from table where name = o.name )
  from ( select distinct name from table ) as o;

The above query uses the skip-scan optimization to get the distinct "name" values, and then, for each distinct "name", uses a max()-optimized query to get the max(timestamp) value. (The derived table needs an alias -- "o" above -- so the subquery can correlate against it.) You need to have an index on (name, timestamp). The max() optimization appears to get applied when the index is on timestamp desc, even though in that case it may have to "scan past" NULLs. A minimal self-contained version that can be pasted into the sqlite3 shell is sketched below the quoted message.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

>-----Original Message-----
>From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On
>Behalf Of Jens Alfke
>Sent: Thursday, 24 October, 2019 12:38
>To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org>
>Subject: [sqlite] Optimizing `SELECT a, max(b) GROUP BY a`
>
>I'm looking for an optimal way to query the maximum value of one
>(integer) column as grouped by another (string) column. For example, say
>I've got a table of timestamped data points from many named sources, and
>I want to find each source's latest timestamp. I'm seeing decent
>performance if the database is hot in the filesystem cache, but not when
>it's cold — worst case this query takes about 7sec to run on a 2018
>MacBook Pro with a decent SSD.
>
>My table schema includes something like `… name text, timestamp integer
>…`.
>The table has about 500,000 rows, with 1700 distinct `name` values, and
>occupies about 500MB on disk.
>I'm querying `SELECT name, max(timestamp) FROM table GROUP BY name`.
>I've created an index on `name, timestamp desc`.
>
>EXPLAIN QUERY PLAN says:
> 7|0|0| SCAN TABLE table USING INDEX namesAndTimestamps
>Note the dreaded `SCAN`. :-(
>
>I think I understand what's going on here: even if SQLite can tell it
>only needs to use the first index entry for any given name (because that
>will have the largest timestamp), it still has to scan through the index
>to find where the next name starts. (Or is there some optimization to
>jump rapidly through the B-tree to do this?)
>
>Is there anything I can do to the schema, index or query to make this
>more efficient?
>
>One simplifying assumption: I basically never drop any rows, and if I do
>they're old rows, so max(timestamp) for any author will only increase.
>
>Thanks for any advice…
>
>—Jens
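
For reference, here is a minimal self-contained sketch of the above that can be pasted into the sqlite3 shell. The table name "readings" and index name "readings_name_ts" are placeholders invented for this sketch and are not from the original thread; substitute your own schema.

-- Hypothetical schema standing in for the table described in the quoted
-- message: a text "name" column and an integer "timestamp" column.
create table readings ( name text, timestamp integer );
create index readings_name_ts on readings ( name, timestamp );

-- The straightforward form; per the EXPLAIN QUERY PLAN output in the
-- quoted message, the planner answers this with a scan over the index.
select name, max(timestamp) from readings group by name;

-- The rewritten form: the derived table walks the distinct names using
-- the index, and the correlated subquery should then be answered for
-- each name by the min/max optimization against the same index.
select o.name,
       ( select max(timestamp) from readings r where r.name = o.name )
  from ( select distinct name from readings ) as o;

-- Prefix either statement with "explain query plan" to see which
-- strategy the planner actually chooses for your schema and statistics.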