The www.sqlite.org server logs are stored in an SQLite database (of
course).  We have a script that is run daily on that database with a dozen
or so queries that look something like this:

.print
.print **** Downloads in the past 24 hours ****
.mode column
.width -6 -6 100
.header on
SELECT count(*) AS count, count(distinct ip) AS IPs, url
  FROM log
 WHERE date>datetime('now','-24 hours')
    AND (url GLOB '*/2013/*'
         OR url GLOB '*/snapshot/*'
         OR url GLOB '*/xfer'
         OR url GLOB '*/blob/*')
 GROUP BY url
 ORDER BY count DESC;

Today's database contains 6 days worth of data, is 590MB in size, and is
just short of 2 million records.  There are no indices, so each query is a
full table scan.  Using SQLite 3.8.1, the query above took 1.793 seconds on
a recent test run.  But the code on trunk (and the latest snapshots at
http://www.sqlite.org/download.html) took only 0.686 seconds.  The
difference is that the string constants and the datetime() function call
are factored out of the inner loop in 3.8.2.

To be fair:  Our production script does not contain exactly the SQL shown
above. Rather than using the bare datetime() call, the real script says:
"(SELECT datetime('now','-24 hours'))".  Putting the datetime() call inside
a subquery is a trick that causes the datetime function to only be invoked
once even without the new constant-function factoring optimization of
3.8.2.  And with that trick, the performance difference is not nearly so
dramatic (though 3.8.2 is still faster by about 10%).  Also, the
performance difference here would only apply to a full table scan that
spends a lot of time looping inside of the virtual machine.  There would
not be nearly as dramatic a difference if the database held an index on the
"log.date" column.

Nevertheless, we suspect that queries like the above are common in the
wild, and so we hope that the upcoming 3.8.2 release will make a big
difference for some people.

If you try the 3.8.2 pre-release snapshot in your application and see a
performance improvement, we'd appreciate hearing from you.

For testing and comparison purposes, the optimization that doubles the
performance of the query above can be disabled using the following API:

    sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 8);

where "db" is the database connection create by sqlite3_open(), or if you
are using the command-line shell:

    .testctrl optimizations 8

-- 
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to