Re: [sqlite] Performance regression with multiple lower bound tests
On 16/06/2014, at 11:36 pm, Richard Hippwrote: > On Mon, Jun 16, 2014 at 5:07 AM, David Empson wrote: > >> It appears SQLite 3.8.1 removed an optimisation where earlier versions of >> the query planner were checking for two or more "lower bound" comparisons >> against the key for an index, and combining them so the greater of the two >> values was used as a lower bound. >> >> > There never has been any such optimization in SQLite. If it picked the > better lower bound in 3.8.0, then that was purely by luck. OK thanks, that makes sense. > I suggest you rewrite your query. Instead of > > ... WHERE x BETWEEN ?1 AND ?2 AND x>?3 > > Consider using > > ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3 I assume that was supposed to be WHERE x BETWEEN max(?1,?3) AND ?2. I agree, that seems a reasonable solution. Something like that was on tomorrow's todo list. > Also, when running EXPLAIN, please first give the command-line shell the > ".explain" command in order to set output formatting up to show the program > listing in a more readable format. Noted, thanks for the tip. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance regression with multiple lower bound tests
On 16 Jun 2014, at 1:10pm, Richard Hippwrote: > > That's not quite the same thing. "x BETWEEN ?1 AND ?2" means "x>=?1 AND > x<=?2". > > Now, if x is the INTEGER PRIMARY KEY (and thus is guaranteed to be an > integer) you could say: > >WHERE rowid BETWEEN max(?1,?3+1) AND ?2 I stand corrected. Nice trick. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance regression with multiple lower bound tests
On Mon, Jun 16, 2014 at 7:46 AM, Simon Slavinwrote: > > > On 16 Jun 2014, at 12:36pm, Richard Hipp wrote: > > > > Instead of > > > > ... WHERE x BETWEEN ?1 AND ?2 AND x>?3 > > > > Consider using > > > > ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3 > > I suspect Dr Hipp means > > > ... WHERE x BETWEEN max(?1,?3) AND ?2 > That's not quite the same thing. "x BETWEEN ?1 AND ?2" means "x>=?1 AND x<=?2". Now, if x is the INTEGER PRIMARY KEY (and thus is guaranteed to be an integer) you could say: WHERE rowid BETWEEN max(?1,?3+1) AND ?2 > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance regression with multiple lower bound tests
On 16/06/2014, at 11:36 pm, Richard Hippwrote: > On Mon, Jun 16, 2014 at 5:07 AM, David Empson wrote: > >> It appears SQLite 3.8.1 removed an optimisation where earlier versions of >> the query planner were checking for two or more "lower bound" comparisons >> against the key for an index, and combining them so the greater of the two >> values was used as a lower bound. >> >> > There never has been any such optimization in SQLite. If it picked the > better lower bound in 3.8.0, then that was purely by luck. OK thanks, that makes sense. > I suggest you rewrite your query. Instead of > >... WHERE x BETWEEN ?1 AND ?2 AND x>?3 > > Consider using > >... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3 I assume that was supposed to be WHERE x BETWEEN max(?1,?3) AND ?2. I agree, that seems a reasonable solution. Something like that was on tomorrow's todo list. > Also, when running EXPLAIN, please first give the command-line shell the > ".explain" command in order to set output formatting up to show the program > listing in a more readable format. Noted, thanks for the tip. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance regression with multiple lower bound tests
> On 16 Jun 2014, at 12:36pm, Richard Hippwrote: > > Instead of > > ... WHERE x BETWEEN ?1 AND ?2 AND x>?3 > > Consider using > > ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3 I suspect Dr Hipp means > ... WHERE x BETWEEN max(?1,?3) AND ?2 Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance regression with multiple lower bound tests
On Mon, Jun 16, 2014 at 5:07 AM, David Empsonwrote: > It appears SQLite 3.8.1 removed an optimisation where earlier versions of > the query planner were checking for two or more "lower bound" comparisons > against the key for an index, and combining them so the greater of the two > values was used as a lower bound. > > There never has been any such optimization in SQLite. If it picked the better lower bound in 3.8.0, then that was purely by luck. I suggest you rewrite your query. Instead of ... WHERE x BETWEEN ?1 AND ?2 AND x>?3 Consider using ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3 Also, when running EXPLAIN, please first give the command-line shell the ".explain" command in order to set output formatting up to show the program listing in a more readable format. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Performance regression with multiple lower bound tests
I've recently noticed a major drop in performance in one part of our main application at work, and have managed to track it down to a change in recent versions of SQLite. We are storing a log in a simple SQLite table, and have a viewing screen which allows the user to browse through the log. CREATE TABLE log (rowid INTEGER PRIMARY KEY NOT NULL, type INTEGER, line TEXT); The rowid incorporates the timestamp, allowing us to quickly locate particular times by calculating the rowid from the desired time. I noticed the log view was getting progressively slower the further I got into the table. By the time it got to row 100K+ it was taking over 100 ms to move up or down rows. In older versions the same operation was nearly instant. With some instrumentation I confirmed that SQLite was taking a long time to return the first row in each query, and that this had started some time between SQLite 3.7.17 and SQLite 3.8.4.3. I also tried 3.8.5, which was the same as 3.8.4.3. This is the general form of the query we use (including our own custom functions, which are defined as SQLITE_DETERMINISTIC when using SQLite 3.8.3 or later): SELECT rowid, type, time_from_rowid(rowid) || line FROM log WHERE rowid BETWEEN calc_rowid(?1) and calc_rowid(?2) AND rowid > ?3 ORDER BY rowid LIMIT ?4; The parameters are ?1 = minimum rowid for the entire view, ?2 = maximum rowid for the entire view, ?3 = rowid immediately preceding the first line to display, ?4 = number of rows to fetch at once (typically 100). I confirmed our functions were not part of the problem by substituting externally calculated values, and reduced the SELECT to only return rowid. The performance was unchanged. I examined the statement using EXPLAIN QUERY PLAN and EXPLAIN, checked it with several SQLite versions, and have worked out what happened. For SQLite 3.8.0.2 and earlier, the integer primary key index is used to start at whichever is the greater of calc_rowid(?1) and ?3, then the table is scanned returning rows until the row limit or calc_rowid(?2) is reached. For SQLite 3.8.1 and later, the integer primary key index is used to start at calc_rowid(?1), then the table is scanned ignoring all rows until ?3 is reached (taking a significant amount of time when there are hundreds of thousands of rows), then rows are returned until the row limit or calc_rowid(?2) is reached. It appears SQLite 3.8.1 removed an optimisation where earlier versions of the query planner were checking for two or more "lower bound" comparisons against the key for an index, and combining them so the greater of the two values was used as a lower bound. I was able to repeat this with an even simpler table using the command line tool, and two greater-than comparisons (so it isn't specific to BETWEEN combined with greater-than). I've included the output to EXPLAIN QUERY PLAN and EXPLAIN to assist, and have inserted blank lines as a reading aid. slow.db3 (about 8.5 MB, containing just 1048576 sequential rowids). https://www.dropbox.com/l/4xTnsxfRithPnHOAZQwtSt? % sqlite3 slow.db3 SQLite version 3.8.4.3 2014-04-03 16:53:12 Enter ".help" for usage hints. sqlite> .schema CREATE TABLE log(rowid integer primary key not null); sqlite> explain query plan select rowid from log where rowid > 0 and rowid < 100 order by rowid limit 10; 0|0|0|SEARCH TABLE log USING INTEGER PRIMARY KEY (rowid>? AND rowid explain select rowid from log where rowid > 0 and rowid < 100 order by rowid limit 10; 0|Init|0|14|0||00| 1|Noop|0|0|0||00| 2|Integer|10|1|0||00| 3|OpenRead|0|2|0|0|00| 4|SeekGT|0|12|2||00| 5|Integer|100|3|0||00| 6|Rowid|0|4|0||00| 7|Ge|3|12|4||6b| 8|Copy|4|5|0||00| 9|ResultRow|5|1|0||00| 10|IfZero|1|12|-1||00| 11|Next|0|6|0||00| 12|Close|0|0|0||00| 13|Halt|0|0|0||00| 14|Transaction|0|0|1|0|01| 15|TableLock|0|2|0|log|00| 16|Integer|0|2|0||00| 17|Goto|0|1|0||00| sqlite> explain query plan select rowid from log where rowid > 0 and rowid > 80 and rowid < 100 order by rowid limit 10; 0|0|0|SEARCH TABLE log USING INTEGER PRIMARY KEY (rowid>? AND rowid explain select rowid from log where rowid > 0 and rowid > 80 and rowid < 100 order by rowid limit 10; 0|Init|0|15|0||00| 1|Noop|0|0|0||00| 2|Integer|10|1|0||00| 3|OpenRead|0|2|0|0|00| 4|SeekGT|0|13|2||00| 5|Integer|100|3|0||00| 6|Rowid|0|4|0||00| 7|Ge|3|13|4||6b| 8|Le|6|12|4||6c| 9|Copy|4|7|0||00| 10|ResultRow|7|1|0||00| 11|IfZero|1|13|-1||00| 12|Next|0|6|0||00| 13|Close|0|0|0||00| 14|Halt|0|0|0||00| 15|Transaction|0|0|1|0|01| 16|TableLock|0|2|0|log|00| 17|Integer|0|2|0||00| 18|Integer|80|6|0||00| 19|Goto|0|1|0||00| sqlite> .timer on sqlite> select rowid from log where rowid > 80 and rowid < 100 order by rowid limit 10; 81 82 83 84 85 86 87 88 89 800010 Run Time: real 0.004 user 0.00 sys 0.00 sqlite> select rowid from log where rowid > 0 and rowid > 80 and rowid < 100 order by rowid limit 10; 81 82 83