Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-18 Thread David Empson

On 16/06/2014, at 11:36 pm, Richard Hipp  wrote:

> On Mon, Jun 16, 2014 at 5:07 AM, David Empson  wrote:
> 
>> It appears SQLite 3.8.1 removed an optimisation where earlier versions of
>> the query planner were checking for two or more "lower bound" comparisons
>> against the key for an index, and combining them so the greater of the two
>> values was used as a lower bound.
>> 
>> 
> There never has been any such optimization in SQLite.  If it picked the
> better lower bound in 3.8.0, then that was purely by luck.

OK thanks, that makes sense.

> I suggest you rewrite your query.  Instead of
> 
> ... WHERE x BETWEEN ?1 AND ?2 AND x>?3
> 
> Consider using
> 
> ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3

I assume that was supposed to be WHERE x BETWEEN max(?1,?3) AND ?2.

I agree, that seems a reasonable solution. Something like that was on 
tomorrow's todo list.

> Also, when running EXPLAIN, please first give the command-line shell the
> ".explain" command in order to set output formatting up to show the program
> listing in a more readable format.

Noted, thanks for the tip.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread Simon Slavin

On 16 Jun 2014, at 1:10pm, Richard Hipp  wrote:
> 
> That's not quite the same thing.  "x BETWEEN ?1 AND ?2" means "x>=?1 AND
> x<=?2".
> 
> Now, if x is the INTEGER PRIMARY KEY (and thus is guaranteed to be an
> integer) you could say:
> 
>WHERE rowid BETWEEN max(?1,?3+1) AND ?2

I stand corrected.  Nice trick.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread Richard Hipp
On Mon, Jun 16, 2014 at 7:46 AM, Simon Slavin  wrote:

>
> > On 16 Jun 2014, at 12:36pm, Richard Hipp  wrote:
> >
> > Instead of
> >
> > ... WHERE x BETWEEN ?1 AND ?2 AND x>?3
> >
> > Consider using
> >
> > ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3
>
> I suspect Dr Hipp means
>
> > ... WHERE x BETWEEN max(?1,?3) AND ?2
>

That's not quite the same thing.  "x BETWEEN ?1 AND ?2" means "x>=?1 AND
x<=?2".

Now, if x is the INTEGER PRIMARY KEY (and thus is guaranteed to be an
integer) you could say:

WHERE rowid BETWEEN max(?1,?3+1) AND ?2



>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread David Empson

On 16/06/2014, at 11:36 pm, Richard Hipp  wrote:

> On Mon, Jun 16, 2014 at 5:07 AM, David Empson  wrote:
> 
>> It appears SQLite 3.8.1 removed an optimisation where earlier versions of
>> the query planner were checking for two or more "lower bound" comparisons
>> against the key for an index, and combining them so the greater of the two
>> values was used as a lower bound.
>> 
>> 
> There never has been any such optimization in SQLite.  If it picked the
> better lower bound in 3.8.0, then that was purely by luck.

OK thanks, that makes sense.

> I suggest you rewrite your query.  Instead of
> 
>... WHERE x BETWEEN ?1 AND ?2 AND x>?3
> 
> Consider using
> 
>... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3

I assume that was supposed to be WHERE x BETWEEN max(?1,?3) AND ?2.

I agree, that seems a reasonable solution. Something like that was on 
tomorrow's todo list.

> Also, when running EXPLAIN, please first give the command-line shell the
> ".explain" command in order to set output formatting up to show the program
> listing in a more readable format.

Noted, thanks for the tip.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread Simon Slavin

> On 16 Jun 2014, at 12:36pm, Richard Hipp  wrote:
> 
> Instead of
> 
> ... WHERE x BETWEEN ?1 AND ?2 AND x>?3
> 
> Consider using
> 
> ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3

I suspect Dr Hipp means

> ... WHERE x BETWEEN max(?1,?3) AND ?2

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread Richard Hipp
On Mon, Jun 16, 2014 at 5:07 AM, David Empson  wrote:

> It appears SQLite 3.8.1 removed an optimisation where earlier versions of
> the query planner were checking for two or more "lower bound" comparisons
> against the key for an index, and combining them so the greater of the two
> values was used as a lower bound.
>
>
There never has been any such optimization in SQLite.  If it picked the
better lower bound in 3.8.0, then that was purely by luck.

I suggest you rewrite your query.  Instead of

 ... WHERE x BETWEEN ?1 AND ?2 AND x>?3

Consider using

 ... WHERE x BETWEEN max(?1,?3) AND ?2 AND x>?3

Also, when running EXPLAIN, please first give the command-line shell the
".explain" command in order to set output formatting up to show the program
listing in a more readable format.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance regression with multiple lower bound tests

2014-06-16 Thread David Empson
I've recently noticed a major drop in performance in one part of our main 
application at work, and have managed to track it down to a change in recent 
versions of SQLite.

We are storing a log in a simple SQLite table, and have a viewing screen which 
allows the user to browse through the log.

CREATE TABLE log (rowid INTEGER PRIMARY KEY NOT NULL, type INTEGER, line TEXT);

The rowid incorporates the timestamp, allowing us to quickly locate particular 
times by calculating the rowid from the desired time.

I noticed the log view was getting progressively slower the further I got into 
the table. By the time it got to row 100K+ it was taking over 100 ms to move up 
or down rows. In older versions the same operation was nearly instant. With 
some instrumentation I confirmed that SQLite was taking a long time to return 
the first row in each query, and that this had started some time between SQLite 
3.7.17 and SQLite 3.8.4.3. I also tried 3.8.5, which was the same as 3.8.4.3.

This is the general form of the query we use (including our own custom 
functions, which are defined as SQLITE_DETERMINISTIC when using SQLite 3.8.3 or 
later):

SELECT rowid, type, time_from_rowid(rowid) || line FROM log WHERE rowid BETWEEN 
calc_rowid(?1) and calc_rowid(?2) AND rowid > ?3 ORDER BY rowid LIMIT ?4;

The parameters are ?1 = minimum rowid for the entire view, ?2 = maximum rowid 
for the entire view, ?3 = rowid immediately preceding the first line to 
display, ?4 = number of rows to fetch at once (typically 100). I confirmed our 
functions were not part of the problem by substituting externally calculated 
values, and reduced the SELECT to only return rowid. The performance was 
unchanged.

I examined the statement using EXPLAIN QUERY PLAN and EXPLAIN, checked it with 
several SQLite versions, and have worked out what happened.

For SQLite 3.8.0.2 and earlier, the integer primary key index is used to start 
at whichever is the greater of calc_rowid(?1) and ?3, then the table is scanned 
returning rows until the row limit or calc_rowid(?2) is reached.

For SQLite 3.8.1 and later, the integer primary key index is used to start at 
calc_rowid(?1), then the table is scanned ignoring all rows until ?3 is reached 
(taking a significant amount of time when there are hundreds of thousands of 
rows), then rows are returned until the row limit or calc_rowid(?2) is reached.

It appears SQLite 3.8.1 removed an optimisation where earlier versions of the 
query planner were checking for two or more "lower bound" comparisons against 
the key for an index, and combining them so the greater of the two values was 
used as a lower bound.

I was able to repeat this with an even simpler table using the command line 
tool, and two greater-than comparisons (so it isn't specific to BETWEEN 
combined with greater-than). I've included the output to EXPLAIN QUERY PLAN and 
EXPLAIN to assist, and have inserted blank lines as a reading aid.

slow.db3 (about 8.5 MB, containing just 1048576 sequential rowids).
https://www.dropbox.com/l/4xTnsxfRithPnHOAZQwtSt?

% sqlite3 slow.db3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.

sqlite> .schema
CREATE TABLE log(rowid integer primary key not null);

sqlite> explain query plan select rowid from log where rowid > 0 and rowid < 
100 order by rowid limit 10;
0|0|0|SEARCH TABLE log USING INTEGER PRIMARY KEY (rowid>? AND rowid explain select rowid from log where rowid > 0 and rowid < 100 order 
by rowid limit 10;
0|Init|0|14|0||00|
1|Noop|0|0|0||00|
2|Integer|10|1|0||00|
3|OpenRead|0|2|0|0|00|
4|SeekGT|0|12|2||00|
5|Integer|100|3|0||00|
6|Rowid|0|4|0||00|
7|Ge|3|12|4||6b|
8|Copy|4|5|0||00|
9|ResultRow|5|1|0||00|
10|IfZero|1|12|-1||00|
11|Next|0|6|0||00|
12|Close|0|0|0||00|
13|Halt|0|0|0||00|
14|Transaction|0|0|1|0|01|
15|TableLock|0|2|0|log|00|
16|Integer|0|2|0||00|
17|Goto|0|1|0||00|

sqlite> explain query plan select rowid from log where rowid > 0 and rowid > 
80 and rowid < 100 order by rowid limit 10;
0|0|0|SEARCH TABLE log USING INTEGER PRIMARY KEY (rowid>? AND rowid explain select rowid from log where rowid > 0 and rowid > 80 and 
rowid < 100 order by rowid limit 10;
0|Init|0|15|0||00|
1|Noop|0|0|0||00|
2|Integer|10|1|0||00|
3|OpenRead|0|2|0|0|00|
4|SeekGT|0|13|2||00|
5|Integer|100|3|0||00|
6|Rowid|0|4|0||00|
7|Ge|3|13|4||6b|
8|Le|6|12|4||6c|
9|Copy|4|7|0||00|
10|ResultRow|7|1|0||00|
11|IfZero|1|13|-1||00|
12|Next|0|6|0||00|
13|Close|0|0|0||00|
14|Halt|0|0|0||00|
15|Transaction|0|0|1|0|01|
16|TableLock|0|2|0|log|00|
17|Integer|0|2|0||00|
18|Integer|80|6|0||00|
19|Goto|0|1|0||00|

sqlite> .timer on

sqlite> select rowid from log where rowid > 80 and rowid < 100 order by 
rowid limit 10;
81
82
83
84
85
86
87
88
89
800010
Run Time: real 0.004 user 0.00 sys 0.00

sqlite> select rowid from log where rowid > 0 and rowid > 80 and rowid < 
100 order by rowid limit 10;
81
82
83