Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Richard Hipp
On Fri, Oct 14, 2011 at 12:46 PM, Fabian  wrote:

> 2011/10/14 Simon Slavin 
> >
> > So, OP, try issuing these commands
> >
> > DROP TABLE sqlite_stat1;
> > DROP TABLE sqlite_stat2;
> >
> > then see whether anything improves.
> >
> >
> I already tried that, but it returned:
>
> Error: table sqlite_stat1 may not be dropped
>

If you build from the Fossil trunk, DROP TABLE sqite_statN will work.  Or,
you can do this:

DELETE FROM sqlite_stat1;
DELETE FROM sqlite_stat2;
ANALYZE sqlite_master;

The extra ANALYZE is necessary to cause SQLite to reload the tables.  Or,
instead of doing the extra ANALYZE, you can close and reopen the database
connection.



> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Dan Kennedy

On 10/14/2011 11:23 PM, Simon Slavin wrote:


On 14 Oct 2011, at 5:12pm, Fabian wrote:


Is this the expected output? If so, ANALYZE was to blame. The
query containing 'ORDER BY rowid DESC' is still slower than the one
which doesn't specify any order, but the results are closer to
eachother now.


Iterating backwards is often a bit slower than iterating forwards.
Especially with a database that not very fragmented.


So that should never happen, right ?  ANALYZE is meant to make things
faster, not slower.  So is that an actual fixable bug or is it one of
those extremely unlikely situations that is hard to fix ?


I'd agree it's a subtle bug. Since "SELECT WHERE data=10 ORDER BY rowid"
is in practice the same query as "WHERE data=10", it's difficult to
think of a reason why ANALYZE data should cause SQLite to change the
query plan for one of them and not the other.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Simon Slavin 

>
> So that should never happen, right ?  ANALYZE is meant to make things
> faster, not slower.  So is that an actual fixable bug or is it one of those
> extremely unlikely situations that is hard to fix ?
>
> So, OP, try issuing these commands
>
> DROP TABLE sqlite_stat1;
> DROP TABLE sqlite_stat2;
>
> then see whether anything improves.
>
>
I already tried that, but it returned:

Error: table sqlite_stat1 may not be dropped
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Simon Slavin

On 14 Oct 2011, at 5:12pm, Fabian wrote:

> Is this the expected output? If so, ANALYZE was to blame. The query
> containing 'ORDER BY rowid DESC' is still slower than the one which doesn't
> specify any order, but the results are closer to eachother now.

So that should never happen, right ?  ANALYZE is meant to make things faster, 
not slower.  So is that an actual fixable bug or is it one of those extremely 
unlikely situations that is hard to fix ?

So, OP, try issuing these commands

DROP TABLE sqlite_stat1;
DROP TABLE sqlite_stat2;

then see whether anything improves.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
>
> Do you have an sqlite_stat1 table in the database (created by
> running ANALYZE)? What is the output of the shell command
> How about the contents of the "sqlite_stat1" table? What does
> the shell command ".dump sqlite_stat1" show?
>
>
This is the output with a fresh database, where ANALYZE hasn't been run:

EXPLAIN SELECT * FROM table1 WHERE data=1 LIMIT 250 OFFSET 5;

0|Trace|0|0|0||00|
1|Integer|250|1|0||00|
2|Integer|5|2|0||00|
3|MustBeInt|2|0|0||00|
4|IfPos|2|6|0||00|
5|Integer|0|2|0||00|
6|Add|1|2|3||00|
7|IfPos|1|9|0||00|
8|Integer|-1|3|0||00|
9|Integer|1|4|0||00|
10|Goto|0|39|0||00|
11|OpenRead|0|2|0|13|00|
12|OpenRead|1|31079|0|keyinfo(1,BINARY)|00|
13|SeekGe|1|36|4|1|00|
14|IdxGE|1|36|4|1|01|
15|IdxRowid|1|5|0||00|
16|Seek|0|5|0||00|
17|AddImm|2|-1|0||00|
18|IfNeg|2|20|0||00|
19|Goto|0|35|0||00|
20|IdxRowid|1|6|0||00|
21|Column|1|0|7||00|
22|Column|0|2|8||00|
23|Column|0|3|9||00|
24|Column|0|4|10||00|
25|Column|0|5|11||00|
26|Column|0|6|12||00|
27|Column|0|7|13||00|
28|Column|0|8|14||00|
29|Column|0|9|15||00|
30|Column|0|10|16||00|
31|Column|0|11|17||00|
32|Column|0|12|18||00|
33|ResultRow|6|13|0||00|
34|IfZero|1|36|-1||00|
35|Next|1|14|0||00|
36|Close|0|0|0||00|
37|Close|1|0|0||00|
38|Halt|0|0|0||00|
39|Transaction|0|0|0||00|
40|VerifyCookie|0|10|0||00|
41|TableLock|0|2|0|table1|00|
42|Goto|0|11|0||00|

SELECT * FROM table1 WHERE data=1 ORDER BY rowid DESC LIMIT 250 OFFSET
5;

0|Trace|0|0|0||00|
1|Noop|0|0|0||00|
2|Integer|250|1|0||00|
3|Integer|5|2|0||00|
4|MustBeInt|2|0|0||00|
5|IfPos|2|7|0||00|
6|Integer|0|2|0||00|
7|Add|1|2|3||00|
8|IfPos|1|10|0||00|
9|Integer|-1|3|0||00|
10|Integer|1|4|0||00|
11|Goto|0|40|0||00|
12|OpenRead|0|2|0|13|00|
13|OpenRead|2|31079|0|keyinfo(1,BINARY)|00|
14|SeekLe|2|37|4|1|00|
15|IdxLT|2|37|4|1|00|
16|IdxRowid|2|5|0||00|
17|Seek|0|5|0||00|
18|AddImm|2|-1|0||00|
19|IfNeg|2|21|0||00|
20|Goto|0|36|0||00|
21|IdxRowid|2|6|0||00|
22|Column|2|0|7||00|
23|Column|0|2|8||00|
24|Column|0|3|9||00|
25|Column|0|4|10||00|
26|Column|0|5|11||00|
27|Column|0|6|12||00|
28|Column|0|7|13||00|
29|Column|0|8|14||00|
30|Column|0|9|15||00|
31|Column|0|10|16||00|
32|Column|0|11|17||00|
33|Column|0|12|18||00|
34|ResultRow|6|13|0||00|
35|IfZero|1|37|-1||00|
36|Prev|2|15|0||00|
37|Close|0|0|0||00|
38|Close|2|0|0||00|
39|Halt|0|0|0||00|
40|Transaction|0|0|0||00|
41|VerifyCookie|0|10|0||00|
42|TableLock|0|2|0|table1|00|
43|Goto|0|12|0||00|

Is this the expected output? If so, ANALYZE was to blame. The query
containing 'ORDER BY rowid DESC' is still slower than the one which doesn't
specify any order, but the results are closer to eachother now.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> Your EXPLAIN output shows that it is doing a linear scan of
> table1. Which is different from what I get here with the same
> schema and query. When I run them here, both queries (with and
> without the "ORDER BY rowid") use the same query plan.
>
> Do you have an sqlite_stat1 table in the database (created by
> running ANALYZE)? What is the output of the shell command
> How about the contents of the "sqlite_stat1" table? What does
> the shell command ".dump sqlite_stat1" show?


You are right, I issues an ANALYSE a couple of days ago, I completely forgot
about that. Maybe that's causing a negative impact on the performance.

sqlite> .dump sqlite_stat1
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
ANALYZE sqlite_master;
INSERT INTO "sqlite_stat1" VALUES('table1','ididx','463923 66275');
COMMIT;
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Dan Kennedy


On 10/14/2011 10:13 PM, Fabian wrote:

2011/10/14 Dan Kennedy



Good question. Can you enter the following commands into
the shell tool and post the complete output (no "QUERY PLAN"
this time):

  .version
  .schema
  .explain

  EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
50;

  EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;



sqlite>  .version
SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177
sqlite>  .schema
CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT);
CREATE INDEX ididx ON table1(data);
sqlite>  .explain
sqlite>

EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
50;


Your EXPLAIN output shows that it is doing a linear scan of
table1. Which is different from what I get here with the same
schema and query. When I run them here, both queries (with and
without the "ORDER BY rowid") use the same query plan.

Do you have an sqlite_stat1 table in the database (created by
running ANALYZE)? What is the output of the shell command
How about the contents of the "sqlite_stat1" table? What does
the shell command ".dump sqlite_stat1" show?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> Good question. Can you enter the following commands into
> the shell tool and post the complete output (no "QUERY PLAN"
> this time):
>
>  .version
>  .schema
>  .explain
>
>  EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
> 50;
>
>  EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;
>
>
sqlite> .version
SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177
sqlite> .schema
CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT);
CREATE INDEX ididx ON table1(data);
sqlite> .explain
sqlite>

EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
50;

addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Trace  0 0 000
1 Noop   0 0 000
2 Integer250   1 000
3 Integer5  2 000
4 MustBeInt  2 0 000
5 IfPos  2 7 000
6 Integer0 2 000
7 Add1 2 300
8 IfPos  1 10000
9 Integer-13 000
10Integer1 4 000
11Goto   0 36000
12OpenRead   0 2 0 12 00
13Rewind 0 34000
14Column 0 1 500
15Ne 4 335 collseq(BINARY)  6c
16AddImm 2 -1000
17IfNeg  2 19000
18Goto   0 33000
19Rowid  0 7 000
20Column 0 1 800
21Column 0 2 900
22Column 0 3 10   00
23Column 0 4 11   00
24Column 0 5 12   00
25Column 0 6 13   00
26Column 0 7 14   00
27Column 0 8 15   00
28Column 0 9 16   00
29Column 0 1017   00
30Column 0 1118   00
31ResultRow  7 12000
32IfZero 1 34-1   00
33Next   0 14001
34Close  0 0 000
35Halt   0 0 000
36Transaction0 0 000
37VerifyCookie   0 10000
38TableLock  0 2 0 table1  00
39Goto   0 12000

EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;

addr  opcode p1p2p3p4 p5  comment
  -        -  --  ---
0 Trace  0 0 000
1 Integer250   1 000
2 Integer5  2 000
3 MustBeInt  2 0 000
4 IfPos  2 6 000
5 Integer0 2 000
6 Add1 2 300
7 IfPos  1 9 000
8 Integer-13 000
9 Integer1 4 000
10Goto   0 38000
11OpenRead   0 2 0 12 00
12OpenRead   1 123405  0 keyinfo(1,BINARY)  00
13SeekGe 1 354 1  00
14IdxGE  1 354 1  01
15IdxRowid   1 5 000
16Seek   0 5 000
17AddImm 2 -1000
18IfNeg  2 20000
19Goto   0 34000
20IdxRowid   1 6 000
21Column 1 0 700
22Column 0 2 800
23Column 0 3 900
24Column 0 4 10   00
25Column 0 5 11   00
26Column 0 6 12   00
27Column 0 7 13   00
28Column 0 8 14   00
29Column 

Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Dan Kennedy

On 10/14/2011 09:34 PM, Fabian wrote:

2011/10/14 Dan Kennedy



If SQLite cannot use an index to for an ORDER BY in a SELECT
query and has to do an external sort, the EXPLAIN QUERY PLAN
output will have something like this in it:

  0|0|0|USE TEMP B-TREE FOR ORDER BY



Since my EXPLAIN does not show this, what else could be the cause for the
large decrease in performance when ordering by rowid?


Good question. Can you enter the following commands into
the shell tool and post the complete output (no "QUERY PLAN"
this time):

  .version
  .schema
  .explain

  EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 
OFFSET 50;


  EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Simon Slavin

On 14 Oct 2011, at 2:59pm, Dan Kennedy wrote:

> On 10/14/2011 07:40 PM, Simon Slavin wrote:
>> 
>> I'm sorry, I completely missed the 'data = 10' earlier.  If you have an 
>> index on the data column then that's the index SQLite would used for that 
>> query.  Once it has picked that index it no longer has access to the rowid 
>> index.
>> 
>> If you want an index that would be ideal for
>> 
>> SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50
>> 
>> it would be one index on the two columns together:
>> 
>> CREATE INDEX tdr ON table (data, rowid)
> 
> This is a logical conclusion to draw, but it is not actually
> necessary. All indexes created by SQLite implicitly have the
> rowid as the final column. So all you really need for the
> query above is:
> 
>  CREATE INDEX tdr ON table1(data);

I understand your reasoning, but what the OP reports suggests that SQLite is 
not working this way.

> Note that this:
> 
>  CREATE INDEX tdr ON table1(data, someothercolumn);
> 
> would not work quite as well. Since the index would not help
> with the ORDER BY.

The precise index I recommended was specially picked for the precise SELECT I 
mentioned.  The 'othercolumn' is, in fact, the column that the OP wanted in his 
'ORDER BY'.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> If SQLite cannot use an index to for an ORDER BY in a SELECT
> query and has to do an external sort, the EXPLAIN QUERY PLAN
> output will have something like this in it:
>
>  0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
Since my EXPLAIN does not show this, what else could be the cause for the
large decrease in performance when ordering by rowid?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Dan Kennedy

On 10/14/2011 07:40 PM, Simon Slavin wrote:


On 14 Oct 2011, at 1:36pm, David Bicking wrote:


On 10/14/2011 06:39 AM, Fabian wrote:

When I execute:

SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50

It's very fast, but it's get much slower (10 times) when I add an ORDER BY
clause, like rowid ASC or rowid DESC.

I'm trying to understand why this is. It seems like SQLite is actually
performing an actual sort behind the scenes, while I expected it to just
iterate in reverse order (because rowid is always incremental), which should
give comparable performance as the first query?


What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one index per 
table.


I'm sorry, I completely missed the 'data = 10' earlier.  If you have an index 
on the data column then that's the index SQLite would used for that query.  
Once it has picked that index it no longer has access to the rowid index.

If you want an index that would be ideal for

SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50

it would be one index on the two columns together:

CREATE INDEX tdr ON table (data, rowid)


This is a logical conclusion to draw, but it is not actually
necessary. All indexes created by SQLite implicitly have the
rowid as the final column. So all you really need for the
query above is:

  CREATE INDEX tdr ON table1(data);

Note that this:

  CREATE INDEX tdr ON table1(data, someothercolumn);

would not work quite as well. Since the index would not help
with the ORDER BY.

If SQLite cannot use an index to for an ORDER BY in a SELECT
query and has to do an external sort, the EXPLAIN QUERY PLAN
output will have something like this in it:

  0|0|0|USE TEMP B-TREE FOR ORDER BY

Dan.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Petite Abeille

On Oct 14, 2011, at 3:37 PM, Fabian wrote:

> 2011/10/14 Petite Abeille 
> 
>> 
>> Hurray! Now you must have the finest query ever to grace the intraweb! A
>> true work of beauty :))
>> 
>> 
> Here in Belgium we have a saying: Who doesnt honor the petite, is not worth
> the big ;)

"The details are not the details, the details make the product."
-- Charles Eames

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Petite Abeille 

>
> Hurray! Now you must have the finest query ever to grace the intraweb! A
> true work of beauty :))
>
>
Here in Belgium we have a saying: Who doesnt honor the petite, is not worth
the big ;)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Petite Abeille

On Oct 14, 2011, at 2:49 PM, Fabian wrote:

> That explains everything! 

Hurray! Now you must have the finest query ever to grace the intraweb! A true 
work of beauty :))

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Simon Slavin

On 14 Oct 2011, at 1:49pm, Fabian wrote:

> 2011/10/14 Simon Slavin 
> 
>> If you have an index on the data column then that's the index SQLite would
>> used for that query.  Once it has picked that index it no longer has access
>> to the rowid index.
>> 
>> CREATE INDEX tdr ON table (data, rowid)
> 
> Thanks! That explains everything! Can I remove the index for 'data' if I
> create an index like that, or do I need to keep them both?

You can remove the index on the single column.  SQLite is smart enough to use 
the compound index even if it needs only the columns at the beginning of it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Simon Slavin 

>
> If you have an index on the data column then that's the index SQLite would
> used for that query.  Once it has picked that index it no longer has access
> to the rowid index.
>
> CREATE INDEX tdr ON table (data, rowid)
>
>
Thanks! That explains everything! Can I remove the index for 'data' if I
create an index like that, or do I need to keep them both?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Petite Abeille 

>
> Much? Really? I get the broadly same execution time for either variant:
>
> 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows)
>
> 0|0|0|SCAN TABLE mail_header (~2192503 rows)
>
>
I get

SELECT mail_header.rowid
FROM mail_header
WHERE data = 1
ORDER BY mail_header.rowid DESC  LIMIT 250 OFFSET 142750

0 0 0 SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~46392 rows)

VS

SELECT mail_header.rowid
FROM mail_header
WHERE data = 1
LIMIT 250 OFFSET 142750

0 0 0 SEARCH TABLE mail_header USING INDEX ididx (id=?) (~66275 rows)


So you seem to get the same speed in both queries, because you have no index
on the column in the WHERE clause, making my EXPLAIN differs from yours? But
when the column is indexed, the 'ORDER BY' makes the query much slower.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Simon Slavin

On 14 Oct 2011, at 1:36pm, David Bicking wrote:

> On 10/14/2011 06:39 AM, Fabian wrote:
>> When I execute:
>> 
>> SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50
>> 
>> It's very fast, but it's get much slower (10 times) when I add an ORDER BY
>> clause, like rowid ASC or rowid DESC.
>> 
>> I'm trying to understand why this is. It seems like SQLite is actually
>> performing an actual sort behind the scenes, while I expected it to just
>> iterate in reverse order (because rowid is always incremental), which should
>> give comparable performance as the first query?
> 
> What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one index per 
> table.

I'm sorry, I completely missed the 'data = 10' earlier.  If you have an index 
on the data column then that's the index SQLite would used for that query.  
Once it has picked that index it no longer has access to the rowid index.

If you want an index that would be ideal for

SELECT * FROM table WHERE data = 10 ORDER BY rowid LIMIT 250 OFFSET 50

it would be one index on the two columns together:

CREATE INDEX tdr ON table (data, rowid)

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread David Bicking

On 10/14/2011 06:39 AM, Fabian wrote:



Exactly.



I still don't have optimal performance in the query (although it's much
better now), and it seems to be related to ORDER BY.

When I execute:

SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50

It's very fast, but it's get much slower (10 times) when I add an ORDER BY
clause, like rowid ASC or rowid DESC.

I'm trying to understand why this is. It seems like SQLite is actually
performing an actual sort behind the scenes, while I expected it to just
iterate in reverse order (because rowid is always incremental), which should
give comparable performance as the first query?


What does EXPLAIN QUERY PLAN say? I believe Sqlite can only use one 
index per table. If you have an index on the data field, it is probably 
using that index to quickly get the data=10 condition, in which case it 
can't use the rowid index, and thus has to sort behind the scenes as you 
say.


If it is using an index to find the data, I believe you can do +data = 
10, which will invalidate the index use on data. (Hope I remember that 
right..)


David
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Petite Abeille

On Oct 14, 2011, at 12:39 PM, Fabian wrote:

> I still don't have optimal performance in the query (although it's much
> better now), and it seems to be related to ORDER BY.

Yes, order by has a cost.

> When I execute:
> 
> SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50
> 
> It's very fast, but it's get much slower (10 times) when I add an ORDER BY
> clause, like rowid ASC or rowid DESC.

Much? Really? I get the broadly same execution time for either variant:


explain query plan
selectmail_header.id
from  mail_header

order by  mail_header.id

limit 250
offset5;

0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows)

CPU Time: user 0.006068 sys 0.000665


explain query plan
selectmail_header.id
from  mail_header

limit 250
offset5;

0|0|0|SCAN TABLE mail_header (~2192503 rows)
CPU Time: user 0.005792 sys 0.000655

Note that the query with the order by will use the internal pk index, while the 
one without order will simply scan the table itself.

> I'm trying to understand why this is. It seems like SQLite is actually
> performing an actual sort behind the scenes,

Yes.

> while I expected it to just
> iterate in reverse order (because rowid is always incremental), which should
> give comparable performance as the first query?

No order by = random order.  Try PRAGMA reverse_unordered_selects = boolean;.

For example, while rowid tend to increment monotonically, they can be reused, 
e.g. after a delete.

You might want to read on rowid and autoincrement:

http://www.sqlite.org/autoinc.html



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Simon Slavin

On 14 Oct 2011, at 11:39am, Fabian wrote:

> I still don't have optimal performance in the query (although it's much
> better now), and it seems to be related to ORDER BY.
> 
> When I execute:
> 
> SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50
> 
> It's very fast, but it's get much slower (10 times)

10 times slower mean nothing if the original query is extremely fast.  Do you 
have times in milliseconds or something ?

> when I add an ORDER BY
> clause, like rowid ASC or rowid DESC.

Just for laughs try making your own index on whatever value you think it's 
using for rowid.  In other words, if you have a column in that table declared as

partNumber INTEGER PRIMARY KEY

declare an index on that column, even though theoretically SQLite shouldn't 
need it.

> I'm trying to understand why this is. It seems like SQLite is actually
> performing an actual sort behind the scenes, while I expected it to just
> iterate in reverse order (because rowid is always incremental), which should
> give comparable performance as the first query?

I don't know enough about how SQLite works to guess that, but I do think 
there's something funny about what you report.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
>
>
> Exactly.
>
>
I still don't have optimal performance in the query (although it's much
better now), and it seems to be related to ORDER BY.

When I execute:

SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50

It's very fast, but it's get much slower (10 times) when I add an ORDER BY
clause, like rowid ASC or rowid DESC.

I'm trying to understand why this is. It seems like SQLite is actually
performing an actual sort behind the scenes, while I expected it to just
iterate in reverse order (because rowid is always incremental), which should
give comparable performance as the first query?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Petite Abeille

On Oct 13, 2011, at 10:57 PM, Fabian wrote:

> Thank you very much! This approach solved the problem. However, in my
> situation I need to select a lot more columns than just 'id' from
> 'mail_header',

Feel free to select all the relevant columns from the inner query.

> and when I look at the resulting query it appears it is
> selecting all those columns twice.

Well, if you are explicit, you are specifying them twice, once in the inner 
query, and once at the top level. If that bothers you, you could use '*' in the 
inner part. Even though this is not really advisable. Better to be explicit, at 
the cost of a bit  more verbosity.

> Would it be smart to change the query so that the inner loop only selects
> mail_header.id, and adding a third join that fetches the extra columns from
> 'mail_header'?

No, that would defeat the purpose of the exercise, which is to minimize IO.

> Or would the performance penalty from adding a third join
> out-weight the advantage of selecting less columns in the inner loop?

Exactly.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-13 Thread Fabian
2011/10/12 Petite Abeille 

>
> Now the join is performed only 250 times, adding just a small overhead
> compare the the bare bone query without the join.
>
> The short of it: minimize the amount of work upfront :)
>
>
Thank you very much! This approach solved the problem. However, in my
situation I need to select a lot more columns than just 'id' from
'mail_header', and when I look at the resulting query it appears it is
selecting all those columns twice.

Would it be smart to change the query so that the inner loop only selects
mail_header.id, and adding a third join that fetches the extra columns from
'mail_header'? Or would the performance penalty from adding a third join
out-weight the advantage of selecting less columns in the inner loop?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow JOIN on ROWID

2011-10-12 Thread Petite Abeille

On Oct 12, 2011, at 5:16 PM, Fabian wrote:

> Why is this very fast (20 ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> And this very slow (3500ms):
> 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
> 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)

The issue here is that offset works in term of your entire query. So for each 
row in table1 matching your where close, it's first going to do a join to 
table2, order the entire result set, skip the first half-a-million rows in the 
result set and then return the remaining 250 rows. A rather expensive 
proposition.

Here is an example using two table: mail_header [1] and mail_header_text, a FTS 
table [2]. It's a one-to-one relationship.

(0) Querying the count

selectcount( * )
from  mail_header

where mail_header.header_id = 2

order by  mail_header.id

0|0|0|SCAN TABLE mail_header (~219250 rows)

CPU Time: user 0.690721 sys 0.064676

Ok, 83,391 rows at play.


(1) Querying mail_header, with an offset

explain query plan
selectmail_header.id
from  mail_header

where mail_header.header_id = 2

order by  mail_header.id

limit 250
offset5;

0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)

CPU Time: user 0.390615 sys 0.037031

Ok, we get 250 rows, after sorting 83,391 rows and skipping 50,000 of them.


(2) Same, but with join to mail_header_text 

explain query plan
selectmail_header.id,
  mail_header_text.value
from  mail_header

join  mail_header_text
onmail_header_text.docid = mail_header.id

where mail_header.header_id = 2

order by  mail_header.id

limit 250
offset5;

0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)

CPU Time: user 2.153607 sys 0.265462

Note how it's an order of magnitude slower. This is because all these 
one-to-one joins on these 83K mail_header. They do have a cost.


(3) Same, with a join, but with the offset factored out 

explain query plan
selectmail_header.id,
  mail_header_text.value
from  (
selectmail_header.id
from  mail_header

where mail_header.header_id = 2

order by  mail_header.id

limit 250
offset5
  )
asmail_header

join  mail_header_text
onmail_header_text.docid = mail_header.id;

1|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows)
0|0|0|SCAN SUBQUERY 1 AS mail_header (~250 rows)
0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows)

CPU Time: user 0.402250 sys 0.039327

Now the join is performed only 250 times, adding just a small overhead compare 
the the bare bone query without the join.

The short of it: minimize the amount of work upfront :)

[1] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L271
[2] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L260
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users