Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Nico Williams Wed, 06 Sep 2017 15:12:27 -0700

On Wed, Sep 06, 2017 at 11:54:35PM +0200, R Smith wrote:
> It's still remarkable that in both tests 5 and 6 I've used the very same PK
> setup, yet Test 6 was significantly faster with the added ORDER BY clause.
> In tests 1 through 4 I did not use a PK at all, just plain INT data field,
> but then I did not test the ORDER BY in those tests.


It's an optimizer issue.  It didn't occur to the optimizer that scanning
a covering index was better than scanning the table because the covering
index has (in this case, and always, for rowid tables anyways) strictly
less contents to read and decode.

Scanning the covering index has the happy side-effect (if you wanted if) of
producing ordered results and making an equivalent ORDER BY free.

(Whereas scanning the table will produce results in rowid order, which is
almost certainly not useful unless you explicitly wanted an INTEGER
PRIMARY KEY.)

Note that INTERSECT could have used a hash table, thus producing
unordered results (most likely).  But SQLite3 only knows b-trees.

All of this explains the accidental ordering / non-ordering.  And also
why you shouldn't count on it: it's all implementaton details!

But just because you know to add an ORDER BY doesn't mean you shouldn't
think to make it match some suitable index...  The optimizer is nice,
but you still have to think a little bit like an optimizer yourself :(

> It might turn out to be a wild goose chase, but that will be easily evident
> when testing without the PK and with more realistic real-world data. I'll do
> that this weekend.

You can't have a PK-less table -- SQLite3 always want some PK, even if
it's a hidden rowid column.  WITHOUT ROWID tables make this clearer:

  sqlite> create table t(a text) without rowid;
  Error: PRIMARY KEY missing on table t

Adding an explicit PK implicitly added the covering index that sped up
the JOIN (once you forced the optimizer to use it).  But you should just
always have had an explicit PK and WITHOUT ROWID.

You still found something interesting about using JOINs to filter result
sets (as opposed to adding columns to the the result set).  Something to
keep in mind...  I do a lot of queries where some of the JOINed tables
are used only for filtering.  It's not always possible to convert such
queries to INTERSECT, but it might be possible for SQLite3 to learn how
to perform the equivalent optimization internally, and when to do it.

Nico
-- 
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

Reply via email to