On 2018/01/08 12:39 PM, x wrote:
Thanks Cezary and Scott. I’m now a bit clearer as to what’s happening. I 
imagined the RowID as being a separate index which is the root of my confusion.

To elaborate a little - We often get people here asking "But why it table-scans in stead of using my nice Index?".

This stems from an often-held misconception that Indexes are God-sent magic to improve everything. The fact is that Indexes are costly mechanisms which allows fast lookup which, only AFTER a certain critical size and for specific circumstances, become more efficient than a scan. The Query Planner has to do a lot of work to figure out what those "critical size and specific circumstances" is for any specific query, and it does get real fuzzy.

I think I've heard Richard or Dan explain it as follows (if memory serves, someone please point out if I'm mistaken):

You can think of an SQLite table as essentially a btree covering Index by itself with the Key being the Row_ID (or more recently, the PK for WITHOUT ROWID tables). This is why the rowid (or any column serving as an alias to it, or the PK for WITHOUT ROWID tables) cannot have NULL values, but any other primary key could (in SQLite).

Being an Index by itself means that a Table-Scan is perhaps not as inefficient as one might think and indeed using any other index means a round-trip reading and hitting values in THAT index, then returning and looking up the hit result in the rowid table index, and then reading the pages(s) from it and extracting the data - where during a table scan, all this round tripping is skipped.

So unless any prospective candidate Index for any query offers a truly magnificent cost advantage, a table scan would probably be more efficient, and so be chosen.  This is why running ANALYZE on large tables is needed, because it allows the QP to better deduce whether a prospective Index might in fact offer such a magnificent cost reduction or not. Another way is hinting at the QP (Search "likelihood" in the docs).

This is why a non-rowid-alias Primary Key on a rowid table is also less efficient to scan than the table itself (often very non-intuitive) - or - why a covering index sometimes gets avoided in a JOIN when it seems to contain all needed data to fulfill the join obligation.

Also, often a great index is not used simply because the query planner does not know enough about it and its prospective cost to obtain a good estimate of its utility, and sometimes what feels intuitively to us as a great Index just isn't really. The QP is not infallible, but it is quite smart.


sqlite-users mailing list

Reply via email to