On Thu, Oct 31, 2013 at 2:45 AM, Olaf Schmidt <[email protected]> wrote:
> Am 29.10.2013 13:19, schrieb Dominique Devienne: > >> [...] >> First off, when you use functions like this, you basically rule >> out index use, even if some_column is indexed. That's not good. >> << WHERE col IN list>> OTOH, might use an index. >> > > Might - yes, and *if* an index is used for the In-Checks, then > you're perhaps "wasting it" - or it could be the wrong index > which is chosen by the query-optimizer. > > The better index (in case you use Tmp-Tables) is not the index > on col of the "real table", but the index on the Tmp-Table-Col. > I'm not convinced by this. The "real table" can be quite large, several 100's to 100,000's rows (up to 1+ million rows) and col can be the primary key, or a non-unique "parent" key where many parent keys have about 10 rows each, and a few have in the 1000's, while the in-list could very small (down to just 1 element) or quite large (several thousands). With a function based approach, you are *always* full-scanning the whole "real" table, no matter the cardinality of the InList operand, and even with a very fast InList function, this is not going to beat getting 10 PK rows, or 10 "parent" key rows (e.g. 100 rows to 10,000 rows) via indexes, especially since these are virtual tables with Boost.MultiIndex unique or non-unique indexes (i.e. 5x to 10x faster than SQLite's paged B-tree indexes). It might well beat it if the InList operand cardinality is high, as in your 40K and 60K testing in a 100K rows table, because an InList that's 40% or 60% of the whole table is close enough to a full scan that using a native code set or map test similarly outperforms SQLite's generic paged B-tree indexes like our Boost.MultiIndex-based indexes. Of course that's speculation on my part, versus your timed experimentation, so could well be that I'm wrong. And I'll need to look into this eventually. Plus I haven't looked at the stat tables the new query optimizer is increasingly using to find the best plan, to put information in there for the cardinality of our vtables and our "selection" tmp-tables, so SQLite has enough info to do its planning. Heck when I'm mixing vtable index costs and real (tmp) table index costs, I have no clue the costs am I returning are compatible. That's an area that's not well covered by the doc IMHO, which I haven't explored enough. So as of now it's possible SQLite would never select a plan that privileges a PK or non-unique index access on the "real" table. In any case, thank you for your persistence and challenging my assumptions. Your experiments are very interesting, and I'll try to report back in this thread any of my own findings in light of the information we've provided. Thanks a bunch Olaf. Cheers, --DD _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

