Thanks to all who have replied, very informative! :) This is just a database for own personal use so it's not a big deal in any way, mainly trying to get a better understanding of how Sqlite works here.
I'll note that the sql queries are not static inside my application but they are generated dynamically from command line arguments. Basically any column can be added to the SELECT, WHERE and ORDER BY clauses at runtime, and some columns will be generated from sub-queries (via JOIN:s) as shown in first post. As the correlated sub-queries will be executed twice if used in the WHERE clause it seems that using JOIN:s is preferable for my use cases. Some further experimentation shows that using LEFT JOIN for the subqueries instead of INNER JOIN will always make the query planner do the "right" thing, i.e. use automatic indexes instead of table scans. Regardless of ANALYZE information being present or not. So that is maybe a better work-around than removing the ANALYZE tables. LEFT JOIN and INNER JOIN will always return the same results in this case as all books will (or should) have dates, genres and authors, and if they do not then I probably want LEFT JOIN semantics anyway to better notice it. I currently use INNER JOIN to give the query planner more freedom in selecting query plans. Still would be interesting to know why Sqlite went for plain table scans in the initial case. Seems that using automatic indexes will always be faster (N*logN vs N*N), so why not always use them when possible? Acccording to the documentation Sqlite assumes N is a million without analyze information, and in that case it opted to generate automatic indexes. In my case with ANALYZE information present N will be around 3000, and then it opted for table scans. The final query took over 24 minutes with all 3 sub-query columns present when run to completion though, so obviously the wrong choice since the loops ended up being nested three or more times. (I understand that the query planner must take many different scenarios into account and generate good plans for all of them, so this is most likely an unfortunate edge case.) Also noticed another case where Sqlite uses nested table scanning for JOIN:s and this time it was not instead of automatic indexes, apparently it opted for nested scans to avoid using a temp b-tree in the ORDER BY. (This is part of co-routine for a window function using the AuthorID for partitions.) Slow nested table scan (execution time measured in seconds): | | |--SCAN TABLE Authors | | |--SCAN TABLE DatesRead | | |--SEARCH TABLE AuthorBooks USING PRIMARY KEY (BookID=? AND AuthorID=?) | | |--SEARCH SUBQUERY 1 ... | | `--SEARCH SUBQUERY 2 ... vs temp b-tree (execution time measured in milliseconds): | | |--SCAN TABLE AuthorBooks | | |--SEARCH TABLE Authors USING INTEGER PRIMARY KEY (rowid=?) | | |--SEARCH TABLE DatesRead USING PRIMARY KEY (BookID=?) | | |--SEARCH SUBQUERY 1 ... | | |--SEARCH SUBQUERY 2 ... | | `--USE TEMP B-TREE FOR ORDER BY This is part of a larger query but I have not included all tables used in it so just show parts that differ here, everything else in the two queries is identical. Dropping the ANALYZE information makes the query planner select the faster alternative here as well. I can provide more information about the query in case anyone is interested, just included these parts now to illustrate the "problematic" nested scans. (Not really that problematic, this just came up in a test that iterated over all supported columns for all main queries, in actual use of the application I would hardly run it, but still an interesting case I think.) -- Sent from: http://sqlite.1065341.n5.nabble.com/ _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users