Your made up plans are intriguing. The plan you show for the latter query omit to join a and b. Are you just making things up?
sqlite> select a.rowid from a, b where a.ref=7 and a.rowid in (select rowid from b); QUERY PLAN |--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows) |--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR `--SCAN TABLE b (~1048576 rows) However, assuming that you typo'd the query (please learn about this new-fangled thing called cut-n-paste to avoid that error in the future) you get this if you specify "FROM a" rather than "FROM a, b". sqlite> select rowid from a where ref=7 and rowid in (select rowid from b); QUERY PLAN |--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows) `--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR In any case, upon closer examination the VDBE code for the two queries "solves" the query quite differently: select a.rowid from a, b where a.rowid == b.rowid and a.ref == ? places the index constrained by ref = ? in the outer loop, and then if a.rowid exists in table b, returns a result row. This means that the number of times the outer loop is executed is dependent on the constraint ref == ? on the index. Since there can be no possible statistic for the a.rowid == b.rowid condition, table a (or rather the index) will and must always be in the outer loop, even if the constraint ref == ? selects all rows in table a and there is only one row in table b ... select rowid from a where ref == ? and rowid in (select rowid from b) however scans table b in the outer loop and then does an index lookup on (ref, rowid) into the index and returns a result row whenever it is found. This means that the number of times the outer loop is executed is dependent on the number of rows in table b (only). Since the QP can determine the number of rows in a where ref == 7 and the number of rows in b, the plan will probably be optimized by having statistics available. Do you have statistics available? Or do you just by happenstance have less rows in b than in a constrained by ref == ?. Do you have statistics? Have you run ANALYZE? --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Poor Yorick >Sent: Wednesday, 17 April, 2019 08:48 >To: SQLite mailing list >Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect > >On Wed, Apr 17, 2019 at 01:24:11PM +0000, David Raymond wrote: >> Would you post what those explain query plans results are? All the >other replies not withstanding I'm still curious as to why #2 would >be faster (assuming "rowid" is indeed the actual rowid anyway) >> >> Also, is that a typo in #2, if you're not using b, why would you >include it in the from clause? Wouldn't that introduce a whole bunch >of duplicates? As in a copy of a.rowid for every single record in b? >(Maybe my brain just hasn't finished warming up this morning) >> >> #1 >> select a.rowid >> from a join b on a.rowid = b.rowid >> where a.ref = $x >> >> #2 >> select a.rowid >> from a,b >> where a.ref = $x and a.rowid in (select rowid from b) >> >> > >3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)} >8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)} > >2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND >rowid=?)} >7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR} > >-- >Poor Yorick >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users