When I read this, it seemed like it made sense. The thing is, it does not match up with reality.
First, the analysis of what happens when I pipe the results to 'sort' misses the fact that the sort process executes within the 31 minutes of that version. It would not make a dent in the time of the slow version. But the big thing is that I took a look at EXPLAIN QUERY PLAN using this script: #!/usr/bin/env python3 """Output positions that are reachable but unsolved at census 18 See page 76 of Qubic log Last Modified: Thu Feb 2 07:46:03 PST 2017 """ import sqlite3 # https://docs.python.org/3.5/library/sqlite3.html with sqlite3.connect("917.db") as conn: print("BEOFRE ANALYZE") for row in conn.execute(""" EXPLAIN SELECT DISTINCT ppos FROM move JOIN pos ON mto = pnum WHERE pcensus = 18 and pmin < pmax """): print(row) print() print() conn.execute("ANALYZE") print("AFTER ANALYZE") for row in conn.execute(""" EXPLAIN SELECT DISTINCT ppos FROM move JOIN pos ON mto = pnum WHERE pcensus = 18 and pmin < pmax """): print(row) and after waiting most of the day for the analyze to finish, I got two identical query plans, neither of which I could decipher: BEFORE ANALYZE (0, 'Init', 0, 25, 0, '', '00', None) (1, 'Null', 1, 7, 0, '', '08', None) (2, 'OpenRead', 1, 4, 0, '7', '00', None) (3, 'OpenRead', 3, 6, 0, 'k(1,)', '00', None) (4, 'OpenRead', 4, 11, 0, 'k(3,,,)', '02', None) (5, 'Rewind', 3, 21, 1, '0', '00', None) (6, 'Seek', 3, 0, 1, '', '00', None) (7, 'Column', 1, 2, 1, '', '00', None) (8, 'Ne', 2, 20, 1, '(BINARY)', '54', None) (9, 'Column', 1, 5, 3, '-99', '00', None) (10, 'Column', 1, 6, 4, '99', '00', None) (11, 'Ge', 4, 20, 3, '(BINARY)', '53', None) (12, 'IdxRowid', 3, 5, 0, '', '00', None) (13, 'SeekGE', 4, 20, 5, '1', '00', None) (14, 'IdxGT', 4, 20, 5, '1', '00', None) (15, 'Column', 3, 0, 6, '', '00', None) (16, 'Eq', 6, 19, 7, '(BINARY)', '80', None) (17, 'Copy', 6, 7, 0, '', '00', None) (18, 'ResultRow', 6, 1, 0, '', '00', None) (19, 'Next', 4, 14, 0, '', '00', None) (20, 'Next', 3, 6, 0, '', '01', None) (21, 'Close', 1, 0, 0, '', '00', None) (22, 'Close', 3, 0, 0, '', '00', None) (23, 'Close', 4, 0, 0, '', '00', None) (24, 'Halt', 0, 0, 0, '', '00', None) (25, 'Transaction', 0, 0, 155, '0', '01', None) (26, 'TableLock', 0, 4, 0, 'pos', '00', None) (27, 'TableLock', 0, 7, 0, 'move', '00', None) (28, 'Integer', 18, 2, 0, '', '00', None) (29, 'Goto', 0, 1, 0, '', '00', None) AFTER ANALYZE (0, 'Init', 0, 25, 0, '', '00', None) (1, 'Null', 1, 7, 0, '', '08', None) (2, 'OpenRead', 1, 4, 0, '7', '00', None) (3, 'OpenRead', 3, 6, 0, 'k(1,)', '00', None) (4, 'OpenRead', 4, 11, 0, 'k(3,,,)', '02', None) (5, 'Rewind', 3, 21, 1, '0', '00', None) (6, 'Seek', 3, 0, 1, '', '00', None) (7, 'Column', 1, 2, 1, '', '00', None) (8, 'Ne', 2, 20, 1, '(BINARY)', '54', None) (9, 'Column', 1, 5, 3, '-99', '00', None) (10, 'Column', 1, 6, 4, '99', '00', None) (11, 'Ge', 4, 20, 3, '(BINARY)', '53', None) (12, 'IdxRowid', 3, 5, 0, '', '00', None) (13, 'SeekGE', 4, 20, 5, '1', '00', None) (14, 'IdxGT', 4, 20, 5, '1', '00', None) (15, 'Column', 3, 0, 6, '', '00', None) (16, 'Eq', 6, 19, 7, '(BINARY)', '80', None) (17, 'Copy', 6, 7, 0, '', '00', None) (18, 'ResultRow', 6, 1, 0, '', '00', None) (19, 'Next', 4, 14, 0, '', '00', None) (20, 'Next', 3, 6, 0, '', '01', None) (21, 'Close', 1, 0, 0, '', '00', None) (22, 'Close', 3, 0, 0, '', '00', None) (23, 'Close', 4, 0, 0, '', '00', None) (24, 'Halt', 0, 0, 0, '', '00', None) (25, 'Transaction', 0, 0, 155, '0', '01', None) (26, 'TableLock', 0, 4, 0, 'pos', '00', None) (27, 'TableLock', 0, 7, 0, 'move', '00', None) (28, 'Integer', 18, 2, 0, '', '00', None) (29, 'Goto', 0, 1, 0, '', '00', None) Maybe somebody can explain them to me, but it doesn't really matter whether I ever understand them. Perhaps Mr. Hipp can make use of them. Absent some flaw in the above script, I think I'm done with this. I have a solution that works for me, and I'd just as soon get back to my real task. I just wanted to give feedback in case it would be useful. That's how i say thanks for a really useful product. Thanks. Mr. Hipp, and anyone else that has contributed to this product. ++ kevin On Thu, Feb 2, 2017 at 12:27 AM, Hick Gunter <h...@scigames.at> wrote: > DISTINCT forces the query optimizer to create an intermediate table to > hold the results and compare each row of the non-distinct result set with > an automatically created index. It may also affect the query plan in a way > that chooses inefficient indices, which is more likely if you have not run > ANALYZE on the fully loaded database. > > Using a 3 stage pipe instead you additionally have more CPUs (1 running > the query, 1 or more sorting the results) working in paralell. > > Try EXPLAIN QUERY PLAN to see what the query planner is doing. > > -----Ursprüngliche Nachricht----- > Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] > Im Auftrag von Kevin O'Gorman > Gesendet: Donnerstag, 02. Februar 2017 03:28 > An: sqlite-users <sqlite-users@mailinglists.sqlite.org> > Betreff: [sqlite] "DISTINCT" makes a query take 37 times as long > > I have a database of positions and moves in a strategic game, and I'm > searching for unsolved positions that have been connected to an immediate > ancestor. I'm using Python 3.5.2, and the code looks like > > #!/usr/bin/env python3 > """Output positions that are reachable but unsolved at census 18 or > greater See page 76 of Qubic log > > Last Modified: Tue Jan 31 12:13:07 PST 2017 """ > > import sqlite3 # https://docs.python.org/3.5/library/sqlite3.html > > with sqlite3.connect("917.db") as conn: > for row in conn.execute(""" > SELECT DISTINCT ppos > FROM move JOIN pos ON mto = pnum > WHERE pcensus = 18 and pmin < pmax > """): > print(row[0]) > > As written here, this query runs for 1193 minutes (just short of 20 > hours). If I remove the "DISTINCT" and instead pipe the result into the > sort program that comes with Linux "sort --unique" the query and sort takes > only 31 minutes. The results are the same, and consist of 4.2 million rows. > > This seems extreme. > > -- > word of the year: *kakistocracy* > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___________________________________________ > Gunter Hick > Software Engineer > Scientific Games International GmbH > FN 157284 a, HG Wien > Klitschgasse 2-4, A-1130 Vienna, Austria > Tel: +43 1 80100 0 > E-Mail: h...@scigames.at > > This communication (including any attachments) is intended for the use of > the intended recipient(s) only and may contain information that is > confidential, privileged or legally protected. Any unauthorized use or > dissemination of this communication is strictly prohibited. If you have > received this communication in error, please immediately notify the sender > by return e-mail message and delete all copies of the original > communication. Thank you for your cooperation. > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- word of the year: *kakistocracy* _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users