I checked with a fake dataset: the GROUP BY solution is quicker with no index on F2 (the query planner can understand that query well enough to create an automatic covering index). However, with an index on F2, the rowid solution is considerably faster for a dataset with 2600 random distinct F2 and one million total rows in T.
The test script is at the end of the email. On my computer, with the rowid correlation, the query returned in 14 seconds*. With the GROUP BY solution, the query took 50 seconds. It was interesting to see that both queries ran quickly spitting out values until SQLite had returned all possible values, then stalled for a long time as the DB engine processed all rows that could never return anything. Cheers, - Barry [*I suspected that most of the time with the rowid query was taken in writing to the console, so I SELECT'd COUNT(*) instead of * and it returned in 6 seconds, and I suspect still did most of the work (the query plan was the same). I tried the same thing on the GROUP BY query, but the GROUP BY and COUNT don't play so well together and it still spat out a bunch of numbers to the console.] CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT NOT NULL); CREATE TABLE Names(id INTEGER PRIMARY KEY, Name TEXT NOT NULL); INSERT INTO Names(Name) VALUES ('Alex'),('Brett'),('Cindy'),('Dennis'),('Echo'),('Frank'),('Garry'),('Herbert'),('Indigo'),('Jack'),('Karl'),('Lima'),('Melon'),('Nunes'),('Oprah'),('Peter'),('Quincy'),('Robert'),('Sarah'),('Tangerine'),('Unicorn'),('Violet'),('Wilfred'),('Violet'),('Wesley'),('Xavier'),('Yeltzin'),('Zbrudov'); WITH num(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM num WHERE i < 100) INSERT INTO Names(Name) SELECT Name || i FROM Names, Num; WITH num(i, rnd) AS (SELECT 1, RANDOM()%1300 + 1300 UNION ALL SELECT i + 1, RANDOM()%1300 + 1300 FROM num WHERE i < 1000000) INSERT INTO T(F2) SELECT (SELECT Name FROM Names WHERE id = rnd) FROM num; CREATE INDEX idx_F2 ON T(F2); On Sun, 24 Jun 2018 at 07:00, Gert Van Assche <ger...@gmail.com> wrote: > Ryan, my dataset isn't that big. 11K records. > Your solution is noticeable faster 996 ms vs 13126 ms. > Interesting! > > gert > > Op za 23 jun. 2018 om 18:09 schreef R Smith <ryansmit...@gmail.com>: > > > > > On 2018/06/23 2:47 PM, Gert Van Assche wrote: > > > Barry, that's even easier indeed. And it works perfect! > > > Thanks for sharing this. > > > > I very much like the method which Barry suggested for the simplicity, > > but have avoided it since I thought it would be significantly slower on > > a large data set, however, SQLite is known for some pretty neat > > optimizations (especially where the row_id is concerned) and I'm now > > interested to know if indeed such optimizations feature in this case. I > > have a few such implemented queries that might benefit from changing > over. > > > > May I ask, if you do have a rather large dataset, and perhaps don't mind > > the time, would you care to compare the two methods and let us know if > > there is any discernible difference in speed? Also one small important > > item - how many F1 items are there typically per unique F2 item in > > total? Does it grow over time? > > > > Also, you do not need hashes, for either of the two queries - the > > queries should work regardless of integer or Unicode text used, perhaps > > the comparison might need a specific collation? Surrounding both > > references with a TRIM() function might also help. Essentially, if F1 of > > record 3 in Table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B > > WHERE A.F1 = B.F1" must resolve correctly. If it doesn't, it's a bug. > > The <, <=, >, >= might all produce some Unicode weirdness upon > > inadequate collations. > > > > > > If time doesn't allow, then don't spend effort on this, it's simply a > > curiosity. :) > > > > Thanks! > > Ryan > > > > > > _______________________________________________ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users