I checked with a fake dataset: the GROUP BY solution is quicker with no
index on F2 (the query planner can understand that query well enough to
create an automatic covering index). However, with an index on F2, the
rowid solution is considerably faster for a dataset with 2600 random
distinct F2 and one million total rows in T.

The test script is at the end of the email. On my computer, with the rowid
correlation, the query returned in 14 seconds*. With the GROUP BY solution,
the query took 50 seconds.

It was interesting to see that both queries ran quickly spitting out values
until SQLite had returned all possible values, then stalled for a long time
as the DB engine processed all rows that could never return anything.

Cheers,

 - Barry

[*I suspected that most of the time with the rowid query was taken in
writing to the console, so I SELECT'd COUNT(*) instead of * and it returned
in 6 seconds, and I suspect still did most of the work (the query plan was
the same). I tried the same thing on the GROUP BY query, but the GROUP BY
and COUNT don't play so well together and it still spat out a bunch of
numbers to the console.]

CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT NOT NULL);

CREATE TABLE Names(id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

INSERT INTO Names(Name) VALUES
('Alex'),('Brett'),('Cindy'),('Dennis'),('Echo'),('Frank'),('Garry'),('Herbert'),('Indigo'),('Jack'),('Karl'),('Lima'),('Melon'),('Nunes'),('Oprah'),('Peter'),('Quincy'),('Robert'),('Sarah'),('Tangerine'),('Unicorn'),('Violet'),('Wilfred'),('Violet'),('Wesley'),('Xavier'),('Yeltzin'),('Zbrudov');

WITH num(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM num WHERE i < 100)
INSERT INTO Names(Name)
SELECT Name || i FROM Names, Num;

WITH num(i, rnd) AS (SELECT 1, RANDOM()%1300 + 1300 UNION ALL SELECT i + 1,
RANDOM()%1300 + 1300 FROM num WHERE i < 1000000)
INSERT INTO T(F2)
SELECT (SELECT Name FROM Names WHERE id = rnd) FROM num;

CREATE INDEX idx_F2 ON T(F2);

On Sun, 24 Jun 2018 at 07:00, Gert Van Assche <ger...@gmail.com> wrote:

> Ryan, my dataset isn't that big. 11K records.
> Your solution is noticeable faster 996 ms vs 13126 ms.
> Interesting!
>
> gert
>
> Op za 23 jun. 2018 om 18:09 schreef R Smith <ryansmit...@gmail.com>:
>
> >
> > On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> > > Barry, that's even easier indeed. And it works perfect!
> > > Thanks for sharing this.
> >
> > I very much like the method which Barry suggested for the simplicity,
> > but have avoided it since I thought it would be significantly slower on
> > a large data set, however, SQLite is known for some pretty neat
> > optimizations (especially where the row_id is concerned) and I'm now
> > interested to know if indeed such optimizations feature in this case. I
> > have a few such implemented queries that might benefit from changing
> over.
> >
> > May I ask, if you do have a rather large dataset, and perhaps don't mind
> > the time, would you care to compare the two methods and let us know if
> > there is any discernible difference in speed? Also one small important
> > item - how many F1 items are there typically per unique F2 item in
> > total? Does it grow over time?
> >
> > Also, you do not need hashes, for either of the two queries - the
> > queries should work regardless of integer or Unicode text used, perhaps
> > the comparison might need a specific collation? Surrounding both
> > references with a TRIM() function might also help. Essentially, if F1 of
> > record 3 in Table T is '*&^%@#*&^@#' then   "... FROM T AS A, T AS B
> > WHERE A.F1 = B.F1" must resolve correctly. If it doesn't, it's a bug.
> > The <, <=, >, >= might all produce some Unicode weirdness upon
> > inadequate collations.
> >
> >
> > If time doesn't allow, then don't spend effort on this, it's simply a
> > curiosity. :)
> >
> > Thanks!
> > Ryan
> >
> >
> > _______________________________________________
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to