Re: [sqlite] SELECT becomes very slow when converted to UPDATE
Do I correctly understand that the intention of the UPDATE is, for each my_id in meta_table, to store the count of all the hashes that are associated only with that my_id and no other ids? In that case, have you tried:

UPDATE meta_table
SET distinct_hashes = (
    SELECT COUNT(hash) AS num
    FROM temp.t_distinct_hashes_by_id d
    WHERE
        -- No need for any NOT NULL check; a NULL will fail the equality comparison.
        1 = (SELECT COUNT(my_id) FROM temp.t_distinct_hashes_by_id sub
             WHERE d.hash = sub.hash)
        AND meta_table.my_id = d.my_id
);

Or you could try (although I doubt it'd be any faster):

distinct_hashes = (
    SELECT COUNT(*) AS num
    FROM (
        -- Technically not valid SQL, but SQLite allows you to select
        -- non-aggregate columns not included in the GROUP BY, and we're
        -- safe here because of HAVING COUNT(*) = 1.
        SELECT my_id
        FROM temp.t_distinct_hashes_by_id
        GROUP BY hash
        HAVING COUNT(*) = 1
    ) d
    WHERE d.my_id = meta_table.my_id
);

If my logic is correct, both of these should work, and both avoid a SCAN TABLE in your innermost loop.

Also, is it possible my_id and hash could be guaranteed NOT NULL in your temp table? I'm not sure whether that would speed up the query (although it couldn't hurt), but it would certainly make the problem easier to think about.

Can you share your overall goal? It looks like you're after something like

SELECT my_id, COUNT(*)
FROM (SELECT my_id FROM ... GROUP BY hash HAVING COUNT(DISTINCT (hash, my_id)) = 1)
GROUP BY my_id;

which doesn't work, because SQLite doesn't like COUNT(DISTINCT ...) over more than one column.

On Sat, 23 Jun 2018 at 23:44, Jonathan Moules wrote:

> Hi List,
> I'm trying to find all hashes that are unique to a specific id (my_id),
> and then use an UPDATE-join to update another table with that number.
>
> After much tweaking, I've simplified the table down to a basic temp
> table (actually created using a CREATE AS SELECT ... GROUP BY my_id, hash):
>
> CREATE TEMP TABLE t_distinct_hashes_by_id (
>     my_id INTEGER,
>     hash  TEXT
> );
>
> And indexes in both directions because I'm still trying to optimise (and
> see what SQLite wants):
>
> CREATE UNIQUE INDEX temp.idx__1 ON t_distinct_hashes_by_id (hash, my_id);
> CREATE UNIQUE INDEX temp.idx__2 ON t_distinct_hashes_by_id (my_id, hash);
>
> There are only 20 values for my_id, but several hundred thousand hashes.
>
> I can do a SELECT query which gets what I want and runs in about 0.5 seconds:
>
> SELECT *
> FROM temp.t_distinct_hashes_by_id d
> WHERE
>     hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
>                  WHERE 1 != sub.my_id AND hash NOT NULL)
>     AND 1 = d.my_id
>
> The EXPLAIN:
>
> 0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
> 0 0 0 EXECUTE LIST SUBQUERY 1
> 1 0 0 SCAN TABLE t_distinct_hashes_by_id AS sub
>
> So in theory, I'd hope that an UPDATE version using this SELECT would take
> around 20 * 0.5 sec = 10 seconds. But it's actually taking... well, I don't
> know how long; at least 10 minutes before I gave up waiting. This is the
> UPDATE:
>
> UPDATE meta_table
> SET distinct_hashes = (
>     SELECT COUNT(hash) AS num
>     FROM temp.t_distinct_hashes_by_id d
>     WHERE
>         hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
>                      WHERE meta_table.my_id != sub.my_id AND hash NOT NULL)
>         -- This one has to be at the bottom for some reason.
>         AND meta_table.my_id = d.my_id
> );
>
> The query plan for this UPDATE includes two CORRELATED subqueries, which
> the docs say are reassessed on every run - that seems like the problem.
> I get that it'd need to do that 20 times (once per my_id), but the
> slowdown seems considerably longer than that warrants.
>
> 0 0 0 SCAN TABLE meta_table
> 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 0
> 0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
> 0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
> 1 0 0 SCAN TABLE t_distinct_hashes_by_id AS sub
>
> Does anyone have any thoughts on how I can speed this up? (SQLite 3.15.0,
> and confirmed in 3.24.0, which uses about 3 times the disk I/O per second
> for the same query/data.)
> Thanks,
> Jonathan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
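[Editorial sketch] Barry's second suggestion can be sanity-checked on a toy dataset. Only the table and column names below come from the thread; the rows are invented, and Python's built-in sqlite3 module stands in for the shell. It relies on the fact that the temp table holds distinct (my_id, hash) pairs, so a hash with COUNT(*) = 1 belongs to exactly one id.

```python
import sqlite3

# Toy check of the GROUP BY ... HAVING COUNT(*) = 1 rewrite: for each
# my_id in meta_table, store the number of hashes that belong to that
# id and to no other id. The data is made up for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t_distinct_hashes_by_id (my_id INTEGER, hash TEXT);
INSERT INTO t_distinct_hashes_by_id VALUES
  (1, 'aaa'), (1, 'bbb'),          -- 'bbb' is shared with id 2
  (2, 'bbb'), (2, 'ccc'),
  (3, 'ddd'), (3, 'eee');
CREATE TABLE meta_table (my_id INTEGER PRIMARY KEY, distinct_hashes INTEGER);
INSERT INTO meta_table (my_id) VALUES (1), (2), (3);
""")

con.execute("""
UPDATE meta_table
SET distinct_hashes = (
    SELECT COUNT(*)
    FROM (
        -- A hash occurring exactly once occurs for exactly one my_id.
        SELECT my_id FROM t_distinct_hashes_by_id
        GROUP BY hash HAVING COUNT(*) = 1
    ) d
    WHERE d.my_id = meta_table.my_id
)""")

print(con.execute(
    "SELECT my_id, distinct_hashes FROM meta_table ORDER BY my_id").fetchall())
# → [(1, 1), (2, 1), (3, 2)]  ('bbb' is shared, so ids 1 and 2 own one
#   unique hash each; id 3 owns two)
```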
Re: [sqlite] LIMIT
I checked with a fake dataset: the GROUP BY solution is quicker with no index on F2 (the query planner understands that query well enough to create an automatic covering index). However, with an index on F2, the rowid solution is considerably faster for a dataset with 2600 random distinct F2 values and one million total rows in T. The test script is at the end of this email.

On my computer, the rowid-correlation query returned in 14 seconds*. The GROUP BY query took 50 seconds. It was interesting to see that both queries ran quickly, spitting out values until SQLite had returned all possible values, then stalled for a long time as the DB engine processed all the rows that could never return anything.

Cheers,
- Barry

[* I suspected that most of the time in the rowid query was spent writing to the console, so I SELECT'd COUNT(*) instead of * and it returned in 6 seconds, while I suspect it still did most of the work (the query plan was the same). I tried the same thing on the GROUP BY query, but GROUP BY and COUNT don't play so well together and it still spat out a bunch of numbers to the console.]
CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT NOT NULL);
CREATE TABLE Names(id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

INSERT INTO Names(Name) VALUES
('Alex'),('Brett'),('Cindy'),('Dennis'),('Echo'),('Frank'),('Garry'),('Herbert'),
('Indigo'),('Jack'),('Karl'),('Lima'),('Melon'),('Nunes'),('Oprah'),('Peter'),
('Quincy'),('Robert'),('Sarah'),('Tangerine'),('Unicorn'),('Violet'),('Wilfred'),
('Violet'),('Wesley'),('Xavier'),('Yeltzin'),('Zbrudov');

WITH num(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM num WHERE i < 100)
INSERT INTO Names(Name) SELECT Name || i FROM Names, num;

-- One million rows; F2 is drawn from roughly 2600 distinct names.
WITH num(i, rnd) AS (
    SELECT 1, RANDOM() % 1300 + 1300
    UNION ALL
    SELECT i + 1, RANDOM() % 1300 + 1300 FROM num WHERE i < 1000000
)
INSERT INTO T(F2) SELECT (SELECT Name FROM Names WHERE id = rnd) FROM num;

CREATE INDEX idx_F2 ON T(F2);

On Sun, 24 Jun 2018 at 07:00, Gert Van Assche wrote:

> Ryan, my dataset isn't that big: 11K records.
> Your solution is noticeably faster: 996 ms vs 13,126 ms.
> Interesting!
>
> gert
>
> On Sat 23 Jun 2018 at 18:09, R Smith wrote:
> >
> > On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> > > Barry, that's even easier indeed. And it works perfectly!
> > > Thanks for sharing this.
> >
> > I very much like the method Barry suggested for its simplicity, but I have
> > avoided it since I thought it would be significantly slower on a large
> > data set. However, SQLite is known for some pretty neat optimizations
> > (especially where the rowid is concerned), and I'm now interested to know
> > whether such optimizations feature in this case. I have a few queries
> > implemented this way that might benefit from changing over.
> >
> > May I ask: if you do have a rather large dataset, and perhaps don't mind
> > the time, would you care to compare the two methods and let us know if
> > there is any discernible difference in speed? Also one small but important
> > item - how many F1 items are there typically per unique F2 item in total?
> > Does it grow over time?
> >
> > Also, you do not need hashes for either of the two queries - they should
> > work regardless of whether integers or Unicode text are used, though the
> > comparison might need a specific collation. Surrounding both references
> > with a TRIM() function might also help. Essentially, if F1 of record 3 in
> > table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B WHERE A.F1 = B.F1"
> > must resolve correctly. If it doesn't, it's a bug. The <, <=, >, >=
> > operators might all produce some Unicode weirdness under inadequate
> > collations.
> >
> > If time doesn't allow, then don't spend effort on this; it's simply a
> > curiosity. :)
> >
> > Thanks!
> > Ryan
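[Editorial sketch] Before benchmarking, it is worth confirming that the two methods being compared actually return the same rows. The sketch below does that with Ryan's example data from earlier in the thread (limit 3 per group, as in his example), using Python's built-in sqlite3 module; the timing numbers above are of course machine-dependent and not reproduced here.

```python
import sqlite3

# Check that Barry's rowid/LIMIT method and Ryan's self-join GROUP BY
# method select the same "first 3 per group" rows on the thread's toy data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT NOT NULL);
INSERT INTO T(F1, F2) VALUES
 (1,'John'),(2,'John'),(3,'Jason'),(4,'John'),(5,'Jason'),(6,'John'),
 (7,'John'),(8,'Jason'),(9,'Jason'),(10,'Joan'),(11,'Joan'),(12,'Joan'),
 (13,'Jimmy');
""")

# Barry's method: keep rows whose rowid is among the 3 smallest for its F2.
rowid_rows = con.execute("""
    SELECT F1 FROM T t1
    WHERE rowid IN (SELECT rowid FROM T t2
                    WHERE t1.F2 = t2.F2 ORDER BY rowid LIMIT 3)
    ORDER BY F1""").fetchall()

# Ryan's method: COUNT(*) per (A.F2, B.F1) group is B.F1's rank within
# its F2 group, so HAVING COUNT(*) < 4 keeps the 3 smallest F1 per F2.
groupby_rows = con.execute("""
    SELECT B.F1 FROM T AS A
    JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
    GROUP BY A.F2, B.F1 HAVING COUNT(*) < 4
    ORDER BY B.F1""").fetchall()

assert rowid_rows == groupby_rows
print(rowid_rows)
# → [(1,), (2,), (3,), (4,), (5,), (8,), (10,), (11,), (12,), (13,)]
```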
Re: [sqlite] LIMIT
Ryan, my dataset isn't that big: 11K records.
Your solution is noticeably faster: 996 ms vs 13,126 ms.
Interesting!

gert

On Sat 23 Jun 2018 at 18:09, R Smith wrote:
>
> On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> > Barry, that's even easier indeed. And it works perfectly!
> > Thanks for sharing this.
>
> I very much like the method Barry suggested for its simplicity, but I have
> avoided it since I thought it would be significantly slower on a large
> data set. However, SQLite is known for some pretty neat optimizations
> (especially where the rowid is concerned), and I'm now interested to know
> whether such optimizations feature in this case. I have a few queries
> implemented this way that might benefit from changing over.
>
> May I ask: if you do have a rather large dataset, and perhaps don't mind
> the time, would you care to compare the two methods and let us know if
> there is any discernible difference in speed? Also one small but important
> item - how many F1 items are there typically per unique F2 item in total?
> Does it grow over time?
>
> Also, you do not need hashes for either of the two queries - they should
> work regardless of whether integers or Unicode text are used, though the
> comparison might need a specific collation. Surrounding both references
> with a TRIM() function might also help. Essentially, if F1 of record 3 in
> table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B WHERE A.F1 = B.F1"
> must resolve correctly. If it doesn't, it's a bug. The <, <=, >, >=
> operators might all produce some Unicode weirdness under inadequate
> collations.
>
> If time doesn't allow, then don't spend effort on this; it's simply a
> curiosity. :)
>
> Thanks!
> Ryan
Re: [sqlite] LIMIT
On 23 Jun 2018, at 5:08pm, R Smith wrote:

> May I ask, if you do have a rather large dataset, and perhaps don't mind the
> time, would you care to compare the two methods and let us know if there is
> any discernible difference in speed?

Answers will differ depending on the type of main storage, the size of cache available, and how many of the computer's processors are devoted to SQLite. I used to have an overnight run (not using SQLite) on an early dual-processor machine which took three times longer if a tiny clock was showing on the display; the clock hogged one of the processors.

It would be an interesting comparison, but it's valid only for the setup it's run on.

Simon.
Re: [sqlite] LIMIT
On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> Barry, that's even easier indeed. And it works perfectly!
> Thanks for sharing this.

I very much like the method Barry suggested for its simplicity, but I have avoided it since I thought it would be significantly slower on a large data set. However, SQLite is known for some pretty neat optimizations (especially where the rowid is concerned), and I'm now interested to know whether such optimizations feature in this case. I have a few queries implemented this way that might benefit from changing over.

May I ask: if you do have a rather large dataset, and perhaps don't mind the time, would you care to compare the two methods and let us know if there is any discernible difference in speed? Also one small but important item - how many F1 items are there typically per unique F2 item in total? Does it grow over time?

Also, you do not need hashes for either of the two queries - they should work regardless of whether integers or Unicode text are used, though the comparison might need a specific collation. Surrounding both references with a TRIM() function might also help. Essentially, if F1 of record 3 in table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B WHERE A.F1 = B.F1" must resolve correctly. If it doesn't, it's a bug. The <, <=, >, >= operators might all produce some Unicode weirdness under inadequate collations.

If time doesn't allow, then don't spend effort on this; it's simply a curiosity. :)

Thanks!
Ryan
[sqlite] SELECT becomes very slow when converted to UPDATE
Hi List,

I'm trying to find all hashes that are unique to a specific id (my_id), and then use an UPDATE-join to update another table with that number.

After much tweaking, I've simplified the table down to a basic temp table (actually created using a CREATE AS SELECT ... GROUP BY my_id, hash):

CREATE TEMP TABLE t_distinct_hashes_by_id (
    my_id INTEGER,
    hash  TEXT
);

And indexes in both directions because I'm still trying to optimise (and see what SQLite wants):

CREATE UNIQUE INDEX temp.idx__1 ON t_distinct_hashes_by_id (hash, my_id);
CREATE UNIQUE INDEX temp.idx__2 ON t_distinct_hashes_by_id (my_id, hash);

There are only 20 values for my_id, but several hundred thousand hashes.

I can do a SELECT query which gets what I want and runs in about 0.5 seconds:

SELECT *
FROM temp.t_distinct_hashes_by_id d
WHERE
    hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
                 WHERE 1 != sub.my_id AND hash NOT NULL)
    AND 1 = d.my_id

The EXPLAIN:

0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE t_distinct_hashes_by_id AS sub

So in theory, I'd hope that an UPDATE version using this SELECT would take around 20 * 0.5 sec = 10 seconds. But it's actually taking... well, I don't know how long; at least 10 minutes before I gave up waiting. This is the UPDATE:

UPDATE meta_table
SET distinct_hashes = (
    SELECT COUNT(hash) AS num
    FROM temp.t_distinct_hashes_by_id d
    WHERE
        hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
                     WHERE meta_table.my_id != sub.my_id AND hash NOT NULL)
        -- This one has to be at the bottom for some reason.
        AND meta_table.my_id = d.my_id
);

The query plan for this UPDATE includes two CORRELATED subqueries, which the docs say are reassessed on every run - that seems like the problem. I get that it'd need to do that 20 times (once per my_id), but the slowdown seems considerably longer than that warrants.

0 0 0 SCAN TABLE meta_table
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 0
0 0 0 SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
1 0 0 SCAN TABLE t_distinct_hashes_by_id AS sub

Does anyone have any thoughts on how I can speed this up? (SQLite 3.15.0, and confirmed in 3.24.0, which uses about 3 times the disk I/O per second for the same query/data.)

Thanks,
Jonathan
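[Editorial sketch] To pin down what the UPDATE computes (its logic, not its speed), here is a minimal runnable version on invented data; only the table and column names follow the post, and Python's built-in sqlite3 module stands in for the shell.

```python
import sqlite3

# Run the post's NOT IN formulation on a tiny dataset: for each my_id,
# distinct_hashes becomes the count of hashes seen for that id and for
# no other id. The rows below are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TEMP TABLE t_distinct_hashes_by_id (my_id INTEGER, hash TEXT);
CREATE UNIQUE INDEX idx__2 ON t_distinct_hashes_by_id (my_id, hash);
INSERT INTO t_distinct_hashes_by_id VALUES
  (1, 'aaa'), (1, 'bbb'),          -- 'bbb' is shared with id 2
  (2, 'bbb'), (2, 'ccc'),
  (3, 'ddd'), (3, 'eee');
CREATE TABLE meta_table (my_id INTEGER PRIMARY KEY, distinct_hashes INTEGER);
INSERT INTO meta_table (my_id) VALUES (1), (2), (3);
""")

con.execute("""
UPDATE meta_table
SET distinct_hashes = (
    SELECT COUNT(hash) FROM temp.t_distinct_hashes_by_id d
    WHERE hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
                       WHERE meta_table.my_id != sub.my_id AND hash NOT NULL)
      AND meta_table.my_id = d.my_id
)""")

print(con.execute(
    "SELECT my_id, distinct_hashes FROM meta_table ORDER BY my_id").fetchall())
# → [(1, 1), (2, 1), (3, 2)]  (the shared 'bbb' is excluded for ids 1 and 2)
```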
Re: [sqlite] LIMIT
Barry, that's even easier indeed. And it works perfectly! Thanks for sharing this.

gert

On Sat 23 Jun 2018 at 14:32, Barry Smith wrote:

> Ryan's way works well. Here is a second method which expresses it in a
> different way:
>
> SELECT * FROM T t1 WHERE rowid IN (SELECT rowid FROM T t2
>     WHERE t1.F2 = t2.F2 ORDER BY rowid LIMIT 10)
>
> If you have WITHOUT ROWID tables you'd have to replace rowid with your
> primary key.
>
> (The query may still work without the ORDER BY - I didn't test it - but
> even if it does, a future query optimiser might break that, because without
> the ORDER BY the results of the inner select are free to include a
> different 10 rowids for every value in the outer query.)
>
> > On 23 Jun 2018, at 9:50 pm, Gert Van Assche wrote:
> >
> > Hi Ryan, thanks for this. This works if the F1 field is a numeric value.
> > With text (especially Asian and Arabic characters) it does not seem to
> > work. So I created an MD5 hash from the text fields and it works great!
> > Thank you so much.
> >
> > gert
> >
> > On Fri 22 Jun 2018 at 22:52, R Smith wrote:
> > >
> > > On 2018/06/22 10:04 PM, Gert Van Assche wrote:
> > > > All,
> > > >
> > > > I'm sure it must be possible, I just don't find how.
> > > > I have a table T with 2 fields (F1 and F2). The F1 are unique and
> > > > the F2 are not unique.
> > > > I would like to get only 10 F1 fields for each unique F2.
> > >
> > > This is not normally done, and windowing functions in other RDBMSes
> > > make for an easier way, but it can still be done in SQLite with some
> > > creative grouping of a self-joined query.
> > > In this example, I limited it to 3 F1 items per unique F2 for brevity,
> > > but you can easily change the "< 4" to "< 11" or "<= 10" according to
> > > preference.
> > >
> > > -- SQLite version 3.20.1 [ Release: 2017-08-24 ] on SQLitespeed
> > > -- version 2.0.2.4.
> > >
> > > CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
> > >
> > > INSERT INTO T(F1,F2) VALUES
> > >  (1, 'John')
> > > ,(2, 'John')
> > > ,(3, 'Jason')
> > > ,(4, 'John')
> > > ,(5, 'Jason')
> > > ,(6, 'John')
> > > ,(7, 'John')
> > > ,(8, 'Jason')
> > > ,(9, 'Jason')
> > > ,(10, 'Joan')
> > > ,(11, 'Joan')
> > > ,(12, 'Joan')
> > > ,(13, 'Jimmy')
> > > ;
> > >
> > > SELECT A.F2, B.F1
> > >   FROM T AS A
> > >   JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
> > >  GROUP BY A.F2, B.F1
> > > HAVING COUNT(*) < 4
> > > ;
> > >
> > > -- F2    | F1
> > > -- ----- | ---
> > > -- Jason |  3
> > > -- Jason |  5
> > > -- Jason |  8
> > > -- Jimmy | 13
> > > -- Joan  | 10
> > > -- Joan  | 11
> > > -- Joan  | 12
> > > -- John  |  1
> > > -- John  |  2
> > > -- John  |  4
> > >
> > > -- Another option to note, in case the 10 limit is not important and
> > > -- is simply aimed at saving space, is to use group concatenation:
> > >
> > > SELECT F2, group_concat(F1) AS F1
> > >   FROM T
> > >  GROUP BY F2
> > > ;
> > >
> > > -- F2    | F1
> > > -- ----- | ---------
> > > -- Jason | 3,5,8,9
> > > -- Jimmy | 13
> > > -- Joan  | 10,11,12
> > > -- John  | 1,2,4,6,7
> > >
> > > Cheers!
> > > Ryan
Re: [sqlite] LIMIT
Ryan's way works well. Here is a second method which expresses it in a different way:

SELECT * FROM T t1 WHERE rowid IN (SELECT rowid FROM T t2
    WHERE t1.F2 = t2.F2 ORDER BY rowid LIMIT 10)

If you have WITHOUT ROWID tables you'd have to replace rowid with your primary key.

(The query may still work without the ORDER BY - I didn't test it - but even if it does, a future query optimiser might break that, because without the ORDER BY the results of the inner select are free to include a different 10 rowids for every value in the outer query.)

> On 23 Jun 2018, at 9:50 pm, Gert Van Assche wrote:
>
> Hi Ryan, thanks for this. This works if the F1 field is a numeric value.
> With text (especially Asian and Arabic characters) it does not seem to
> work. So I created an MD5 hash from the text fields and it works great!
> Thank you so much.
>
> gert
>
> On Fri 22 Jun 2018 at 22:52, R Smith wrote:
> >
> > On 2018/06/22 10:04 PM, Gert Van Assche wrote:
> > > All,
> > >
> > > I'm sure it must be possible, I just don't find how.
> > > I have a table T with 2 fields (F1 and F2). The F1 are unique and the
> > > F2 are not unique.
> > > I would like to get only 10 F1 fields for each unique F2.
> >
> > This is not normally done, and windowing functions in other RDBMSes
> > make for an easier way, but it can still be done in SQLite with some
> > creative grouping of a self-joined query.
> > In this example, I limited it to 3 F1 items per unique F2 for brevity,
> > but you can easily change the "< 4" to "< 11" or "<= 10" according to
> > preference.
> >
> > -- SQLite version 3.20.1 [ Release: 2017-08-24 ] on SQLitespeed
> > -- version 2.0.2.4.
> >
> > CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
> >
> > INSERT INTO T(F1,F2) VALUES
> >  (1, 'John')
> > ,(2, 'John')
> > ,(3, 'Jason')
> > ,(4, 'John')
> > ,(5, 'Jason')
> > ,(6, 'John')
> > ,(7, 'John')
> > ,(8, 'Jason')
> > ,(9, 'Jason')
> > ,(10, 'Joan')
> > ,(11, 'Joan')
> > ,(12, 'Joan')
> > ,(13, 'Jimmy')
> > ;
> >
> > SELECT A.F2, B.F1
> >   FROM T AS A
> >   JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
> >  GROUP BY A.F2, B.F1
> > HAVING COUNT(*) < 4
> > ;
> >
> > -- F2    | F1
> > -- ----- | ---
> > -- Jason |  3
> > -- Jason |  5
> > -- Jason |  8
> > -- Jimmy | 13
> > -- Joan  | 10
> > -- Joan  | 11
> > -- Joan  | 12
> > -- John  |  1
> > -- John  |  2
> > -- John  |  4
> >
> > -- Another option to note, in case the 10 limit is not important and
> > -- is simply aimed at saving space, is to use group concatenation:
> >
> > SELECT F2, group_concat(F1) AS F1
> >   FROM T
> >  GROUP BY F2
> > ;
> >
> > -- F2    | F1
> > -- ----- | ---------
> > -- Jason | 3,5,8,9
> > -- Jimmy | 13
> > -- Joan  | 10,11,12
> > -- John  | 1,2,4,6,7
> >
> > Cheers!
> > Ryan
Re: [sqlite] LIMIT
Hi Ryan, thanks for this. This works if the F1 field is a numeric value. With text (especially Asian and Arabic characters) it does not seem to work. So I created an MD5 hash from the text fields and it works great! Thank you so much.

gert

On Fri 22 Jun 2018 at 22:52, R Smith wrote:
>
> On 2018/06/22 10:04 PM, Gert Van Assche wrote:
> > All,
> >
> > I'm sure it must be possible, I just don't find how.
> > I have a table T with 2 fields (F1 and F2). The F1 are unique and the F2
> > are not unique.
> > I would like to get only 10 F1 fields for each unique F2.
>
> This is not normally done, and windowing functions in other RDBMSes
> make for an easier way, but it can still be done in SQLite with some
> creative grouping of a self-joined query.
> In this example, I limited it to 3 F1 items per unique F2 for brevity,
> but you can easily change the "< 4" to "< 11" or "<= 10" according to
> preference.
>
> -- SQLite version 3.20.1 [ Release: 2017-08-24 ] on SQLitespeed
> -- version 2.0.2.4.
>
> CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
>
> INSERT INTO T(F1,F2) VALUES
>  (1, 'John')
> ,(2, 'John')
> ,(3, 'Jason')
> ,(4, 'John')
> ,(5, 'Jason')
> ,(6, 'John')
> ,(7, 'John')
> ,(8, 'Jason')
> ,(9, 'Jason')
> ,(10, 'Joan')
> ,(11, 'Joan')
> ,(12, 'Joan')
> ,(13, 'Jimmy')
> ;
>
> SELECT A.F2, B.F1
>   FROM T AS A
>   JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
>  GROUP BY A.F2, B.F1
> HAVING COUNT(*) < 4
> ;
>
> -- F2    | F1
> -- ----- | ---
> -- Jason |  3
> -- Jason |  5
> -- Jason |  8
> -- Jimmy | 13
> -- Joan  | 10
> -- Joan  | 11
> -- Joan  | 12
> -- John  |  1
> -- John  |  2
> -- John  |  4
>
> -- Another option to note, in case the 10 limit is not important and
> -- is simply aimed at saving space, is to use group concatenation:
>
> SELECT F2, group_concat(F1) AS F1
>   FROM T
>  GROUP BY F2
> ;
>
> -- F2    | F1
> -- ----- | ---------
> -- Jason | 3,5,8,9
> -- Jimmy | 13
> -- Joan  | 10,11,12
> -- John  | 1,2,4,6,7
>
> Cheers!
> Ryan
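[Editorial sketch] Ryan's self-join can be run as-is; a quick sketch wrapping his example in Python's built-in sqlite3 module reproduces the table he shows (an ORDER BY is added only to make the row order deterministic, since GROUP BY output order is not guaranteed):

```python
import sqlite3

# Ryan's "top 3 F1 per unique F2" self-join, using his example data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
INSERT INTO T(F1,F2) VALUES
 (1,'John'),(2,'John'),(3,'Jason'),(4,'John'),(5,'Jason'),(6,'John'),
 (7,'John'),(8,'Jason'),(9,'Jason'),(10,'Joan'),(11,'Joan'),(12,'Joan'),
 (13,'Jimmy');
""")

# COUNT(*) for each (A.F2, B.F1) group is the rank of B.F1 within its F2
# group, so HAVING COUNT(*) < 4 keeps the three smallest F1 per F2.
rows = con.execute("""
    SELECT A.F2, B.F1
    FROM T AS A
    JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
    GROUP BY A.F2, B.F1
    HAVING COUNT(*) < 4
    ORDER BY A.F2, B.F1""").fetchall()

for f2, f1 in rows:
    print(f2, f1)   # Jason 3 / Jason 5 / Jason 8 / Jimmy 13 / Joan 10 / ...
```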