Re: [sqlite] SELECT becomes very slow when converted to UPDATE

2018-06-23 Thread Barry
Do I correctly understand that the intention of the UPDATE is, for each
my_id in meta_table, to store the count of all the hashes that are
associated only with that my_id and no other ids? (For example: if hash H1
appears under ids 1 and 2 but H2 appears only under id 1, then id 1's
count includes H2 but not H1.)

In that case, have you tried:

UPDATE
    meta_table
SET
    distinct_hashes = (
        SELECT
            COUNT(hash) AS num
        FROM
            temp.t_distinct_hashes_by_id d
        WHERE
            -- No need for any NOT NULL here, it'll fail the equality comparison
            1 = (SELECT COUNT(my_id) FROM temp.t_distinct_hashes_by_id sub
                 WHERE d.hash = sub.hash)
            AND
            meta_table.my_id = d.my_id
    )
;

Or you could try (although I doubt it'd be any faster):

distinct_hashes = (
    SELECT
        COUNT(*) AS num
    FROM (
        -- Technically not valid SQL, but SQLite allows you to select a
        -- non-aggregate column not included in the GROUP BY, and we're
        -- safe because of HAVING COUNT(*) = 1.
        SELECT my_id
        FROM
            temp.t_distinct_hashes_by_id
        GROUP BY hash
        HAVING COUNT(*) = 1
    ) d
    WHERE d.my_id = meta_table.my_id
);

If my logic is correct both of these should work, and both avoid a SCAN
TABLE in your innermost loop.
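
A third angle - only a sketch, untested, and t_exclusive_counts is a name I
just made up: since there are only ~20 ids, you could materialize the
per-id counts once and then update from that tiny table, so nothing
correlated ever touches the big temp table:

-- Sketch: t_exclusive_counts is a hypothetical helper table.
CREATE TEMP TABLE t_exclusive_counts AS
SELECT my_id, COUNT(*) AS num
FROM (
    -- One row per hash that appears under exactly one id; MAX() just
    -- picks that single id without relying on bare-column behaviour.
    SELECT MAX(my_id) AS my_id
    FROM temp.t_distinct_hashes_by_id
    GROUP BY hash
    HAVING COUNT(*) = 1
)
GROUP BY my_id;

UPDATE meta_table
SET distinct_hashes = COALESCE(
    (SELECT num FROM t_exclusive_counts
     WHERE t_exclusive_counts.my_id = meta_table.my_id), 0);

The correlated probe then runs against ~20 rows instead of hundreds of
thousands.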

Also, is it possible my_id and hash could be guaranteed NOT NULL in your
temp table? I'm not sure if that would speed up the query (although it
couldn't hurt), but it would certainly make the problem easier to think
about.
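
Something like this, if the data allows it (just a sketch of the same temp
table with the constraints added):

-- Sketch: the existing temp table, with NOT NULL added as suggested.
CREATE TEMP TABLE t_distinct_hashes_by_id (
    my_id  INTEGER NOT NULL,
    hash   TEXT NOT NULL
);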

Can you share your overall goal? It looks like what you're after is
something like:

SELECT my_id, COUNT(*)
FROM (SELECT my_id FROM temp.t_distinct_hashes_by_id
      GROUP BY hash HAVING COUNT(DISTINCT (hash, my_id)) = 1)
GROUP BY my_id;

which doesn't work because SQLite doesn't accept COUNT(DISTINCT ...) over
more than one column.
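
If that is the goal, the usual workaround (again only a sketch, untested)
is to push the DISTINCT into the grouped subquery so the outer COUNT only
ever sees one column:

-- Sketch: per-id counts of hashes that belong to exactly one id.
SELECT my_id, COUNT(*) AS num
FROM (
    SELECT MAX(my_id) AS my_id
    FROM temp.t_distinct_hashes_by_id
    GROUP BY hash
    HAVING COUNT(DISTINCT my_id) = 1
)
GROUP BY my_id;

Single-column COUNT(DISTINCT my_id) is fine - it's only the multi-column
form that SQLite rejects.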

On Sat, 23 Jun 2018 at 23:44, Jonathan Moules wrote:

> Hi List,
> I'm trying to find all hashes that are unique to a specific id (my_id),
> and then use an UPDATE-join to update another table with that number.
>
> After much tweaking, I've simplified the table down to a basic temp
> table (actually created using a CREATE AS SELECT ... GROUP BY my_id, hash):
>
> CREATE TEMP TABLE t_distinct_hashes_by_id (
>     my_id    INTEGER,
>     hash     TEXT
> );
>
> And indexes in both directions because I'm still trying to optimise (and
> see what SQLite wants):
>
> CREATE UNIQUE INDEX temp.idx__1 ON t_distinct_hashes_by_id (
>     hash,
>     my_id
> );
>
> CREATE UNIQUE INDEX temp.idx__2 ON t_distinct_hashes_by_id (
>     my_id,
>     hash
> );
>
> There are only 20 values for my_id, but several hundred thousand hashes.
>
> -
>
> I can do a SELECT query which gets what I want and runs in about 0.5
> seconds:
>
> SELECT
>     *
> FROM
>     temp.t_distinct_hashes_by_id d
> WHERE
>     hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
>                  WHERE 1 != sub.my_id AND hash NOT NULL)
>     AND
>     1 = d.my_id
>
> The EXPLAIN:
> 0    0    0    SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
> 0    0    0    EXECUTE LIST SUBQUERY 1
> 1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub
>
> -
> So in theory, I'd hope that an UPDATE version using this select should
> take around 20 * 0.5 sec = 10 seconds. But it's actually taking... well
> I don't know how long, at least 10 minutes before I gave up waiting.
> This is the UPDATE:
>
> UPDATE
>     meta_table
> SET
>     distinct_hashes = (
>         SELECT
>             COUNT(hash) AS num
>         FROM
>             temp.t_distinct_hashes_by_id d
>         WHERE
>             hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
>                          WHERE meta_table.my_id != sub.my_id AND hash NOT NULL)
>             AND
>             -- This one has to be at the bottom for some reason.
>             meta_table.my_id = d.my_id
>     )
> ;
>
> The query plan for this UPDATE includes two CORRELATED subqueries, which
> the docs say are re-evaluated on every run - that seems like the problem.
> I get that it'd need to do that 20 times (once per my_id), but the
> slowdown seems considerably larger than that would warrant.
>
> 0    0    0    SCAN TABLE meta_table
> 0    0    0    EXECUTE CORRELATED SCALAR SUBQUERY 0
> 0    0    0    SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
> 0    0    0    EXECUTE CORRELATED LIST SUBQUERY 1
> 1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub
>
> Does anyone have any thoughts on how I can speed this up? (Seen on SQLite
> 3.15.0 and confirmed in 3.24.0, which uses about 3 times the disk IO/sec
> for the same query/data.)
> Thanks,
> Jonathan
>
>
>

Re: [sqlite] LIMIT

2018-06-23 Thread Barry
I checked with a fake dataset: the GROUP BY solution is quicker with no
index on F2 (the query planner can understand that query well enough to
create an automatic covering index). However, with an index on F2, the
rowid solution is considerably faster for a dataset with 2600 random
distinct F2 and one million total rows in T.

The test script is at the end of the email. On my computer, with the rowid
correlation, the query returned in 14 seconds*. With the GROUP BY solution,
the query took 50 seconds.

It was interesting to see that both queries spat out values quickly until
SQLite had returned all possible rows, then stalled for a long time while
the engine processed the remaining rows that could never return anything.

Cheers,

 - Barry

[*I suspected that most of the time for the rowid query was spent writing
to the console, so I SELECT'd COUNT(*) instead of * and it returned in 6
seconds, while (the query plan being unchanged) presumably still doing
most of the work. I tried the same thing on the GROUP BY query, but GROUP
BY and COUNT don't play so well together and it still spat out a bunch of
numbers to the console.]
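
For reference, the two statements being compared are presumably the shapes
from earlier in the thread, adapted to 10 rows per F2 - a sketch, since
the exact statements weren't quoted in this message:

-- rowid correlation (Barry's method, sketched):
SELECT * FROM T t1 WHERE rowid IN (
    SELECT rowid FROM T t2 WHERE t1.F2 = t2.F2 ORDER BY rowid LIMIT 10);

-- GROUP BY self-join (Ryan's method, sketched):
SELECT A.F2, B.F1
  FROM T AS A
  JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
 GROUP BY A.F2, B.F1
HAVING COUNT(*) <= 10;

And the usual way to count a grouped result without it spitting out rows
is to wrap it:

SELECT COUNT(*) FROM (
    SELECT A.F2, B.F1
      FROM T AS A
      JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
     GROUP BY A.F2, B.F1
    HAVING COUNT(*) <= 10
);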

CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT NOT NULL);

CREATE TABLE Names(id INTEGER PRIMARY KEY, Name TEXT NOT NULL);

INSERT INTO Names(Name) VALUES
('Alex'),('Brett'),('Cindy'),('Dennis'),('Echo'),('Frank'),('Garry'),('Herbert'),('Indigo'),('Jack'),('Karl'),('Lima'),('Melon'),('Nunes'),('Oprah'),('Peter'),('Quincy'),('Robert'),('Sarah'),('Tangerine'),('Unicorn'),('Violet'),('Wilfred'),('Violet'),('Wesley'),('Xavier'),('Yeltzin'),('Zbrudov');

WITH num(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM num WHERE i < 100)
INSERT INTO Names(Name)
SELECT Name || i FROM Names, Num;

WITH num(i, rnd) AS (
    SELECT 1, RANDOM() % 1300 + 1300
    UNION ALL
    SELECT i + 1, RANDOM() % 1300 + 1300 FROM num WHERE i < 1000000
)
INSERT INTO T(F2)
SELECT (SELECT Name FROM Names WHERE id = rnd) FROM num;

CREATE INDEX idx_F2 ON T(F2);

On Sun, 24 Jun 2018 at 07:00, Gert Van Assche  wrote:

> Ryan, my dataset isn't that big. 11K records.
> Your solution is noticeably faster: 996 ms vs 13126 ms.
> Interesting!
>
> gert
>
> On Sat 23 Jun 2018 at 18:09, R Smith wrote:
>
> >
> > On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> > > Barry, that's even easier indeed. And it works perfectly!
> > > Thanks for sharing this.
> >
> > I very much like the method Barry suggested for its simplicity, but I
> > have avoided it since I thought it would be significantly slower on a
> > large data set. However, SQLite is known for some pretty neat
> > optimizations (especially where the rowid is concerned), and I'm now
> > interested to know whether such optimizations do feature in this case.
> > I have a few queries implemented the other way that might benefit from
> > changing over.
> >
> > May I ask, if you do have a rather large dataset, and perhaps don't mind
> > the time, would you care to compare the two methods and let us know if
> > there is any discernible difference in speed? Also one small important
> > item - how many F1 items are there typically per unique F2 item in
> > total? Does it grow over time?
> >
> > Also, you do not need hashes for either of the two queries - they
> > should work regardless of whether the values are integers or Unicode
> > text, though the comparison might need a specific collation.
> > Surrounding both references with a TRIM() function might also help.
> > Essentially, if F1 of record 3 in Table T is '*&^%@#*&^@#' then
> > "... FROM T AS A, T AS B WHERE A.F1 = B.F1" must resolve correctly.
> > If it doesn't, it's a bug. The <, <=, >, >= operators might all
> > produce some Unicode weirdness under inadequate collations.
> >
> >
> > If time doesn't allow, then don't spend effort on this, it's simply a
> > curiosity. :)
> >
> > Thanks!
> > Ryan
> >
> >


Re: [sqlite] LIMIT

2018-06-23 Thread Gert Van Assche
Ryan, my dataset isn't that big. 11K records.
Your solution is noticeably faster: 996 ms vs 13126 ms.
Interesting!

gert

On Sat 23 Jun 2018 at 18:09, R Smith wrote:

>
> On 2018/06/23 2:47 PM, Gert Van Assche wrote:
> > Barry, that's even easier indeed. And it works perfectly!
> > Thanks for sharing this.
>
> I very much like the method Barry suggested for its simplicity, but I
> have avoided it since I thought it would be significantly slower on a
> large data set. However, SQLite is known for some pretty neat
> optimizations (especially where the rowid is concerned), and I'm now
> interested to know whether such optimizations do feature in this case.
> I have a few queries implemented the other way that might benefit from
> changing over.
>
> May I ask, if you do have a rather large dataset, and perhaps don't mind
> the time, would you care to compare the two methods and let us know if
> there is any discernible difference in speed? Also one small important
> item - how many F1 items are there typically per unique F2 item in
> total? Does it grow over time?
>
> Also, you do not need hashes for either of the two queries - they should
> work regardless of whether the values are integers or Unicode text,
> though the comparison might need a specific collation. Surrounding both
> references with a TRIM() function might also help. Essentially, if F1 of
> record 3 in Table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B
> WHERE A.F1 = B.F1" must resolve correctly. If it doesn't, it's a bug.
> The <, <=, >, >= operators might all produce some Unicode weirdness
> under inadequate collations.
>
>
> If time doesn't allow, then don't spend effort on this, it's simply a
> curiosity. :)
>
> Thanks!
> Ryan
>
>


Re: [sqlite] LIMIT

2018-06-23 Thread Simon Slavin
On 23 Jun 2018, at 5:08pm, R Smith  wrote:

> May I ask, if you do have a rather large dataset, and perhaps don't mind the 
> time, would you care to compare the two methods and let us know if there is 
> any discernible difference in speed?

Answers will be different depending on the type of main storage, the size
of cache available, and how many of the computer's processors are devoted
to SQLite.  I used to have an overnight run (not using SQLite) on an early
dual-processor machine which took three times longer if a tiny clock was
showing on the display.  The clock hogged one of the processors.

It would be an interesting comparison, but it's valid only for the setup it's 
run on.

Simon.


Re: [sqlite] LIMIT

2018-06-23 Thread R Smith


On 2018/06/23 2:47 PM, Gert Van Assche wrote:

Barry, that's even easier indeed. And it works perfectly!
Thanks for sharing this.


I very much like the method Barry suggested for its simplicity, but I have
avoided it since I thought it would be significantly slower on a large
data set. However, SQLite is known for some pretty neat optimizations
(especially where the rowid is concerned), and I'm now interested to know
whether such optimizations do feature in this case. I have a few queries
implemented the other way that might benefit from changing over.


May I ask, if you do have a rather large dataset, and perhaps don't mind 
the time, would you care to compare the two methods and let us know if 
there is any discernible difference in speed? Also one small important 
item - how many F1 items are there typically per unique F2 item in 
total? Does it grow over time?


Also, you do not need hashes for either of the two queries - they should
work regardless of whether the values are integers or Unicode text, though
the comparison might need a specific collation. Surrounding both
references with a TRIM() function might also help. Essentially, if F1 of
record 3 in Table T is '*&^%@#*&^@#' then "... FROM T AS A, T AS B
WHERE A.F1 = B.F1" must resolve correctly. If it doesn't, it's a bug.
The <, <=, >, >= operators might all produce some Unicode weirdness under
inadequate collations.
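
(A quick sketch of both ideas - purely illustrative, and assuming the real
table has a TEXT F1 to join on, which the sample T in this thread doesn't:

-- illustrative only: TRIM() around both sides of the comparison
SELECT A.F1, COUNT(*) AS n
  FROM T AS A, T AS B
 WHERE TRIM(A.F1) = TRIM(B.F1)
 GROUP BY A.F1;

-- or pin the comparison to an explicit collation
SELECT A.F1, COUNT(*) AS n
  FROM T AS A, T AS B
 WHERE A.F1 = B.F1 COLLATE NOCASE
 GROUP BY A.F1;
)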



If time doesn't allow, then don't spend effort on this, it's simply a 
curiosity. :)


Thanks!
Ryan




[sqlite] SELECT becomes very slow when converted to UPDATE

2018-06-23 Thread Jonathan Moules

Hi List,
I'm trying to find all hashes that are unique to a specific id (my_id),
and then use an UPDATE-join to update another table with that number.


After much tweaking, I've simplified the table down to a basic temp 
table (actually created using a CREATE AS SELECT ... GROUP BY my_id, hash):


CREATE TEMP TABLE t_distinct_hashes_by_id (
    my_id            INTEGER,
    hash            TEXT
);

And indexes in both directions because I'm still trying to optimise (and 
see what SQLite wants):


CREATE UNIQUE INDEX temp.idx__1 ON t_distinct_hashes_by_id (
    hash,
    my_id
);

CREATE UNIQUE INDEX temp.idx__2 ON t_distinct_hashes_by_id (
    my_id,
    hash
);

There are only 20 values for my_id, but several hundred thousand hashes.

-

I can do a SELECT query which gets what I want and runs in about 0.5 
seconds:


SELECT
    *
FROM
    temp.t_distinct_hashes_by_id d
WHERE
    hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
                 WHERE 1 != sub.my_id AND hash NOT NULL)
    AND
    1 = d.my_id

The EXPLAIN:
0    0    0    SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
0    0    0    EXECUTE LIST SUBQUERY 1
1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub

-
So in theory, I'd hope that an UPDATE version using this select should 
take around 20 * 0.5 sec = 10 seconds. But it's actually taking... well 
I don't know how long, at least 10 minutes before I gave up waiting. 
This is the UPDATE:


UPDATE
    meta_table
SET
    distinct_hashes = (
        SELECT
            COUNT(hash) AS num
        FROM
            temp.t_distinct_hashes_by_id d
        WHERE
            hash NOT IN (SELECT hash FROM temp.t_distinct_hashes_by_id sub
                         WHERE meta_table.my_id != sub.my_id AND hash NOT NULL)
            AND
            -- This one has to be at the bottom for some reason.
            meta_table.my_id = d.my_id
    )
;

The query plan for this UPDATE includes two CORRELATED subqueries, which
the docs say are re-evaluated on every run - that seems like the problem.
I get that it'd need to do that 20 times (once per my_id), but the
slowdown seems considerably larger than that would warrant.


0    0    0    SCAN TABLE meta_table
0    0    0    EXECUTE CORRELATED SCALAR SUBQUERY 0
0    0    0    SEARCH TABLE t_distinct_hashes_by_id AS l USING COVERING INDEX idx__2 (my_id=?)
0    0    0    EXECUTE CORRELATED LIST SUBQUERY 1
1    0    0    SCAN TABLE t_distinct_hashes_by_id AS sub

Does anyone have any thoughts on how I can speed this up? (Seen on SQLite
3.15.0 and confirmed in 3.24.0, which uses about 3 times the disk IO/sec
for the same query/data.)

Thanks,
Jonathan





Re: [sqlite] LIMIT

2018-06-23 Thread Gert Van Assche
Barry, that's even easier indeed. And it works perfectly!
Thanks for sharing this.

gert

On Sat 23 Jun 2018 at 14:32, Barry Smith wrote:

> Ryan's way works well. Here is a second method which expresses it in a
> different way:
>
> SELECT * FROM T t1 where rowid IN (SELECT rowid FROM T t2 WHERE t1.F2 =
> t2.F2 ORDER BY rowid LIMIT 10)
>
> If you have WITHOUT ROWID tables you'd have to replace rowid with your
> primary key.
>
> (The query may still work without the ORDER BY - I didn't test it - but
> even if it does, a future query optimiser might break it: without the
> ORDER BY, the inner select is free to return a different 10 rowids for
> every value in the outer query.)
>
> > On 23 Jun 2018, at 9:50 pm, Gert Van Assche  wrote:
> >
> > Hi Ryan, thanks for this. This is working if the F1 field is a numeric
> > value. With text (especially Asian & Arabic characters) this does not
> seem
> > to work.
> > So I created an MD5 hash from the text fields and it works great! Thank
> you
> > so much.
> >
> > gert
> >
> > On Fri 22 Jun 2018 at 22:52, R Smith wrote:
> >
> >>
> >>> On 2018/06/22 10:04 PM, Gert Van Assche wrote:
> >>>  All,
> >>>
> >>> I'm sure it must be possible, I just don't find how.
> >>> I have a table T with 2 fields (F1 and F2). The F1 are unique and the
> F2
> >>> are not unique.
> >>> I would like to get only 10 F1 fields for each unique F2.
> >>
> >> This is not normally done, and windowing functions in other RDBMSes
> >> make for an easier way, but it can still be done in SQLite with some
> >> creative grouping of a self-joined query.
> >> In this example, I limited it to 3 F1 items per unique F2 for brevity,
> >> but you can easily change the "< 4" to "< 11" or "<= 10" according to
> >> preference.
> >>
> >>   -- SQLite version 3.20.1  [ Release: 2017-08-24 ]  on SQLitespeed
> >> version 2.0.2.4.
> >>   --
> >>
> >>
> >>
> >> CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
> >>
> >> INSERT INTO T(F1,F2) VALUES
> >>  (1,  'John')
> >> ,(2,  'John')
> >> ,(3,  'Jason')
> >> ,(4,  'John')
> >> ,(5,  'Jason')
> >> ,(6,  'John')
> >> ,(7,  'John')
> >> ,(8,  'Jason')
> >> ,(9,  'Jason')
> >> ,(10,  'Joan')
> >> ,(11,  'Joan')
> >> ,(12,  'Joan')
> >> ,(13,  'Jimmy')
> >> ;
> >>
> >> SELECT A.F2, B.F1
> >>   FROM T AS A
> >>   JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
> >>  GROUP BY A.F2, B.F1
> >>  HAVING COUNT(*) < 4
> >> ;
> >>
> >>   -- F2    |  F1
> >>   -- ----- | ---
> >>   -- Jason |  3
> >>   -- Jason |  5
> >>   -- Jason |  8
> >>   -- Jimmy |  13
> >>   -- Joan  |  10
> >>   -- Joan  |  11
> >>   -- Joan  |  12
> >>   -- John  |  1
> >>   -- John  |  2
> >>   -- John  |  4
> >>
> >>
> >>
> >> -- Another option to note, in case the 10 limit is not important and
> >> simply aimed
> >> -- at saving space, is to use group concatenation, like so:
> >>
> >> SELECT F2, group_concat(F1) AS F1
> >>   FROM T
> >>  GROUP BY F2
> >> ;
> >>
> >>   -- F2    |    F1
> >>   -- ----- | ----------
> >>   -- Jason |  3,5,8,9
> >>   -- Jimmy | 13
> >>   -- Joan  |  10,11,12
> >>   -- John  | 1,2,4,6,7
> >>
> >>
> >>   --
> >>
> >>
> >>
> >> Cheers!
> >> Ryan
> >>
> >>


Re: [sqlite] LIMIT

2018-06-23 Thread Barry Smith
Ryan's way works well. Here is a second method which expresses it in a 
different way:

SELECT * FROM T t1 where rowid IN (SELECT rowid FROM T t2 WHERE t1.F2 = t2.F2 
ORDER BY rowid LIMIT 10)

If you have WITHOUT ROWID tables you'd have to replace rowid with your primary 
key.

(The query may still work without the ORDER BY - I didn't test it - but
even if it does, a future query optimiser might break it: without the
ORDER BY, the inner select is free to return a different 10 rowids for
every value in the outer query.)
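
For a WITHOUT ROWID table that might look like this (a sketch, with pk
standing in for whatever your primary key actually is):

-- Sketch: pk stands in for your actual primary key column(s).
CREATE TABLE T2(pk INTEGER PRIMARY KEY, F2 TEXT NOT NULL) WITHOUT ROWID;

SELECT * FROM T2 t1 WHERE pk IN (
    SELECT pk FROM T2 t2 WHERE t1.F2 = t2.F2 ORDER BY pk LIMIT 10);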

> On 23 Jun 2018, at 9:50 pm, Gert Van Assche  wrote:
> 
> Hi Ryan, thanks for this. This is working if the F1 field is a numeric
> value. With text (especially Asian & Arabic characters) this does not seem
> to work.
> So I created an MD5 hash from the text fields and it works great! Thank you
> so much.
> 
> gert
> 
> On Fri 22 Jun 2018 at 22:52, R Smith wrote:
> 
>> 
>>> On 2018/06/22 10:04 PM, Gert Van Assche wrote:
>>>  All,
>>> 
>>> I'm sure it must be possible, I just don't find how.
>>> I have a table T with 2 fields (F1 and F2). The F1 are unique and the F2
>>> are not unique.
>>> I would like to get only 10 F1 fields for each unique F2.
>> 
>> This is not normally done, and windowing functions in other RDBMSes
>> make for an easier way, but it can still be done in SQLite with some
>> creative grouping of a self-joined query.
>> In this example, I limited it to 3 F1 items per unique F2 for brevity,
>> but you can easily change the "< 4" to "< 11" or "<= 10" according to
>> preference.
>> 
>>   -- SQLite version 3.20.1  [ Release: 2017-08-24 ]  on SQLitespeed
>> version 2.0.2.4.
>>   --
>> 
>> 
>> 
>> CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
>> 
>> INSERT INTO T(F1,F2) VALUES
>>  (1,  'John')
>> ,(2,  'John')
>> ,(3,  'Jason')
>> ,(4,  'John')
>> ,(5,  'Jason')
>> ,(6,  'John')
>> ,(7,  'John')
>> ,(8,  'Jason')
>> ,(9,  'Jason')
>> ,(10,  'Joan')
>> ,(11,  'Joan')
>> ,(12,  'Joan')
>> ,(13,  'Jimmy')
>> ;
>> 
>> SELECT A.F2, B.F1
>>   FROM T AS A
>>   JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
>>  GROUP BY A.F2, B.F1
>>  HAVING COUNT(*) < 4
>> ;
>> 
>>   -- F2    |  F1
>>   -- ----- | ---
>>   -- Jason |  3
>>   -- Jason |  5
>>   -- Jason |  8
>>   -- Jimmy |  13
>>   -- Joan  |  10
>>   -- Joan  |  11
>>   -- Joan  |  12
>>   -- John  |  1
>>   -- John  |  2
>>   -- John  |  4
>> 
>> 
>> 
>> -- Another option to note, in case the 10 limit is not important and
>> simply aimed
>> -- at saving space, is to use group concatenation, like so:
>> 
>> SELECT F2, group_concat(F1) AS F1
>>   FROM T
>>  GROUP BY F2
>> ;
>> 
>>   -- F2    |    F1
>>   -- ----- | ----------
>>   -- Jason |  3,5,8,9
>>   -- Jimmy | 13
>>   -- Joan  |  10,11,12
>>   -- John  | 1,2,4,6,7
>> 
>> 
>>   --
>> 
>> 
>> 
>> Cheers!
>> Ryan
>> 
>> 


Re: [sqlite] LIMIT

2018-06-23 Thread Gert Van Assche
Hi Ryan, thanks for this. This is working if the F1 field is a numeric
value. With text (especially Asian & Arabic characters) this does not seem
to work.
So I created an MD5 hash from the text fields and it works great! Thank you
so much.

gert

On Fri 22 Jun 2018 at 22:52, R Smith wrote:

>
> On 2018/06/22 10:04 PM, Gert Van Assche wrote:
> >   All,
> >
> > I'm sure it must be possible, I just don't find how.
> > I have a table T with 2 fields (F1 and F2). The F1 are unique and the F2
> > are not unique.
> > I would like to get only 10 F1 fields for each unique F2.
>
> This is not normally done, and windowing functions in other RDBMSes
> make for an easier way, but it can still be done in SQLite with some
> creative grouping of a self-joined query.
> In this example, I limited it to 3 F1 items per unique F2 for brevity,
> but you can easily change the "< 4" to "< 11" or "<= 10" according to
> preference.
>
>-- SQLite version 3.20.1  [ Release: 2017-08-24 ]  on SQLitespeed
> version 2.0.2.4.
>--
>
>
> CREATE TABLE T(F1 INTEGER PRIMARY KEY, F2 TEXT);
>
> INSERT INTO T(F1,F2) VALUES
>   (1,  'John')
> ,(2,  'John')
> ,(3,  'Jason')
> ,(4,  'John')
> ,(5,  'Jason')
> ,(6,  'John')
> ,(7,  'John')
> ,(8,  'Jason')
> ,(9,  'Jason')
> ,(10,  'Joan')
> ,(11,  'Joan')
> ,(12,  'Joan')
> ,(13,  'Jimmy')
> ;
>
> SELECT A.F2, B.F1
>FROM T AS A
>JOIN T AS B ON B.F2 = A.F2 AND B.F1 >= A.F1
>   GROUP BY A.F2, B.F1
>   HAVING COUNT(*) < 4
> ;
>
>   -- F2    |  F1
>   -- ----- | ---
>-- Jason |  3
>-- Jason |  5
>-- Jason |  8
>-- Jimmy |  13
>-- Joan  |  10
>-- Joan  |  11
>-- Joan  |  12
>-- John  |  1
>-- John  |  2
>-- John  |  4
>
>
>
> -- Another option to note, in case the 10 limit is not important and
> simply aimed
> -- at saving space, is to use group concatenation, like so:
>
> SELECT F2, group_concat(F1) AS F1
>FROM T
>   GROUP BY F2
> ;
>
>   -- F2    |    F1
>   -- ----- | ----------
>-- Jason |  3,5,8,9
>-- Jimmy | 13
>-- Joan  |  10,11,12
>-- John  | 1,2,4,6,7
>
>
>--
>
>
> Cheers!
> Ryan
>
>