Re: [sqlite] GROUP BY and ICU collation

Keith Medcalf Thu, 07 Feb 2019 22:36:49 -0800

On Thursday, 7 February, 2019 22:47, li...@herger.net wrote:

>> COLLATE affects SORTING, it does not transmorgify the "value" of
>the thing to which it is applied.  That is, name COLLATE xxxx means
>that the item name is SORTED using the collating sequence xxxx, not
>that the result of "name COLLATE xxxx" is transmorgified into tha
>value that is used for sorting.


>Understood. But wouldn't a GROUP BY sort the data internally in order
>to be able to group records? Or would you not at least expect it to
>follow the same rules grouping as when sorting? In sorting it seems to
>consider Š "the same" as S, but it doesn't in grouping. I'm not too concerned
>about the representation.

But you are not using the same "expression" for selecting, sorting, and 
grouping.  That is, you need to specify:

  SELECT expression, count(distinct id)
    FROM artists
GROUP BY expression
ORDER BY expression;

where expression is the expression that you want to use

  SELECT substr(name collate de_DE, 1, 1), count(distinct id)
    FROM artists
GROUP BY substr(name collate de_DE, 1, 1)
ORDER BY substr(name collate de_DE, 1, 1);

If you do not do so then you cannot expect the "GROUP BY" to be using the same 
ordering as the "ORDER BY" nor the select to be returning the same value that 
was used to do the grouping and sorting.

(This substitutes the collation NOCASE for the de_DE since I no have a de_DE 
collation:

sqlite> select substr(name collate nocase, 1, 1), count(distinct id)
   ...>   FROM artists
   ...> group by substr(name collate nocase, 1, 1)
   ...> order by substr(name collate nocase, 1, 1);
QUERY PLAN
|--SCAN TABLE artists (~1048576 rows)
`--USE TEMP B-TREE FOR GROUP BY
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     57    0                    00  Start at 57
1     Noop           1     4     0                    00
2     SorterOpen     3     3     0     k(1,NOCASE)    00
3     Integer        0     5     0                    00  r[5]=0; clear abort 
flag
4     Null           0     8     8                    00  r[8..8]=NULL
5     Gosub          7     52    0                    00
6     OpenRead       0     3     0     2              00  root=3 iDb=0; artists
7     ColumnsUsed    0     0     0     3              00
8     Explain        8     0     0     SCAN TABLE artists (~1048576 rows)  00
9     Noop           0     0     0                    00  Begin WHERE-loop0: 
artists
10    Rewind         0     20    0                    00
11      Noop           0     0     0                    00  Begin WHERE-core
12      Column         0     1     13                   00  r[13]=artists.name
13      Function0      6     13    10    substr(3)      03  
r[10]=func(r[13..15])
14      Column         0     1     11                   00  r[11]=artists.name
15      Column         0     0     12                   00  r[12]=artists.id
16      MakeRecord     10    3     16                   00  
r[16]=mkrec(r[10..12])
17      SorterInsert   3     16    0                    00  key=r[16]
18      Noop           0     0     0                    00  End WHERE-core
19    Next           0     11    0                    01
20    Noop           0     0     0                    00  End WHERE-loop0: 
artists
21    OpenPseudo     4     16    3                    00  3 columns in r[16]
22    SorterSort     3     56    0                    00  GROUP BY sort
23      SorterData     3     16    4                    00  r[16]=data
24      Column         4     0     9                    00  r[9]=
25      Compare        8     9     1     k(1,NOCASE)    00  r[8] <-> r[9]
26      Jump           27    31    27                   00
27      Move           9     8     1                    00  r[8]=r[9]
28      Gosub          6     44    0                    00  output one row
29      IfPos          5     56    0                    00  if r[5]>0 then 
r[5]-=0, goto 56; check abort flag
30      Gosub          7     52    0                    00  reset accumulator
31      Column         4     2     17                   00  r[17]=
32      Found          2     36    17    1              00  key=r[17]
33      MakeRecord     17    1     18                   00  r[18]=mkrec(r[17])
34      IdxInsert      2     18    17    1              10  key=r[18]
35      AggStep        0     17    2     count(1)       01  accum=r[2] 
step(r[17])
36      If             4     38    0                    00
37      Column         4     1     1                    00  r[1]=
38      Integer        1     4     0                    00  r[4]=1; indicate 
data in accumulator
39    SorterNext     3     23    0                    00
40    Gosub          6     44    0                    00  output final row
41    Goto           0     56    0                    00
42    Integer        1     5     0                    00  r[5]=1; set abort flag
43    Return         6     0     0                    00
44    IfPos          4     46    0                    00  if r[4]>0 then 
r[4]-=0, goto 46; Groupby result generator entry point
45    Return         6     0     0                    00
46    AggFinal       2     1     0     count(1)       00  accum=r[2] N=1
47    Copy           1     21    0                    00  r[21]=r[1]
48    Function0      6     21    19    substr(3)      03  r[19]=func(r[21..23])
49    Copy           2     20    0                    00  r[20]=r[2]
50    ResultRow      19    2     0                    00  output=r[19..20]
51    Return         6     0     0                    00  end groupby result 
generator
52    Null           0     1     3                    00  r[1..3]=NULL
53    OpenEphemeral  2     0     0     k(1,B)         00  nColumn=0
54    Integer        0     4     0                    00  r[4]=0; indicate 
accumulator empty
55    Return         7     0     0                    00
56    Halt           0     0     0                    00
57    Transaction    0     0     2     0              01  usesStmtJournal=0
58    Integer        1     14    0                    00  r[14]=1
59    Integer        1     15    0                    00  r[15]=1
60    Integer        1     22    0                    00  r[22]=1
61    Integer        1     23    0                    00  r[23]=1
62    Goto           0     1     0                    00
sqlite>

>
>Michael
>
>
>>
>> That is
>>
>> select name collate nocase, count(distinct id) from x group by name
>collate nocase order by name collate nocase
>>
>> whill produce cased output not the value that was used for the
>sorting.
>>
>>
>> select lower(name collate nocase), count(distinct id) from x group
>by name collate nocase order by name collate nocase;
>>
>> to transmorgificate name into a "caseless" representation.  So you
>would need to do something like this:
>>
>> select de_DE(substr(name collate de_DE,1,1)), count(distinct id)
>from artists
>> group by substr(name collate de_DE,1,1)
>> order by by substr(name collate de_DE,1,1)
>>
>> and the function de_DE would have to transmorgificate its value to
>the result you want to see.
>>
>> ---
>> The fact that there's a Highway to Hell but only a Stairway to
>Heaven says a lot about anticipated traffic volume.
>>
>>> -----Original Message-----
>>> From: sqlite-users [mailto:sqlite-users-
>>> boun...@mailinglists.sqlite.org] On Behalf Of li...@herger.net
>>> Sent: Thursday, 7 February, 2019 05:12
>>> To: sqlite-users@mailinglists.sqlite.org
>>> Subject: [sqlite] GROUP BY and ICU collation
>>>
>>>>> Hi there,
>>>>>
>>>>> I'm trying to create a list with an index list. Eg. I have
>>> artists:
>>>>>
>>>>> Sting
>>>>> Šuma Čovjek
>>>>> Suzanne Vega
>>>>>
>>>>> That's the sort order I'd get using an ICU collation. "Šuma
>>> Čovjek"
>>>>> would be sorted as "Suma..." as expected.
>>>>>
>>>>> Now I'd like to create an index bar by providing groups of the
>>> first
>>>>> character:
>>>>>
>>>>> SELECT SUBSTR(name,1,1), count(distinct id) FROM artists GROUP
>BY
>>>>> SUBSTR(name,1,1) ORDER BY name COLLATE de_DE>>
>>>> Aren't you missing a COLLATE clause after the GROUP BY term?
>>>>
>>>>       ... GROUP BY SUBSTR(name,1,1) COLLATE de_DE ...
>>>
>>> TBH: I didn't even know about this. I thought the COLLATE at the
>end
>>> of
>>> the statement would do it for all.
>>>
>>> Alas, tried again to no avail. No matter whether I add it after
>the
>>> GROUP BY or not, the result is the same.
>>>
>>> I should probably have added some version information: I'm using
>the
>>> Perl DBD::SQLite 1.58 (sqlite 3.22.0). I didn't see any mention of
>>> related changes in the changelog for SQLite. What would be the
>>> easiest
>>> (and most reliable) way to try to reproduce this without Perl? Is
>>> there
>>> a HowTo use collations with the CLI sqlite?
>>>
>>> --
>>>
>>> Michael

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] GROUP BY and ICU collation

Reply via email to