> John [Cowan]'s list is not "a few characters".

Let's take Latin, for starters. There are 1870 entries in the UCA for Latin. If you subtract from John's list the ones that are already interleaved -- as I did in my email -- then you get 78 values, or about 4%.
I'll repeat that list again below, since it seems to have missed notice.

Now, one could argue that the letters without uppercase pairs are only used technically (e.g. in IPA), and thus should be excluded. If so, that leaves us with 52 (26 upper+lower), or about 3%.

If we really wanted to minimize the number of changes, then we could exclude the ones that are for languages that rarely occur in data. I did a quick check on http://www.eki.ee/letter/, and put what I found below. This is *not* a complete analysis, and would need to be extended to the other scripts, but we would then be talking about 10 letters (5 upper+lower) or 0.5% with a very restrictive list, about double that if we included a few more.

So, yes, I do think it will probably end up being a pretty small list.

Mark

=======

Capitals by language on http://www.eki.ee/letter/

da [Danish]; fo [Faroese]; kl [Greenlandic]; no [Norwegian];
00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
(no information, but included for consistency with O WITH STROKE)

bs [Bosnian]; hr [Croatian]; sami1 [Inari Sámi]; sami2 [North Sámi]; sami4 [Skolt Sámi]; sl [Slovenian]; vi [Vietnamese];
0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE

mt [Maltese];
0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE

pl [Polish]; sorb1 [Lower Sorbian]; sorb2 [Upper Sorbian]; sla [Kashubian];
0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE

sami2 [North Sámi];
0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE

ha [Hausa]; ff [Fula]; or bm [Bambara];
0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK

No Information
0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK
0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR
0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK

==============

List of items from John's list that are not already interleaved.

0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE
018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR
0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE
0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE
0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE
019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK
0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK
1E9A; 0061; !nfd+remove_marks; !uca #LATIN SMALL LETTER A WITH RIGHT HALF RING
0180; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH STROKE
0183; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH TOPBAR
0253; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH HOOK
0188; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH HOOK
0255; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH CURL
0111; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH STROKE
018C; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TOPBAR
0221; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH CURL
0256; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TAIL
0257; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH HOOK
0192; 0066; !nfd+remove_marks; !uca #LATIN SMALL LETTER F WITH HOOK
01E5; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH STROKE
0260; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH HOOK
0127; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH STROKE
0266; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH HOOK
0268; 0069; !nfd+remove_marks; !uca #LATIN SMALL LETTER I WITH STROKE
029D; 006A; !nfd+remove_marks; !uca #LATIN SMALL LETTER J WITH CROSSED-TAIL
0199; 006B; !nfd+remove_marks; !uca #LATIN SMALL LETTER K WITH HOOK
0140; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE DOT
0142; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH STROKE
019A; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BAR
0234; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH CURL
026B; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE TILDE
026C; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BELT
026D; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH RETROFLEX HOOK
0271; 006D; !nfd+remove_marks; !uca #LATIN SMALL LETTER M WITH HOOK
019E; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LONG RIGHT LEG
0235; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH CURL
0272; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LEFT HOOK
0273; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH RETROFLEX HOOK
00F8; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE
01FF; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE AND ACUTE
01A5; 0070; !nfd+remove_marks; !uca #LATIN SMALL LETTER P WITH HOOK
02A0; 0071; !nfd+remove_marks; !uca #LATIN SMALL LETTER Q WITH HOOK
027C; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH LONG LEG
027D; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH TAIL
0282; 0073; !nfd+remove_marks; !uca #LATIN SMALL LETTER S WITH HOOK
0167; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH STROKE
01AB; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH PALATAL HOOK
01AD; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH HOOK
0236; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH CURL
0288; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH RETROFLEX HOOK
028B; 0076; !nfd+remove_marks; !uca #LATIN SMALL LETTER V WITH HOOK
01B4; 0079; !nfd+remove_marks; !uca #LATIN SMALL LETTER Y WITH HOOK
01B6; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH STROKE
0225; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH HOOK
0290; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH RETROFLEX HOOK
0291; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH CURL
025A; 0259; !nfd+remove_marks; !uca #LATIN SMALL LETTER SCHWA WITH HOOK
0286; 0283; !nfd+remove_marks; !uca #LATIN SMALL LETTER ESH WITH CURL
01BA; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH TAIL
0293; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH CURL

Mark

----- Original Message -----
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 10, 2004 04:20
Subject: Re: Changing UCA primary weights (bad idea)

> At 17:34 -0700 2004-07-09, Mark Davis wrote:
>
> >What I think we should be examining is which of the items that are not
> >interfiled (to use your phrasing) should be, if any. I don't think
> >everything should be. In particular, I think John's list is the list we
> >should be focusing on.
>
> I think most of what is in John [Cowan]'s list
> are letters which are quite properly not
> interfiled with "base" letters. The African hook
> letters (which I have mentioned many times, and
> which you have ignored in favour of the Danish
> letters you are more familiar with) are there.
>
> > > John's list?
> >
> >That was in my original mail, that you were commenting on when you changed
> >the subject line, but which you apparently didn't bother to actually
> >read.
>
> Sweet of you to say.
>
> > > My point is made here. It is really only in
> > > initial position where this is likely to be
> > > noticed.
> >
> >This is incorrect. It will make a difference in other positions. Sorting
> >"Søren" after "Sozar" in a long list, if someone isn't expecting it, will
> >cause problems. They look for it after "Soret", don't see it on the page,
> >and assume it isn't there; fooled by the fact that it is on a completely
> >different page.
>
> No way! Do you expect your default tailorable
> template to suddenly and magically relieve the
> user of the problems of long lists and multi-page
> typesetting? Sheesh. No matter how much you
> jiggle either the template or a tailoring for
> people who only know the letters A-Z, there will
> be edge cases which will fail this kind of test.
>
> >Remember that the collation sequence is also used for language-sensitive
> >matching as well as sorting.
>
> I remember.
>
> > > What I want is the status quo, however.
> > > Leave the template and its principles alone.
> >
> >Stability is important, and we want to consider that very carefully before
> >making any change. However, I believe that the current way we handle a few
> >characters in UCA is distinctly suboptimal, and worth considering.
>
> John [Cowan]'s list is not "a few characters".
> --
> Michael Everson * * Everson Typography * * http://www.evertype.com
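
To make the "Søren" / "Sozar" / "Soret" point above concrete, here is a minimal Python sketch. It is not the UCA and not anyone's proposed implementation: it only reads the first two fields of a few entries from the list above (code point; base letter), ignores the "!nfd+remove_marks" and "!uca" fields because this thread does not define them, and uses plain code point order as a rough stand-in for "not interleaved". The names parse_entries, BASE and interleaved_key are made up for illustration.

```python
# Illustrative only: a handful of entries copied from the list in this message.
ENTRIES = """\
00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
00F8; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE
0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE
0142; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH STROKE
"""


def parse_entries(text):
    """Map each listed character to its base letter (first two fields only)."""
    mapping = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0]            # drop the trailing character name
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or not fields[0]:
            continue                            # skip blank or malformed lines
        mapping[chr(int(fields[0], 16))] = chr(int(fields[1], 16))
    return mapping


BASE = parse_entries(ENTRIES)


def interleaved_key(word):
    """Sort key that files each listed letter together with its base letter.

    The original spelling is kept as a tie-breaker so equal folded forms
    still sort deterministically.
    """
    folded = "".join(BASE.get(ch, ch) for ch in word)
    return (folded.casefold(), word)


names = ["Sozar", "Søren", "Soret", "Sopel"]

# Stand-in for "not interleaved": ø sorts after every o-word here.
by_code_point = sorted(names)

# Interleaved: Søren lands next to Soret, where the reader looks for it.
interleaved = sorted(names, key=interleaved_key)

print(by_code_point)
print(interleaved)
```

With these four names the first ordering puts "Søren" last, after "Sozar", and the second puts it directly before "Soret". A real comparison would of course use actual UCA weights and a tailoring, which this sketch does not attempt.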