> John [Cowan]'s list is not "a few characters".

Let's take Latin, for starters. There are 1870 entries in the UCA for Latin.
If you subtract from John's list the ones that are already interleaved -- as
I did in my email -- then you get 78 values, or about 4%.

I'll repeat that list again below, since it seems to have gone unnoticed.
Now, one could argue that the letters without uppercase pairs are only used
technically (e.g. in IPA), and thus should be excluded. If so, that leaves
us with 52 (26 upper+lower), or about 3%.

If we really wanted to minimize the number of changes, then we could exclude
the ones that are for languages that rarely occur in data. I did a quick
check on http://www.eki.ee/letter/, and put what I found below. This is
*not* a complete analysis, and would need to be extended to the other
scripts, but we would then be talking about 10 letters (5 upper+lower) or
0.5% with a very restrictive list, about double that if we included a few
more.
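
The percentages above can be rechecked with a few lines of Python. This is only a sketch: the counts (1870, 78, 52, 10) are the figures quoted in this message, and the labels and the `share` helper are names made up for the illustration.

```python
# Counts quoted in the message; the labels are illustrative only.
TOTAL_LATIN_ENTRIES = 1870

candidates = {
    "not already interleaved": 78,
    "cased pairs only": 52,
    "very restrictive list": 10,
}


def share(count, total=TOTAL_LATIN_ENTRIES):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in candidates.items():
    print(f"{label}: {count} entries, {share(count):.1f}% of {TOTAL_LATIN_ENTRIES}")
```

Rounded to whole or half percents, these agree with the "about 4%", "about 3%", and "0.5%" figures given above.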

So, yes, I do think it will probably end up being a pretty small list.

Mark

=======
Capitals by language on http://www.eki.ee/letter/

da [Danish]; fo [Faroese]; kl [Greenlandic]; no [Norwegian];

 00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
 01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE (no information, but included for consistency with O WITH STROKE)

bs [Bosnian]; hr [Croatian]; sami1 [Inari Sámi]; sami2 [North Sámi]; sami4
[Skolt Sámi]; sl [Slovenian]; vi [Vietnamese];

 0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE

mt [Maltese];

 0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE

pl [Polish]; sorb1 [Lower Sorbian]; sorb2 [Upper Sorbian]; sla [Kashubian];

 0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE

sami2 [North Sámi];

 0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
 01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE

ha [Hausa]; ff [Fula]; or bm [Bambara];

 0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
 018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
 0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
 01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
 019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK

No Information

 0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
 0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
 0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
 01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
 01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
 01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
 0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK
 0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
 01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
 0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
 018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR

 0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
 019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
 01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK

==============
List of items from John's list that are not already interleaved.

 0181; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH HOOK
 0182; 0042; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER B WITH TOPBAR
 0187; 0043; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER C WITH HOOK
 0110; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH STROKE
 018A; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH HOOK
 018B; 0044; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER D WITH TOPBAR
 0191; 0046; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER F WITH HOOK
 0193; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH HOOK
 01E4; 0047; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER G WITH STROKE
 0126; 0048; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER H WITH STROKE
 0197; 0049; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER I WITH STROKE
 0198; 004B; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER K WITH HOOK
 0141; 004C; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER L WITH STROKE
 019D; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LEFT HOOK
 0220; 004E; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
 00D8; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE
 019F; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH MIDDLE TILDE
 01FE; 004F; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
 01A4; 0050; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER P WITH HOOK
 0166; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH STROKE
 01AC; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH HOOK
 01AE; 0054; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER T WITH RETROFLEX HOOK
 01B2; 0056; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER V WITH HOOK
 01B3; 0059; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Y WITH HOOK
 01B5; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH STROKE
 0224; 005A; !nfd+remove_marks; !uca #LATIN CAPITAL LETTER Z WITH HOOK

 1E9A; 0061; !nfd+remove_marks; !uca #LATIN SMALL LETTER A WITH RIGHT HALF RING
 0180; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH STROKE
 0183; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH TOPBAR
 0253; 0062; !nfd+remove_marks; !uca #LATIN SMALL LETTER B WITH HOOK
 0188; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH HOOK
 0255; 0063; !nfd+remove_marks; !uca #LATIN SMALL LETTER C WITH CURL
 0111; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH STROKE
 018C; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TOPBAR
 0221; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH CURL
 0256; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH TAIL
 0257; 0064; !nfd+remove_marks; !uca #LATIN SMALL LETTER D WITH HOOK
 0192; 0066; !nfd+remove_marks; !uca #LATIN SMALL LETTER F WITH HOOK
 01E5; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH STROKE
 0260; 0067; !nfd+remove_marks; !uca #LATIN SMALL LETTER G WITH HOOK
 0127; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH STROKE
 0266; 0068; !nfd+remove_marks; !uca #LATIN SMALL LETTER H WITH HOOK
 0268; 0069; !nfd+remove_marks; !uca #LATIN SMALL LETTER I WITH STROKE
 029D; 006A; !nfd+remove_marks; !uca #LATIN SMALL LETTER J WITH CROSSED-TAIL
 0199; 006B; !nfd+remove_marks; !uca #LATIN SMALL LETTER K WITH HOOK
 0140; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE DOT
 0142; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH STROKE
 019A; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BAR
 0234; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH CURL
 026B; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH MIDDLE TILDE
 026C; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH BELT
 026D; 006C; !nfd+remove_marks; !uca #LATIN SMALL LETTER L WITH RETROFLEX HOOK
 0271; 006D; !nfd+remove_marks; !uca #LATIN SMALL LETTER M WITH HOOK
 019E; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LONG RIGHT LEG
 0235; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH CURL
 0272; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH LEFT HOOK
 0273; 006E; !nfd+remove_marks; !uca #LATIN SMALL LETTER N WITH RETROFLEX HOOK
 00F8; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE
 01FF; 006F; !nfd+remove_marks; !uca #LATIN SMALL LETTER O WITH STROKE AND ACUTE
 01A5; 0070; !nfd+remove_marks; !uca #LATIN SMALL LETTER P WITH HOOK
 02A0; 0071; !nfd+remove_marks; !uca #LATIN SMALL LETTER Q WITH HOOK
 027C; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH LONG LEG
 027D; 0072; !nfd+remove_marks; !uca #LATIN SMALL LETTER R WITH TAIL
 0282; 0073; !nfd+remove_marks; !uca #LATIN SMALL LETTER S WITH HOOK
 0167; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH STROKE
 01AB; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH PALATAL HOOK
 01AD; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH HOOK
 0236; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH CURL
 0288; 0074; !nfd+remove_marks; !uca #LATIN SMALL LETTER T WITH RETROFLEX HOOK
 028B; 0076; !nfd+remove_marks; !uca #LATIN SMALL LETTER V WITH HOOK
 01B4; 0079; !nfd+remove_marks; !uca #LATIN SMALL LETTER Y WITH HOOK
 01B6; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH STROKE
 0225; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH HOOK
 0290; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH RETROFLEX HOOK
 0291; 007A; !nfd+remove_marks; !uca #LATIN SMALL LETTER Z WITH CURL
 025A; 0259; !nfd+remove_marks; !uca #LATIN SMALL LETTER SCHWA WITH HOOK
 0286; 0283; !nfd+remove_marks; !uca #LATIN SMALL LETTER ESH WITH CURL
 01BA; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH TAIL
 0293; 0292; !nfd+remove_marks; !uca #LATIN SMALL LETTER EZH WITH CURL
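
One way to see why these letters need explicit handling is to check that canonical decomposition plus mark removal does not reduce them to the base letter in the second column. The sketch below assumes that this is what the `!nfd+remove_marks` flag records; the list itself does not spell that out. It uses only Python's standard `unicodedata` module, and `strip_marks`, `folds_to_base`, and `SAMPLES` are names invented for the illustration.

```python
import unicodedata


def strip_marks(text):
    """Normalize to NFD, then drop combining marks (general category M*)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def folds_to_base(code, base):
    """True if NFD plus mark removal turns the letter into its listed base."""
    return strip_marks(chr(code)) == chr(base)


# (letter, base) code point pairs copied from the list above.
SAMPLES = [
    (0x00D8, 0x004F),  # O WITH STROKE
    (0x01FE, 0x004F),  # O WITH STROKE AND ACUTE
    (0x0110, 0x0044),  # D WITH STROKE
    (0x0141, 0x004C),  # L WITH STROKE
    (0x0181, 0x0042),  # B WITH HOOK
    (0x0253, 0x0062),  # small b WITH HOOK
]

for code, base in SAMPLES:
    name = unicodedata.name(chr(code))
    print(f"{code:04X} {name}: folds to {chr(base)}? {folds_to_base(code, base)}")

# Contrast: a precomposed accented letter such as U+00C9 does fold to "E".
print(strip_marks("\u00C9"))
```

For U+01FE the decomposition yields U+00D8 plus a combining acute, so removing marks still leaves O WITH STROKE rather than plain O.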

Mark

----- Original Message ----- 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 10, 2004 04:20
Subject: Re: Changing UCA primary weights (bad idea)


> At 17:34 -0700 2004-07-09, Mark Davis wrote:
>
> >What I think we should be examining is which of the items that are not
> >interfiled (to use your phrasing) should be, if any. I don't think
> >everything should be. In particular, I think John's list is the list we
> >should be focusing on.
>
> I think most of what is in John [Cowan]'s list
> are letters which are quite properly not
> interfiled with "base" letters. The African hook
> letters (which I have mentioned many times, and
> which you have ignored in favour of the Danish
> letters you are more familiar with) are there.
>
> >  > John's list?
> >
> >That was in my original mail, which you were commenting on when you
> >changed the subject line, but which you apparently didn't bother to
> >actually read.
>
> Sweet of you to say.
>
> >  > My point is made here. It is really only in
> >>  initial position where this is likely to be
> >>  noticed.
> >
> >This is incorrect. It will make a difference in other positions. Sorting
> >"Søren" after "Sozar" in a long list, if someone isn't expecting it, will
> >cause problems. They look for it after "Soret", don't see it on the page,
> >and assume it isn't there; fooled by the fact that it is on a completely
> >different page.
>
> No way! Do you expect your default tailorable
> template to suddenly and magically relieve the
> user of the problems of long lists and multi-page
> typesetting? Sheesh. No matter how much you
> jiggle either the template or a tailoring for
> people who only know the letters A-Z, there will
> be edge cases which will fail this kind of test.
>
> >Remember that the collation sequence is also used for language-sensitive
> >matching as well as sorting.
>
> I remember.
>
> >  > What I want is the status quo, however.
> >>  Leave the template and its principles alone.
> >
> >Stability is important, and we want to consider that very carefully
> >before making any change. However, I believe that the current way we
> >handle a few characters in UCA is distinctly suboptimal, and worth
> >considering.
>
> John [Cowan]'s list is not "a few characters".
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>
>

