Re: Question about “Uppercase” in DerivedCoreProperties.txt

Mike FABIAN Sat, 08 Nov 2014 01:25:11 -0800

Philippe Verdy <[email protected]> wrote:

> Note that tolower() and toupper() can only work at the single-character
> level; they are not recommended for changing the case of plain text.
>
> For correct handling of locales, tolower and toupper should be replaced by
> strtolower and strtoupper (or their aliases), which will be able to process
> character clusters and contextual casing rules needed for a language or
> orthographic style.


Yes, thank you for explaining this.

But these details of upper and lower casing cannot be expressed in the
“i18n” file of glibc:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/i18n

For toupper and tolower, this file just has character -> character
mapping tables, for example the “tolower” table contains only

    (<U03A3>,<U03C3>)

(i.e. mapping Σ U+03A3 -> σ U+03C3, never to the final sigma ς
U+03C2).

More correct, detailed information about upper and lower case must come
from elsewhere, not from this “i18n” file in glibc.  Using only the
information from this “i18n” file, not even the Greek sigma can be
handled correctly.
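
To make the limitation concrete, here is a minimal sketch in Python (not
glibc code). The one-entry table and the helper tolower_per_char are
hypothetical names made up for this illustration; they only mimic a
character -> character lookup of the kind the "tolower" table provides.
Python's built-in str.lower() is used as the contextual comparison.

```python
# Hypothetical one-entry table mirroring the (<U03A3>,<U03C3>) pair above.
TOLOWER_TABLE = {"\u03A3": "\u03C3"}  # Σ U+03A3 -> σ U+03C3


def tolower_per_char(text):
    """Lowercase each character independently, with no look at its neighbours."""
    return "".join(TOLOWER_TABLE.get(ch, ch.lower()) for ch in text)


word = "ΟΔΟΣ"  # Greek word ending in a capital sigma

# A per-character table always yields σ U+03C3, even at the end of a word.
per_char = tolower_per_char(word)

# str.lower() sees the whole string and can pick the final sigma ς U+03C2.
contextual = word.lower()

print(per_char, [f"U+{ord(c):04X}" for c in per_char])
print(contextual, [f"U+{ord(c):04X}" for c in contextual])
```

The two results differ only in the last character, which is exactly the
information a table of single-character pairs has no way to express.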

Pravin and I want to update this “i18n” file to the latest
data from Unicode 7.0.0, doing it as correctly as possible within
the limitations imposed by this file and the ISO C standard.

-- 
Mike FABIAN <[email protected]>
☏ Office: +49-69-365051027, internal 8875027
Lack of sleep is the enemy of good work.
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
