Re: Merging combining classes

Jim Allan Thu, 06 Nov 2003 19:23:25 -0800

António Martins-Tuválkin wrote:

Anyway -- who ever decided that cedilla and undercomma are different things? Do they have different origins? Any language / orthography using both distinctly?...

I don't know whether the undercomma is distinct in origin from the cedilla or is historically an adaptation of it. I *suspect* the latter.

Even given a common origin, it is debatable whether the two should now be considered the same or not. That is why there is a problem: it isn't cut and dried.

The MARC 21 and ANSEL character sets distinguish the two as CEDILLA and LEFT HOOK (for the undercomma), though it is doubtful whether the originators of these sets knew what this "left hook" was. See http://lcweb2.loc.gov/cocoon/codetables/45.html for the current ANSEL specification and http://www.niso.org/standards/resources/Z39-47-1993(R2002).pdf for the 1993 table, where it was notoriously given the name "LEFT HOOF".

The identity of this "left hook" with the undercomma is asserted at http://www.niso.org/international/SC4/Wg1_240.pdf:

<< 5/2 HOOK TO LEFT In ISO 5426, this character is annotated ' used in Latvian, Romanian.' Because of this use, the most appropriate mapping is to U+0326 COMBINING COMMA BELOW (annotated as 'variant of the following' [combining cedilla] in the Unicode Standard). >>

The original ISO 6429 character sets were constructed under the philosophy that differences between cedilla and undercomma were only stylistic. The default images in those tables and in Unicode Standard versions 1 and 2 showed a cedilla form throughout.

However, users of Latvian and Romanian firmly insisted that cedilla forms were not historically correct for printed material in those languages. It was *only* the increasing use of fonts created outside Eastern Europe, especially as computer technology took hold, that had caused the incorrect cedilla shape to appear.

For Latvian (and Livonian), the problem was easily solved within standard character sets by font designers drawing an undercomma rather than a cedilla beneath all letters except _c_ and _s_.
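In Unicode terms this means the Latvian letters keep their WITH CEDILLA code points and only the glyph changes. A minimal Python sketch, using only the standard `unicodedata` module, checks that each of these lowercase letters decomposes to its base letter plus U+0327 COMBINING CEDILLA (the letter list is my own selection for illustration):

```python
import unicodedata

# Lowercase Latvian letters conventionally printed with a comma below.
# Their uppercase counterparts are also encoded WITH CEDILLA.
LATVIAN = "\u0123\u0137\u013C\u0146\u0157"  # g k l n r

COMBINING_CEDILLA = "\u0327"

def mark_below(ch: str) -> str:
    """Return the combining mark(s) in the canonical (NFD) decomposition of ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[1:]  # drop the base letter

for ch in LATVIAN:
    mark = mark_below(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.name(mark)}")
```

The encoding says "cedilla"; only the font decides that a comma is drawn.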

However, Romanian _s_, which traditionally takes an undercomma, conflicted with Turkish _s_ with cedilla, since both were encoded as the same character.

The result was a Romanian proposal to add precomposed characters with undercomma for uppercase and lowercase _s_ and _t_.

See ISO/IEC JTC 1/SC 2/WG 2 N1604 (1997) at http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/n1604.htm :

<<
*RESOLUTION M33.24 (4 Latin characters):

_Netherland Negative._*

WG 2 accepts the following four Latin characters (requested by Romania), their names and shapes to be encoded in the BMP as follows:

0218 LATIN CAPITAL LETTER S WITH COMMA BELOW

0219 LATIN SMALL LETTER S WITH COMMA BELOW

021A LATIN CAPITAL LETTER T WITH COMMA BELOW

021B LATIN SMALL LETTER T WITH COMMA BELOW

in accordance with document N1361.

See resolution M33.26 for further processing.
>>
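The practical effect is that the comma-below letters and the older cedilla letters are distinct characters with different canonical decompositions (U+0326 COMBINING COMMA BELOW versus U+0327 COMBINING CEDILLA), so no normalization form folds one into the other. A short sketch with Python's standard `unicodedata` module, assuming a Python whose Unicode database includes U+0218..U+021B (any current release does):

```python
import unicodedata

# Each character from resolution M33.24, paired with the WITH CEDILLA
# letter that had previously been used for Romanian text.
PAIRS = [
    ("\u0218", "\u015E"),  # capital S: comma below / cedilla
    ("\u0219", "\u015F"),  # small s
    ("\u021A", "\u0162"),  # capital T
    ("\u021B", "\u0163"),  # small t
]

def decomposition(ch: str) -> list:
    """Names of the code points in the canonical (NFD) decomposition of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

for comma_form, cedilla_form in PAIRS:
    print(unicodedata.name(comma_form), decomposition(comma_form))
    print(unicodedata.name(cedilla_form), decomposition(cedilla_form))
    # Even compatibility normalization keeps the two apart.
    assert unicodedata.normalize("NFKC", comma_form) != unicodedata.normalize("NFKC", cedilla_form)
```

Text already keyed with the cedilla letters therefore has to be converted explicitly if the comma-below letters are wanted.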

But Romanians are still frustrated because most fonts distributed as part of computer operating systems or otherwise available do not support these characters.

ISO 8859/16 (intended as a replacement for ISO 8859/2) specifically designates undercomma rather than cedilla with _s_, _S_, _t_, _T_. See ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT
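In that mapping table the comma-below letters sit at the same byte values (0xAA, 0xBA, 0xDE, 0xFE) that ISO 8859/2 uses for the cedilla letters, so the same byte means a different character depending on which table is assumed. A small sketch using Python's standard `iso8859_2` and `iso8859_16` codecs makes the correspondence visible:

```python
import unicodedata

# Byte values used for the Romanian s/t letters in both tables.
ROMANIAN_BYTES = bytes([0xAA, 0xBA, 0xDE, 0xFE])

latin2 = ROMANIAN_BYTES.decode("iso8859_2")    # letters WITH CEDILLA
latin10 = ROMANIAN_BYTES.decode("iso8859_16")  # letters WITH COMMA BELOW

for b, old, new in zip(ROMANIAN_BYTES, latin2, latin10):
    print(f"0x{b:02X}: {unicodedata.name(old)} | {unicodedata.name(new)}")
```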

For the Netherlands' opposition, see http://wwwold.dkuug.dk/JTC1/SC2/WG3/docs/n441.pdf .

Since there is no linguistic tradition in any language for _t_ with a cedilla shape beneath, most modern fonts display an undercomma beneath U+0162 and U+0163 instead of a cedilla shape.

It is really only with _s_ that there are two conflicting usages.

There are actually three conflicting usages for _s_, since Gagauz traditionally uses a cedilla shape under _c_, an undercomma beneath _t_, and a shape halfway between the two under _s_. See http://www.unicode.org/mail-arch/unicode-ml/y2002-m09/0199.html

Jim Allan
