About the design and encoding of diacritics involving cedillas and 
commaaccents:

[Note that remarks about language use are limited to a European context.]

The following glyphs are sometimes named /*cedilla/, but this is due to a 
historical misinterpretation in both the Unicode standard and the original 
version of the Adobe Glyph List:

/Gcedilla/gcedilla/
/Kcedilla/kcedilla/
/Lcedilla/lcedilla/
/Ncedilla/ncedilla/
/Rcedilla/rcedilla/

These glyphs are used in a European context only for Latvian, and the 
correct form of diacritic is *not* a cedilla but the same unattached 
'commaaccent' form used for Romanian S and T. [Note, however, that you 
should not use the /comma/ glyph as a component below any of these letters: 
it is much too large. You want a shorter, typically curved form, occupying 
about the same height as the cedilla. The mark should be centred optically 
below the letter.]

So these glyphs should actually be

/Gcommaaccent/gcommaaccent/
/Kcommaaccent/kcommaaccent/
/Lcommaaccent/lcommaaccent/
/Ncommaaccent/ncommaaccent/
/Rcommaaccent/rcommaaccent/

but mapped to the ...WITH CEDILLA Unicode characters.
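
As a rough illustration of that mapping, here is a small Python sketch. The 
code points are the existing ...WITH CEDILLA characters; the table name 
LATVIAN_COMMAACCENT is made up for this example and does not belong to any 
font tool.

```python
import unicodedata

# Glyph name -> Unicode code point. The glyphs are named for the
# comma-below shape they carry, but they map to the characters that
# Unicode calls ...WITH CEDILLA.
LATVIAN_COMMAACCENT = {
    "Gcommaaccent": 0x0122, "gcommaaccent": 0x0123,
    "Kcommaaccent": 0x0136, "kcommaaccent": 0x0137,
    "Lcommaaccent": 0x013B, "lcommaaccent": 0x013C,
    "Ncommaaccent": 0x0145, "ncommaaccent": 0x0146,
    "Rcommaaccent": 0x0156, "rcommaaccent": 0x0157,
}

# Print each glyph name next to its code point and Unicode character name.
for glyph, cp in LATVIAN_COMMAACCENT.items():
    print(f"{glyph:14s} U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

Printing the Unicode names next to the glyph names makes the mismatch 
visible: every character name ends in WITH CEDILLA even though the glyph 
carries a comma-shaped mark.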

NB: the lowercase /gcommaaccent/ is almost always written with a variant 
mark that actually sits above the letter (to avoid collision with the 
descending loop); this is achieved by rotating the commaaccent mark 180 
degrees and positioning it above the g. I usually include the variant 
ingredient glyph /uni0312/ to use in the /gcommaaccent/ composite.


Regarding the /Scedilla/ and /Tcedilla/ vs. /Scommaaccent/ and /Tcommaaccent/:

/Scedilla/scedilla/ are used only for Turkish; this must be a true cedilla.

/Scommaaccent/scommaaccent/ and /Tcommaaccent/tcommaaccent/ are used only 
for Romanian; this must be the same 'comma' diacritic form discussed above 
for Latvian, and should *not* be attached to the letter.

/Tcedilla/tcedilla/ is not used for any European language. (It is arguably 
more appropriate for Gagauz Turkish than the 'comma' accent form, because 
Gagauz also uses the /Scedilla/, but the Gagauz texts I have seen all use 
the 'comma' below the T and the cedilla below the S.) Generally I do not 
include the cedilla variant in fonts, and simply double map the 
/Tcommaaccent/ to the Unicode values discussed below.

Version 3.0 of the Unicode standard, which postdates the published WGL4 
set, disunified the /Scedilla/ and /Tcedilla/ from the /Scommaaccent/ and 
/Tcommaaccent/ by providing new codepoints for the latter. My 
recommendation is to use the new codepoints for /Scommaaccent/ but to 
double map the /Tcommaaccent/ glyph to the new codepoints and also to the 
old /Tcedilla/ codepoint.
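
The double mapping can be sketched as a glyph-to-codepoints table that is 
then inverted into the codepoint-to-glyph form a font's character map uses. 
This is only an illustration in Python: GLYPH_TO_CODEPOINTS and build_cmap 
are hypothetical names, not part of any font tool, and the lowercase 
entries are assumed to follow the same pattern as the capitals.

```python
# Glyph name -> list of Unicode code points the glyph is mapped to.
GLYPH_TO_CODEPOINTS = {
    # Turkish: true cedilla, old code points only.
    "Scedilla": [0x015E],
    "scedilla": [0x015F],
    # Romanian S: new (Unicode 3.0) code points only.
    "Scommaaccent": [0x0218],
    "scommaaccent": [0x0219],
    # Romanian T: new code point plus the old T WITH CEDILLA code point.
    "Tcommaaccent": [0x021A, 0x0162],
    "tcommaaccent": [0x021B, 0x0163],
}


def build_cmap(glyph_to_codepoints):
    """Invert the table into code point -> glyph name."""
    cmap = {}
    for glyph, codepoints in glyph_to_codepoints.items():
        for cp in codepoints:
            if cp in cmap:
                # A code point can only point at one glyph.
                raise ValueError(f"U+{cp:04X} is mapped twice")
            cmap[cp] = glyph
    return cmap


CMAP = build_cmap(GLYPH_TO_CODEPOINTS)
for cp in sorted(CMAP):
    print(f"U+{cp:04X} -> {CMAP[cp]}")
```

With this table there is no /Tcedilla/ glyph at all: both U+0162 and U+021A 
resolve to /Tcommaaccent/, while the S code points stay separate.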

Note that there are text encoding issues regarding Romanian, because the 
Romanian 8-bit codepages all use the old /Scedilla/ and /Tcedilla/ Unicode 
codepoints, not the new codepoints for the 'comma' accent characters. In 
OpenType fonts, we've addressed this (for future support) by including a 
Language System tag for Romanian, and a Localised Forms <locl> feature 
lookup to substitute the /Scommaaccent/ glyph for the /Scedilla/. This 
feature is not yet supported in any systems or applications, but I'm 
reasonably certain that it will be.
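
To show what that lookup is meant to do, here is a small Python simulation 
of the substitution. It is not how a layout engine is implemented: 
apply_locl and LOCL_ROMANIAN are hypothetical names for this example, the 
lowercase pair is assumed to be handled like the capital, and 'ROM ' is the 
OpenType language system tag for Romanian.

```python
ROMANIAN_LANGSYS = "ROM "  # OpenType language system tag for Romanian

# Substitutions performed by the Romanian <locl> lookup in this sketch.
# Only S is listed; /Tcommaaccent/ is already double mapped in the cmap.
LOCL_ROMANIAN = {
    "Scedilla": "Scommaaccent",
    "scedilla": "scommaaccent",  # assumption: lowercase handled the same way
}


def apply_locl(glyphs, langsys):
    """Return the glyph run after language-specific substitution."""
    if langsys != ROMANIAN_LANGSYS:
        # Other language systems (e.g. Turkish) keep the true cedilla.
        return list(glyphs)
    return [LOCL_ROMANIAN.get(name, name) for name in glyphs]


run = ["scedilla", "i"]
print(apply_locl(run, "ROM "))  # Romanian text gets the comma form
print(apply_locl(run, "TRK "))  # Turkish text is left unchanged
```

The point of keying the substitution on the language system is that the 
same old codepoint can display a comma form in Romanian text and a true 
cedilla in Turkish text.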

John Hudson

Tiro Typeworks          www.tiro.com
Vancouver, BC           [EMAIL PROTECTED]

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it.              - Terry Eagleton

