----- Original Message -----
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Philippe Verdy" <[EMAIL PROTECTED]>
Cc: "Unicode Mailing List" <[EMAIL PROTECTED]>
Sent: Wednesday, March 17, 2004 8:11 PM
Subject: Re: Investigating: LATIN CAPITAL LETTER J WITH DOT ABOVE

> On 17/03/2004 09:59, Philippe Verdy wrote:
>
> >Arcane Jill <[EMAIL PROTECTED]> wrote:
> >
> >>But if you lowercased that, surely you'd get <j, combining dot above>.
> >>How should that be rendered?
> >
> >This is already addressed: lowercase j is "soft-dotted" meaning that its default
> >dot disappears when there's a diacritic above it, and this includes the
> >combining dot above.
> >
> >So <j, combining dot above> is not canonically or compatibility equivalent to
> ><j>, but both normally look the same when rendered, and the difference that is
> >invisible in lowercase, comes back to visible when converted back to uppercase.
> >So the semantic is preserved...
>
> But if you had a font (e.g. a Celtic one) in which lower case i or j is
> dotless, should the soft-dottedness be cancelled and the dot appeared
> anyway? (Dare I suggest that this would give a way of writing Turkish
> with a Celtic font? Probably not as it would mean non-standard encoding
> of the Turkish text.)

In my opinion, yes: a sequence <lower case i or j, combining dot above> should show the dot even in the Celtic font. The "soft-dotted" property only governs the appearance of the implicit dot associated with <lower case i or j>; it has no effect on the following <combining dot above>, which explicitly requests the presence of the dot.

So a Celtic font may very well be used to show Turkish text, at the price of a change of encoding, something that would probably not happen. If standard Turkish text is rendered with the Celtic font, it will not be rendered correctly, because the Celtic font will display the soft-dotted <lowercase i or j> and the <lowercase dotless i or j> in exactly the same way, unless the renderer is told that the text to render is Turkic and the Celtic font contains instructions to restore the implicit dot on <lowercase i or j> for Turkic text.

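The encoding-level claims quoted above (no equivalence between <j, combining dot above> and <j>, and the dot surviving a case round trip) can be checked with a minimal sketch using Python's standard unicodedata module. The helper names are only illustrative, and Python's default case mapping is not language-sensitive, so this says nothing about Turkish or Lithuanian tailoring, nor about how any font draws the result.

```python
import unicodedata

J = "j"
J_DOT = "j\u0307"  # <j, combining dot above>


def equivalent(a: str, b: str, form: str) -> bool:
    """True if a and b normalize to the same string under the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def case_round_trip(text: str) -> str:
    """Uppercase and then lowercase again, using the default case mappings."""
    return text.upper().lower()


if __name__ == "__main__":
    # Canonical (NFC/NFD) and compatibility (NFKC/NFKD) comparisons.
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, equivalent(J, J_DOT, form))

    # The combining dot is kept when the sequence is uppercased.
    print([f"U+{ord(c):04X}" for c in J_DOT.upper()])

    # Going back to lowercase restores the original sequence.
    print(case_round_trip(J_DOT) == J_DOT)
```

The same check on <I, combining dot above> behaves a little differently under NFC, because a precomposed U+0130 exists for that sequence, while no precomposed capital J with dot above exists.
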
The font may, for example, (1) recognize language tags in the text stream, if present, or (2) contain language-specific character-to-glyph substitution tables that a language-aware renderer can use when the application passes it a language code option. A priori I prefer option (2), because language tags in the text stream are already a deprecated method that requires inserting additional characters into the plain text to be rendered, and because the language information is most often conveyed out of band, for example by an xml:lang attribute on a container XML element whose content is a text element (each text element in XML is the largest unit of plain text coded in an XML document; XML itself is not plain text but an encoding syntax for general structured data).
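
To make option (2) a little more concrete, here is a toy sketch in Python of a renderer that picks glyphs from a default table plus per-language overrides. Everything in it is hypothetical: the glyph names, the tables, and the shape() function are invented for illustration and do not correspond to any real font format or rendering API. The language code is passed out of band as an argument, much as an application might take it from xml:lang.

```python
# Hypothetical glyph names for an imagined Celtic-style font in which the
# default shapes of i and j carry no dot.
DEFAULT_GLYPHS = {
    "i": "i.dotless",
    "j": "j.dotless",
    "\u0131": "i.dotless",  # LATIN SMALL LETTER DOTLESS I
    "\u0307": "dotabove",   # COMBINING DOT ABOVE, always drawn explicitly
}

# Hypothetical language-specific substitutions. Only "i" is overridden here,
# an assumption made because Turkish contrasts dotted i with dotless i.
LANGUAGE_OVERRIDES = {
    "tr": {"i": "i.dotted"},
    "az": {"i": "i.dotted"},
}


def shape(text, lang=None):
    """Map each character to a glyph name, applying overrides for lang."""
    table = dict(DEFAULT_GLYPHS)
    if lang:
        primary = lang.split("-")[0].lower()  # "tr-TR" is treated as "tr"
        table.update(LANGUAGE_OVERRIDES.get(primary, {}))
    # Characters missing from the table are passed through unchanged.
    return [table.get(ch, ch) for ch in text]


if __name__ == "__main__":
    print(shape("i\u0131"))             # no language information
    print(shape("i\u0131", lang="tr"))  # language supplied by the application
    print(shape("i\u0307"))             # explicit combining dot above
```

Without a language code the two letters collapse to the same glyph, which is the failure described above for standard Turkish text. With lang="tr" the distinction comes back, and an explicit combining dot above gets its own glyph in either case.
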

