Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

Asmus Freytag Wed, 04 Aug 2010 11:16:46 -0700

On 8/2/2010 5:04 PM, Karl Pentzlin wrote:

I have compiled a draft proposal:
Proposal to add Variation Sequences for Latin and Cyrillic letters
The draft can be downloaded at:
 http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB).
The final proposal is intended to be submitted for the next UTC
starting next Monday (August 9).


Any comments are welcome.

- Karl Pentzlin

This is an interesting proposal to deal with the glyph selection problem caused by the unification process inherent in character encoding.

When Unicode was first contemplated, the web did not exist, and the expectation was that it would nearly always be possible to specify the font to be used for a given text and that selecting a font would give the correct glyph.

As the proposal noted, universal fonts and viewing documents on other platforms and systems across the web have made this solution unattractive for general texts.


We are left, then, with these five scenarios:

1) Free variation
2) Orthographic variation of isolated characters (by language, e.g. different capitals)
3) Orthographic variation of entire texts (e.g. italic Cyrillic forms, by language)
4) Orthographic variation by type style (e.g. Fraktur conventions)
5) Notational conventions (e.g. IPA)

For free variation of a glyph, the only possible solutions are either font selection or use of a variation sequence. I concur with Karl that, in this case, where notable variations have been unified, adding variation selectors is a much more viable means of capturing authorial intent than font selection.
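A minimal Python sketch of what a variation sequence looks like in plain text: a base character immediately followed by a variation selector (VS1 through VS16 at U+FE00..U+FE0F, VS17 through VS256 at U+E0100..U+E01EF). The pairing of "a" with VS1 used here is purely hypothetical and is not a standardized sequence; the function names are illustrative only.

```python
from typing import List, Optional, Tuple


def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def split_sequences(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each base character with the variation selector that follows it, if any.

    A selector with no preceding base (or following another selector) is kept
    as its own entry so that no code point is silently dropped.
    """
    result: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        if is_variation_selector(ch) and result and result[-1][1] is None:
            base, _ = result[-1]
            result[-1] = (base, ch)
        else:
            result.append((ch, None))
    return result


# Hypothetical: "a" + VS1 standing for one specific glyph variant of U+0061.
print(split_sequences("a\uFE00b"))
```

The authorial choice travels with the text itself as an extra code point, so it survives a change of platform or font in a way that a font specification does not.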

If text is language tagged, then OpenType mechanisms exist in principle to handle scenarios 2 and 3. For full texts in a certain language, using variation selectors throughout is unappealing as a solution.

However, it may be a viable way to embed correctly rendered citations in other text, given that language tagging can become separated from the document and that automatic language tagging may detect large chunks of text, but not short runs.

The Fraktur problem is one where one type style requires additional information (e.g. when to select long s) that is not required for rendering the same text in another type style. If it is indeed desirable (and possible) to create a correctly encoded string that can be rendered automatically, without further change, in both type styles, then adding any necessary variation sequences to ensure that ability might be useful. However, that needs to be addressed in the context of a precise specification of how to encode texts so that they are dual-renderable. Addressing only a few isolated variation sequences makes no sense.

Notational conventions are addressed in Unicode by duplicate encoding (IPA) or by variation sequences. The scheme has holes: in a few cases it is not possible to select one of the variants explicitly; instead, the ambiguous form has to be used, in the hope that the font in use has the proper variant in place for the ambiguous form.

Adding a few variation sequences (like the one to allow the "a" at U+0061 to be the two-story form needed for IPA) would fill the gap for cases where control over the precise display font is not available.

However, there's no need to add variation sequences to select an *ambiguous* form. Those sequences should be removed from the proposal.
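A small Python sketch of why such sequences are redundant, assuming the usual fallback behavior in which a process that does not support a given variation sequence simply ignores the selector and shows the base character. The function name `fallback_text` and the "a" + VS1 pairing are hypothetical illustrations, not part of any proposal or standard.

```python
import re

# Matches VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF).
_VS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def fallback_text(text: str) -> str:
    """Return the text as seen by a process that ignores variation selectors."""
    return _VS.sub("", text)


# Hypothetical "a" + VS1: without support, only the plain (ambiguous) "a" remains.
print(fallback_text("a\uFE00"))
```

Under this assumption, a sequence whose intended glyph is the ambiguous form would display the same with or without support, so it carries no information beyond the bare character.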


Overall, a valuable starting point for a necessary discussion.

A./
