On 8/2/2010 5:04 PM, Karl Pentzlin wrote:
I have compiled a draft proposal:
Proposal to add Variation Sequences for Latin and Cyrillic letters
The draft can be downloaded at:
 http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB).
The final proposal is intended to be submitted for the next UTC
starting next Monday (August 9).

Any comments are welcome.

- Karl Pentzlin

This is an interesting proposal to deal with the glyph selection problem caused by the unification process inherent in character encoding.

When Unicode was first contemplated, the web did not exist and the expectation was that it would nearly always be possible to specify the font to be used for a given text and that selecting a font would give the correct glyph.

As the proposal notes, universal fonts and the viewing of documents on other platforms and systems across the web have made this solution unattractive for general texts.

We are left, then, with these five scenarios:

1) Free variation
2) Orthographic variation of isolated characters (by language, e.g. different capitals)
3) Orthographic variation of entire texts (e.g. italic Cyrillic forms, by language)
4) Orthographic variation by type style (e.g. Fraktur conventions)
5) Notational conventions (e.g. IPA)

For free variation of a glyph, the only possible solutions are font selection or the use of a variation sequence. I concur with Karl that in this case, where notable variations have been unified, adding variation sequences is a much more viable means of expressing authorial intent than font selection.
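As a rough illustration of the mechanism: a variation sequence is a base character followed by one of the variation selectors (U+FE00..U+FE0F). The Python sketch below uses <U+0061, U+FE00> purely as a hypothetical example. It is not meant to be one of the sequences in the proposal, and the helper names are made up for illustration.

```python
import unicodedata

# Variation selectors VS1..VS16 occupy U+FE00..U+FE0F.
VS_FIRST, VS_LAST = 0xFE00, 0xFE0F


def is_variation_selector(ch: str) -> bool:
    """Return True if ch is one of VS1..VS16."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def strip_variation_selectors(text: str) -> str:
    """Drop all VS1..VS16, leaving the plain (ambiguous) base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical sequence: base letter "a" followed by VS1.
# Assumption for illustration only, not a sequence taken from the proposal.
seq = "\u0061\uFE00"

print(len(seq))                                    # two code points
print(unicodedata.name(seq[1]))                    # name of the selector
print(strip_variation_selectors("b" + seq + "d"))  # falls back to plain text
```

A process that does not know the sequence can ignore the selector and still show the base letter, which is what makes this approach degrade more gracefully than relying on a specific font.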

If text is language-tagged, then OpenType mechanisms exist in principle to handle scenarios 2 and 3. For full texts in a given language, using variation selectors throughout is unappealing as a solution.

However, variation selectors may be a viable way to embed correctly rendered citations in other text, given that language tagging can become separated from the document and that automatic language detection may recognize large chunks of text but not short runs.

The Fraktur problem is one where one type style requires additional information (e.g. when to select long s) that is not required for rendering the same text in another type style. If it is indeed desirable (and possible) to create a correctly encoded string that can be rendered automatically in both type styles without further change, then adding any variation sequences necessary to ensure that ability might be useful. However, that needs to be addressed in the context of a precise specification of how to encode texts so that they can be rendered in both styles. Addressing only some isolated variation sequences makes no sense.

Notational conventions are addressed in Unicode by duplicate encoding (IPA) or by variation sequences. The scheme has holes: in a few cases it is not possible to select one of the variants explicitly. Instead, the ambiguous form has to be used, in the hope that the font in use has the proper variant in place for the ambiguous form.
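To make the duplicate-encoding case concrete, here is a small Python sketch. The pairs are chosen for illustration only: for "a" and "g", one shape has its own IPA code point (U+0251, U+0261), while the other shape can only be written with the ordinary, ambiguous letter.

```python
import unicodedata

# (ambiguous letter, separately encoded IPA variant); illustrative pairs only.
pairs = [("\u0061", "\u0251"), ("\u0067", "\u0261")]

for plain, ipa in pairs:
    print(f"U+{ord(plain):04X} {unicodedata.name(plain)}")
    print(f"U+{ord(ipa):04X} {unicodedata.name(ipa)}")
    # The IPA letters are distinct characters: even compatibility
    # normalization (NFKC) does not fold them into the plain letter.
    print(unicodedata.normalize("NFKC", ipa) == plain)
```

The single-story alpha can thus be selected explicitly, whereas nothing in plain text pins U+0061 to its two-story shape.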

Adding a few variation sequences (like the one to allow the "a" at U+0061 to be the two-story one needed for IPA) would fill the gap for cases where the precise display font cannot be controlled.

However, there's no need to add variation sequences to select an *ambiguous* form. Those sequences should be removed from the proposal.

Overall a valuable starting point for a necessary discussion.

A./
