The mechanism proposed by John to handle ZWJ/ZWNJ makes the implicit assumption that those characters are transformed into glyphs (via the usual 'cmap' mechanism) and that this is the avenue to transfer the intent of those characters to the shaping code in the font (i.e. some kind of ligature lookup). I'd like to revisit that assumption.

The ZWJ/ZWNJ characters are formatting characters. Their function is definitely different from the function of the "regular" characters (such as "A"): they are a way to control the rendering of regular characters around them, and to express that control in plain text. The debate so far shows that there is no strong objection to that mechanism by itself.

In an environment richer than plain text, there is obviously the possibility that this control could be expressed by other means than characters. In the OpenType world, and in particular in the interface between the layout engine and the shaping code in fonts, we have more than plain text, or rather plain glyphs; we also have a description of which features should be applied to which glyphs. So instead of having glyphs that stand for ZWJ/ZWNJ, can we use these features?

In fact, we already do that every day. For example, an InDesign user can insert the two characters x and y, and apply a ligature feature (let's say 'dlig') to them. It seems to me that this is just what ZWJ is about. So InDesign could do the following given the character sequence x ZWJ y: map it to the glyph sequence cmap(x) cmap(y), with 'dlig' applied on those two glyphs. This 'dlig' application takes precedence over one via UI, i.e. it happens regardless of whether the user requested 'dlig' explicitly. The ZWJ character is simply not mapped to the glyph stream, since the feature application does the job of ZWJ.

We can handle ZWNJ in the same way: the sequence x ZWNJ y is transformed to the glyph sequence cmap(x) cmap(y), with 'dlig' not applied on those two glyphs. This 'dlig' non-application takes precedence over one via UI, i.e. 'dlig' is not applied to these two glyphs regardless of whether the user requested 'dlig' explicitly.
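
To make the two cases above concrete, here is a rough sketch in Python. Everything in it is hypothetical: the function name, the dict standing in for 'cmap', and the per-glyph on/off flag are illustration only, not the API of any real layout engine. It follows the description literally, putting the flag on the two glyphs around the joiner, which is a simplification.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def to_glyph_stream(text, cmap, user_dlig=False):
    """Return a list of [glyph, dlig_on] pairs (hypothetical model).

    ZWJ and ZWNJ never produce a glyph. Instead they override the
    user's 'dlig' choice on the glyph before and the glyph after them.
    """
    glyphs = []
    override = None  # set by a joiner seen since the last real character
    for ch in text:
        if ch == ZWJ:
            override = True
        elif ch == ZWNJ:
            override = False
        else:
            glyphs.append([cmap[ch], user_dlig])
            if override is not None and len(glyphs) >= 2:
                # The joiner's request wins over the UI setting.
                glyphs[-2][1] = override
                glyphs[-1][1] = override
            override = None
    return glyphs

# Toy stand-in for the font's 'cmap' table (assumed, not a real font).
toy_cmap = {"x": "glyph_x", "y": "glyph_y"}
print(to_glyph_stream("x" + ZWJ + "y", toy_cmap, user_dlig=False))
print(to_glyph_stream("x" + ZWNJ + "y", toy_cmap, user_dlig=True))
```

In both calls the output has exactly two glyphs: the joiner itself never reaches the font, only its effect on 'dlig' does.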

[Maybe a better way of thinking about the precedence stuff is to think entirely in markup terms:
<ligatures-on> ... x ZWNJ y ... </ligatures-on> is transformed into the glyph stream <dlig> ... cmap(x) </dlig> <dlig> cmap(y) ... </dlig>, i.e. dlig is off on the pair x y; hold your objection that a feature is applied to a position rather than a range for a minute.]
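
The markup view can be sketched the same way, again in Python and again with made-up names (the function, the dict standing in for 'cmap', and the list-of-lists representation of ranges are all assumptions for illustration). Each inner list is one <dlig> range; the assumption is that a ligature may only form between glyphs inside the same range, so a one-glyph range has 'dlig' effectively off.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def to_dlig_ranges(text, cmap, user_dlig):
    """Split the glyph stream into ranges over which 'dlig' is applied.

    A boundary between two glyphs is "joined" if a ZWJ sits there, "not
    joined" if a ZWNJ sits there, and follows the user's setting otherwise.
    """
    glyphs, joined = [], []  # joined[i] describes the boundary after glyphs[i]
    override = None
    for ch in text:
        if ch == ZWJ:
            override = True
        elif ch == ZWNJ:
            override = False
        else:
            if glyphs:
                joined.append(user_dlig if override is None else override)
            glyphs.append(cmap[ch])
            override = None

    ranges = []
    for i, glyph in enumerate(glyphs):
        if i > 0 and joined[i - 1]:
            ranges[-1].append(glyph)   # extend the current <dlig> range
        else:
            ranges.append([glyph])     # close the range, open a new one
    return ranges

# Toy stand-in for 'cmap': each letter maps to an upper-case glyph name.
toy_cmap = {c: c.upper() for c in "abxy"}
print(to_dlig_ranges("ax" + ZWNJ + "yb", toy_cmap, user_dlig=True))
print(to_dlig_ranges("ax" + ZWJ + "yb", toy_cmap, user_dlig=False))
```

With ligatures on, the ZWNJ just ends one range and starts the next, so a-x and y-b can still ligate while x-y cannot. With ligatures off, the ZWJ creates a single two-glyph range around x and y.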

With this approach, we gain two things. First, not having a "formatting" glyph for ZWJ is IMHO a huge conceptual win, even bigger than not having a "formatting" character ZWJ would be. Second, what John's proposal did not mention (or maybe I missed it) is that it's not just the ligature features that have to deal with this glyph, it is all the features; compound that with all the formatting characters, and you will start to understand Paul's reaction.

It's interesting to note that this approach can be applied to other formatting characters as well. Either their intent can be achieved by the layout engine alone, without help from the font, in which case there is no need to show anything to the code in the font: those characters result in no glyph and no feature. Or their intent needs help from the font, and the OpenType way to ask for this help is to apply (or not apply) features.

All that takes care of selecting a ligature, but it does not quite take care of selecting cursive forms. I can see how we could define 'dlig' to do that (or define a 'zwj' feature that invokes the ligature lookups plus some single substitution lookup), but I am not sure I am happy with that. In fact, I am not sure I am happy with that clause in Unicode.


Eric.

[About the features applied to ranges rather than positions: think about it and it should be obvious 8-) It does not make sense to apply a ligature at a position; what makes sense is to apply a ligature on a range. Think about 1->n substitutions; whatever lookups apply to the source glyph should also apply to all the replacement glyphs - ranges again. I even believe that this approach is compatible with the current OpenType spec. More details on demand.]
