2011/8/24 John Hudson <[email protected]>:
> Philippe, I'll need to think about this some more and try to get a better
> grasp of what you're suggesting. But some immediate thoughts come to mind:
>
> If BiDi is to be applied to shaped glyph strings, surely that means needing
> to step backwards through the processing that arrived at those shaped glyph
> strings in order to correctly identify their relationship to underlying
> character codes, since it is the characters, not the glyphs, that have
> directional properties. There's nothing in an OT font that says e.g. GID 456
> /lam_alif.fina/ is an RTL glyph, so the directionality has to be processed
> at the character level and mapped up through the GSUB features to the
> glyphs.

No backward stepping is needed. Process the text with the grapheme cluster as the minimum unit of processing: apply normalization, then try to cmap all the characters of a cluster from the same font (using fallback fonts if needed); if this fails, try to cmap the individual characters of the cluster to find a font match. This done, each character is mapped to a definitive font and a putative (incompletely resolved) glyph id in that font. Note that PUAs will be isolated at this point (each forms its own grapheme cluster). You can then check whether the font provides an override for the BC (Bidi_Class) property of a PUA, replacing the default strong LTR value.

Then, independently:

- you can process the list of glyphs one by one, trying to match all applicable GSUBs, but only if they occur in the same font as the one associated with the previous character. You can also easily select the typographic variants of that font for a single glyph.

- you can update the current Bidi level of the character, using the BC property value override specified in the font containing the PUA, or the normative value for a non-PUA; otherwise the default BC property value for PUAs.

Once the remaining glyph ids are no longer substitutable, you can apply GPOS rules (or legacy tables for base-to-base kerning) reliably, because you also know whether the BiDi level is even (LTR) or odd (RTL).

You can then use the glyph metrics to accumulate widths in order to detect whether an automatic line break can occur. When a forced or automatic line break does occur, you can adjust the justification of the glyph ids, because at that point you also know the directionality of all characters (including the first glyph of the line; if this line starts a paragraph, that is where you have determined the main direction of the baseline). You can also automatically adjust the widths of kashidas (or even automatically insert them for microjustification of glyphs, according to the joining properties of the associated characters).

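To make the BC lookup for PUAs concrete, here is a minimal sketch in Python. It assumes a hypothetical per-font override table (`font_bc_overrides`, a plain dict from code point to BC value); no such table exists in OpenType today, it only stands in for the override proposed in this message. Non-PUA characters always keep their normative value, taken here from Python's `unicodedata` module.

```python
import unicodedata

# Private Use ranges: the BMP PUA plus planes 15 and 16.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

# Default Bidi_Class of private-use characters: strong left-to-right.
DEFAULT_PUA_BC = "L"


def is_pua(cp):
    """Return True if the code point lies in a Private Use range."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def resolve_bidi_class(ch, font_bc_overrides=None):
    """Pick the Bidi_Class (BC) value used for one character.

    font_bc_overrides is a HYPOTHETICAL table {code point: BC value}
    supplied by the font that was matched for this character. It is
    consulted for PUA characters only.
    """
    cp = ord(ch)
    if is_pua(cp):
        if font_bc_overrides and cp in font_bc_overrides:
            return font_bc_overrides[cp]   # override from the font
        return DEFAULT_PUA_BC              # default value for PUAs
    # Non-PUA: the normative value, never overridden by a font.
    return unicodedata.bidirectional(ch)
```

With this, `resolve_bidi_class("\ue000", {0xE000: "R"})` yields the font's override, while an override entry for a non-PUA character such as U+05D0 is simply ignored.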
Then you can reorder the glyph ids that are in runs opposed to the main direction of the baseline for the paragraph.

Some more refinements are needed for handling some text decorations (such as underlines, which are not necessarily continuous in all styles and may need to avoid cutting through strokes; but this would require some metrics from the font, associated with glyphs that have descenders).

All of the above can be done in parallel (i.e. character by character, each one being handled glyph id by glyph id, as long as there are matchable GSUBs or GPOSes). The memory requirement is limited to as many glyphs as can fit within the margins of a single line.

Finally, the line can be fully drawn with the reordered glyphs (you may need to clip the kashidas to their autojustified width, to keep them from overlapping too far into the surrounding joined characters).
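
For the reordering step, a minimal sketch in Python, assuming each glyph id on the line already carries its resolved Bidi level. It follows the reversal rule of the Unicode Bidirectional Algorithm (rule L2 of UAX #9): from the highest level down to the lowest odd level, reverse every contiguous run of glyphs at that level or higher. The function name and the list-based representation are illustrative only.

```python
def reorder_line(glyphs, levels):
    """Return the glyphs of one line in visual order.

    glyphs: glyph ids (or any items) in logical order.
    levels: resolved Bidi embedding level of each glyph
            (even = LTR, odd = RTL), same length as glyphs.
    """
    if len(glyphs) != len(levels):
        raise ValueError("glyphs and levels must have the same length")

    out = list(glyphs)
    lv = list(levels)
    odd_levels = [l for l in lv if l % 2 == 1]
    if not odd_levels:
        return out  # purely LTR line: nothing to reorder

    highest = max(lv)
    lowest_odd = min(odd_levels)

    # From the highest level down to the lowest odd level, reverse
    # each maximal run of glyphs whose level is >= the current level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(out):
            if lv[i] >= level:
                j = i
                while j < len(out) and lv[j] >= level:
                    j += 1
                # Keep each level attached to its glyph while reversing.
                out[i:j] = out[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return out
```

For example, a line whose logical glyphs are `a b C D E f` with levels `0 0 1 1 1 0` comes out as `a b E D C f`. Only one line of glyphs and levels is held at a time, in keeping with the memory bound mentioned above.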

