On 27/11/13 12:46, Khaled Hosny wrote:
On Wed, Nov 27, 2013 at 09:10:02PM +0900, Simon Cozens wrote:
This is possibly a daft question, but...
In traditional TeX, character tokens are processed and put into boxes
individually, with fairly primitive ligature tables. Obviously XeTeX doesn't
do this; it uses HarfBuzz (or ICU, or whatever) to do the shaping and layout.
My question is: if you're not "showing" individual characters to the shaping
engine for it to consider, what determines how long a string of characters is
shaped at a time? Does XeTeX break at the "word" level and then shape a word,
and if so, what defines a word? (Chinese has no word breaks!) Or does it
shape an entire paragraph of text at a time (!) and then box up the glyphs
individually? Or...?
XeTeX shapes words one at a time; a word is basically any consecutive
sequence of character nodes (using the same font) after TeX has done its
macro expansion and is ready to typeset the material. The AAT code
additionally tries to merge word sequences separated by spaces into one
node.
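To make the segmentation described above concrete, here is a minimal Python sketch of how such runs could be formed. It is a toy model, not XeTeX's actual code: `CharNode`, `GlueNode`, `nodes_from` and `shaping_runs` are hypothetical names, a font is represented by a plain string, and the AAT merging of space-separated words is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical node types for illustration only (not XeTeX's data structures).
@dataclass
class CharNode:
    char: str
    font: str  # a font is just a name in this toy model

@dataclass
class GlueNode:
    pass  # stands in for interword space (glue)

Node = Union[CharNode, GlueNode]

def nodes_from(text: str, font: str) -> List[Node]:
    """Hypothetical helper: spaces become glue, everything else a char node."""
    return [GlueNode() if c == " " else CharNode(c, font) for c in text]

def shaping_runs(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Return (font, text) for each maximal run of same-font char nodes."""
    runs: List[Tuple[str, str]] = []
    font: Optional[str] = None
    buf: List[str] = []
    for node in nodes:
        if isinstance(node, CharNode) and buf and node.font == font:
            buf.append(node.char)  # extend the current run
            continue
        if buf:  # glue or a font change ends the current run
            runs.append((font, "".join(buf)))
        if isinstance(node, CharNode):
            font, buf = node.font, [node.char]  # start a new run
        else:
            font, buf = None, []
    if buf:
        runs.append((font, "".join(buf)))
    return runs

print(shaping_runs(nodes_from("office fluff", "Serif")))
# [('Serif', 'office'), ('Serif', 'fluff')]
print(shaping_runs(nodes_from("ab", "A") + nodes_from("cd", "B")))
# [('A', 'ab'), ('B', 'cd')]
```

In this model, text with no interword glue in a single font (such as a line of Chinese) comes out as one run, and each run is what would be handed to the shaper.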
In particular, in case it's not sufficiently clear from the above, note
that <space>s, being glue nodes, are NOT part of such a "consecutive
sequence of character nodes". Therefore a known limitation of XeTeX
is that OpenType lookups that try to match the <space> glyph will not
work. Shaping happens only within a run of non-space characters in a
given font.
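The consequence for lookups that try to match the <space> glyph can be illustrated with a toy Python model. Lookups are reduced here to plain substring matches over each run, which is far simpler than real OpenType contextual lookups; `runs_in_one_font`, `count_context` and the `"f t"` context are hypothetical and chosen only for illustration.

```python
from typing import List

def runs_in_one_font(text: str) -> List[str]:
    """Toy model: in a single font, each space becomes glue and splits the text."""
    return [run for run in text.split(" ") if run]

def count_context(runs: List[str], context: str) -> int:
    """Count occurrences of a (hypothetical) lookup context within each run."""
    return sum(run.count(context) for run in runs)

text = "of the"
# Hypothetical contextual lookup: 'f', then <space>, then 't'.
context = "f t"

# If the shaper were handed the whole line, the context would be visible...
whole = count_context([text], context)
# ...but per-run shaping never sees a <space> glyph, so the lookup cannot fire.
per_run = count_context(runs_in_one_font(text), context)
# A context lying entirely inside one word is unaffected.
inside_word = count_context(runs_in_one_font(text), "th")

print(whole, per_run, inside_word)  # 1 0 1
```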
Most fonts are not affected by this, but it is an issue for certain
fonts that want to do complex multi-word ligatures, or contextual forms
that depend on the adjacent <space> glyph.
JK
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex