Luca Furini wrote:
Note that a word with a soft hyphen in its middle would not be
hyphenated, unless we ignore this character when collecting word fragments.
Well, in order to prepare for hyphenation, other characters
like joiners have to be removed too. We should probably also
use Unicode
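
To make the "ignore these characters when collecting word fragments" idea concrete, here is a minimal sketch in Python. The set of characters treated as ignorable and the name strip_ignorable are assumptions for illustration only, not something taken from the existing code. The index map is kept so that hyphenation points found in the cleaned word can be translated back to positions in the original text.

```python
# Characters assumed to be ignorable when preparing a word for hyphenation.
# This set is illustrative; the real list would need to be agreed on.
IGNORABLE = {
    "\u00ad",  # SOFT HYPHEN
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
}


def strip_ignorable(word):
    """Return (cleaned, index_map).

    cleaned is the word without ignorable characters; index_map[i] is the
    position in the original word of the i-th character of cleaned.
    """
    cleaned = []
    index_map = []
    for i, ch in enumerate(word):
        if ch in IGNORABLE:
            continue  # skip the character, but keep counting original positions
        cleaned.append(ch)
        index_map.append(i)
    return "".join(cleaned), index_map


if __name__ == "__main__":
    cleaned, index_map = strip_ignorable("hy\u00adphen")
    print(cleaned, index_map)
```

A hyphenation point reported after character k of the cleaned word would then correspond to position index_map[k] in the original word.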
Manuel Mall wrote:
What about character composition/decomposition?
Good question. Where is the answer?
Let's clarify the problem first. Let's say the input contains
the sequence U+0061 U+0308 (Latin small letter a, combining diaeresis), and
the font has a glyph for U+00E4 but not for U+0308. Obviously,