Many grapheme extenders are not "combining characters". Combining characters are classified as such for legacy reasons (through the very weak "general category" property), and that property is normatively stabilized. In addition, most combining characters have a non-zero combining class, and those values are stabilized for the purpose of normalization.
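As a rough illustration of the two properties mentioned above, here is a minimal Python sketch. It relies only on the standard `unicodedata` module, which exposes the general category and the canonical combining class but, as far as I know, not Grapheme_Extend, so it only shows the "combining character" side of the distinction. The sample characters and the helper names `describe` and `is_combining_mark` are my own choices for illustration, not anything defined by the standard.

```python
import unicodedata

# Illustrative sample (an assumption of this sketch, not from the standard):
# U+0301 combining mark, U+0E31 Thai mark with combining class 0,
# U+200C ZWNJ (a joiner), U+0E40 Thai vowel written before its consonant.
SAMPLES = ["\u0301", "\u0e31", "\u200c", "\u0e40"]


def describe(ch):
    """Return (General_Category, Canonical_Combining_Class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


def is_combining_mark(ch):
    """True when the general category is Mn, Mc or Me (a 'combining character')."""
    return unicodedata.category(ch).startswith("M")


for ch in SAMPLES:
    gc, ccc = describe(ch)
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: gc={gc} ccc={ccc} mark={is_combining_mark(ch)}")
```

U+200C comes out with general category Cf and combining class 0, so it is not a combining character, although (to my knowledge, please check PropList.txt for your Unicode version) it is listed under Other_Grapheme_Extend and is therefore a grapheme extender. U+0E31 shows that a combining mark can also have combining class 0.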
Grapheme extenders also include characters that are NOT combining characters but format controls (e.g. joiners). Grapheme clusters are also more complex in some scripts: there are extenders encoded BEFORE the base character, and these cannot be classified as combining characters, because combining characters are always encoded AFTER a base character.

For legacy reasons (and round-trip compatibility with older standards), not all scripts are encoded using the UCS character model with combining characters. The Thai script, for example, does not follow the "logical" encoding order; it follows the model used in TIS-620 and in the other standards based on it, including those used on Windows and *nix/*nux.

2014-02-20 11:42 GMT+01:00 Mathias Bynens <[email protected]>:

> What is the difference between 'combining characters' (
> http://www.unicode.org/faq/char_combmark.html) and 'grapheme extenders' (
> http://www.unicode.org/reports/tr44/#Grapheme_Extend) in Unicode?
>
> They seem to do the same thing, as far as I can tell - although the set of
> grapheme extenders is larger than the set of combining characters. I'm
> clearly missing something here. Why the distinction?
>
> I've also posted this question on Stack Overflow:
> http://stackoverflow.com/q/21722729/96656
> _______________________________________________
> Unicode mailing list
> [email protected]
> http://unicode.org/mailman/listinfo/unicode

