On 23 Apr 2014, at 22:16, Mathias Bynens <[email protected]> wrote:

> Let’s say I’m writing a program that strips combining characters and grapheme
> extenders from an input string.
>
> For combining marks, I’m looking for any non-combining marks (e.g. `a`)
> followed by one or more combining marks (e.g. `̃`), and then I remove
> everything but the non-combining mark (e.g. leaving only `a`). Is this a
> correct approach?
>
> What should the approach be for grapheme extenders? Should the program only
> look for `Grapheme_Base` characters followed by `Grapheme_Extend` characters
> (which includes the code points in `Other_Grapheme_Extend`)?
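
To make the combining-mark approach quoted above concrete, here is a minimal Python sketch of how I read it. It is only an illustration under a few assumptions: "combining mark" is taken to mean General_Category `Mn`, `Mc` or `Me`; the names `is_combining_mark` and `strip_combining_marks` are made up for this example; and no normalization is applied, so precomposed characters such as `ã` (U+00E3) pass through unchanged. Python's standard `unicodedata` module does not expose `Grapheme_Base` or `Grapheme_Extend`, so the grapheme-extender half of the question is not covered here.

```python
import unicodedata

# Assumption: "combining mark" means General_Category Mn, Mc or Me.
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def is_combining_mark(ch: str) -> bool:
    """Return True if the code point has a Mark general category."""
    return unicodedata.category(ch) in MARK_CATEGORIES


def strip_combining_marks(text: str) -> str:
    """Drop every run of combining marks that follows a non-mark character.

    Marks at the very start of the string have no preceding non-mark,
    so they are kept, which matches the "non-mark followed by marks"
    description in the quoted message.
    """
    kept = []
    seen_non_mark = False
    for ch in text:
        if is_combining_mark(ch):
            if seen_non_mark:
                continue  # mark follows a non-mark character: remove it
            kept.append(ch)  # leading mark with nothing before it: keep it
        else:
            seen_non_mark = True
            kept.append(ch)
    return "".join(kept)
```

For example, `strip_combining_marks("a\u0303")` returns `"a"`, while a string that starts with U+0303 keeps that leading mark because nothing precedes it.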
The email subject should have been “Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?”. Sorry for the confusion.

Does anyone know the answer?

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode

