Hi,

On 04.12.20 21:23, Florian Schmaus wrote:
> And I am in favor of code points because it allows us to aim for the
> extended grapheme cluster algorithm, while also allowing for the
> "simply count code points" fallback.

XEP-0426 already discusses in its rationale why it uses code points
instead of grapheme clusters:

> The most obvious way of counting characters is to count them how
> humans would. This sounds easy when only having western scripts in
> mind but becomes more complicated in other scripts and most
> importantly is not well-defined across Unicode versions. New unicode
> versions regularly added new possibilities to build grapheme
> clusters, including from existing code points. To be forward
> compatible, counting grapheme clusters, graphemes, glyphs or similar
> is thus not an option.

I also forgot to mention that grapheme clusters are locale-specific
(example: "ch" is considered a single grapheme cluster in Slovak).
UAX #29 even says:

> The Unicode definitions of grapheme clusters are defaults: not meant
> to exclude the use of more sophisticated definitions of tailored
> grapheme clusters where appropriate.

Finally, I don't think that it's generally inappropriate to point
inside a grapheme cluster (even if that's hard to implement). An
example of where it seems appropriate to reference a part of a
grapheme cluster is this: https://larma.de/grapheme.html

Marvin
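
P.S. To make the difference concrete, here is a minimal Python sketch.
The helper names (count_code_points, count_utf16_units) and the sample
strings are only illustrative assumptions, not anything defined by
XEP-0426. It shows that a string a reader sees as one "character" can
consist of several code points, and that a code point offset can refer
to a part of such a cluster.

```python
# Illustrative sketch only; helper names are hypothetical, not from XEP-0426.

def count_code_points(text: str) -> int:
    # A Python str is a sequence of Unicode code points, so len() counts them.
    return len(text)


def count_utf16_units(text: str) -> int:
    # Each UTF-16 code unit is two bytes; code points above U+FFFF take two units.
    return len(text.encode("utf-16-le")) // 2


# "e" followed by COMBINING ACUTE ACCENT: rendered as one character.
E_ACUTE = "e\u0301"

# MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL:
# usually rendered as a single family emoji.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

if __name__ == "__main__":
    print(count_code_points(E_ACUTE))
    print(count_code_points(FAMILY), count_utf16_units(FAMILY))
    # A code point offset can address one member of the emoji sequence.
    print(FAMILY[2] == "\U0001F469")
```

The code point counts here depend only on the string itself, whereas
how many grapheme clusters the same strings form depends on the
segmentation rules (and any tailoring) the implementation uses.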
