Re: [Standards] Proposed XMPP Extension: Character counting in message bodies

Marvin W Mon, 07 Dec 2020 07:25:09 -0800

Hi,

On 04.12.20 21:23, Florian Schmaus wrote:
> And I am in favor of code points because it allows us to aim for the
>  extended grapheme cluster algorithm, while also allowing for the 
> "simply count code points" fallback.


XEP-0426 already explains in its rationale why it counts code points
instead of grapheme clusters:

> The most obvious way of counting characters is to count them how 
> humans would. This sounds easy when only having western scripts in 
> mind but becomes more complicated in other scripts and most 
> importantly is not well-defined across Unicode versions. New unicode 
> versions regularly added new possibilities to build grapheme 
> clusters, including from existing code points. To be forward 
> compatible, counting grapheme clusters, graphemes, glyphs or similar 
> is thus not an option.
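To make the difference concrete, here is a minimal Python sketch (my own illustration, not taken from XEP-0426). It counts a body in code points and also in two other units that implementations sometimes use by accident. Python's str is a sequence of code points, so len() gives the code point count. The helper names are made up for illustration.

```python
def count_code_points(body: str) -> int:
    # A Python str is a sequence of Unicode code points.
    return len(body)


def count_utf16_units(body: str) -> int:
    # Code points above U+FFFF become a surrogate pair (2 code units).
    return len(body.encode("utf-16-le")) // 2


def count_utf8_bytes(body: str) -> int:
    return len(body.encode("utf-8"))


samples = [
    "hi",
    "e\u0301",                      # e + COMBINING ACUTE ACCENT
    "\U0001F600",                   # GRINNING FACE
    "\U0001F469\u200D\U0001F4BB",   # WOMAN + ZWJ + PERSONAL COMPUTER
]
for s in samples:
    print(repr(s), count_code_points(s), count_utf16_units(s), count_utf8_bytes(s))
```

The code point count is a fixed function of the string, independent of the Unicode version and the locale. A Java or JavaScript client that relies on String length gets UTF-16 code units instead, so the emoji samples would come out differently there.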

Also, I forgot to mention that grapheme clusters can be locale-specific
(for example, "ch" is treated as a single tailored grapheme cluster in
Slovak). Unicode TR #29 even says:

> The Unicode definitions of grapheme clusters are defaults: not meant
> to exclude the use of more sophisticated definitions of tailored
> grapheme clusters where appropriate.
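A toy illustration of why a cluster count is not a fixed function of the string. This sketch is NOT the UAX #29 algorithm. It only attaches combining marks (characters with a non-zero canonical combining class per Python's unicodedata) to the preceding character. As a hypothetical tailoring, it also keeps "ch" together when the locale is "sk". The function name and the locale parameter are assumptions made for illustration.

```python
import unicodedata
from typing import List, Optional


def toy_clusters(text: str, locale: Optional[str] = None) -> List[str]:
    """Toy segmentation, NOT UAX #29: a base character plus any following
    combining marks, with a hypothetical Slovak tailoring for "ch"."""
    clusters: List[str] = []
    i = 0
    while i < len(text):
        if locale == "sk" and text[i:i + 2].lower() == "ch":
            # Hypothetical tailoring: the Slovak digraph counts as one unit.
            clusters.append(text[i:i + 2])
            i += 2
        else:
            clusters.append(text[i])
            i += 1
        # Attach marks with a non-zero canonical combining class (e.g. U+0301).
        # Marks whose combining class is 0 (many spacing marks) are not handled.
        while i < len(text) and unicodedata.combining(text[i]):
            clusters[-1] += text[i]
            i += 1
    return clusters


word = "chlieb"
print(len(word))                      # code points: the same for every locale
print(len(toy_clusters(word)))        # untailored toy count
print(len(toy_clusters(word, "sk")))  # tailored toy count
```

The same six code points yield six or five "characters" depending on the locale. Counting code points sidesteps that ambiguity.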

Finally, I don't think it is generally inappropriate to point inside a
grapheme cluster (even if that is hard to implement). Here is an example
where referencing part of a grapheme cluster seems appropriate:
https://larma.de/grapheme.html
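The following sketch shows what addressing part of a cluster looks like with code point offsets. It does not reproduce the linked page. The message body and the reference range below are invented for illustration. The range [2, 3) selects only the laptop inside the woman-technologist ZWJ sequence, which forms a single extended grapheme cluster. Clients whose strings are UTF-16 (Java, JavaScript) need to translate such offsets, and the hypothetical helper utf16_offset shows one way to do that.

```python
def utf16_offset(body: str, cp_offset: int) -> int:
    """Translate a code point offset into a UTF-16 code unit offset."""
    # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in body[:cp_offset])


# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: one extended grapheme cluster.
body = "\U0001F469\u200D\U0001F4BB is coding"
begin, end = 2, 3  # hypothetical reference covering only U+1F4BB

part = body[begin:end]  # Python slices by code point
print(hex(ord(part)))
print(utf16_offset(body, begin), utf16_offset(body, end))
```

With code point offsets, the reference stays meaningful even though it lands inside a cluster. A cluster-based offset could not express it at all.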

Marvin
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________
