> FWIW I was a big proponent of doing it this way too, but I've changed my
> mind after seeing too many grapheme segmentation implementations be
> broken in small, different ways. My new position is that we have to
> just count bytes and figure out a sane behavior in case someone sends us
> an invalid offset in the middle of a codepoint or something. This is
> encoding agnostic (not that it matters for XMPP) and makes it very easy
> to count: go to that byte offset, check if we're on any sort of UTF-8
> boundary, if so call it a day, if not do whatever the fallback is.

Codepoints are preferable: 
https://mail.jabber.org/pipermail/standards/2019-October/036589.html
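If you're indexing by grapheme clusters then you're just asking for
trouble: the segmentation rules change between Unicode versions, so two
clients can disagree about where a cluster ends.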
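
For comparison, here is a rough sketch of the byte-offset check described
in the quoted text. It is only an illustration: it assumes the body is
already valid UTF-8, and the helper names and the "round down" fallback are
hypothetical, not something any XEP specifies.

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """True if offset sits between two encoded codepoints (or at either end)."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte starts a codepoint.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data: bytes, offset: int) -> int:
    """Hypothetical fallback: clamp the offset, then round down to a boundary."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_utf8_boundary(data, offset):
        offset -= 1
    return offset


# "naive" with a diaeresis, a space, then thumbs-up plus a skin-tone modifier.
text = "na\u00efve \U0001F44D\U0001F3FD"
data = text.encode("utf-8")

print(len(data))                   # number of bytes
print(len(text))                   # number of codepoints
print(is_utf8_boundary(data, 3))   # offset 3 is inside the two-byte U+00EF
print(snap_to_boundary(data, 3))   # the fallback rounds down to its first byte
print(is_utf8_boundary(data, 11))  # between the emoji and its modifier
```

Note that offset 11 passes the check even though it splits what a user sees
as one character. Both byte and codepoint offsets can land inside a cluster;
the point is that neither needs a segmentation table to count.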

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________