> FWIW I was a big proponent of doing it this way too, but I've changed my
> mind after seeing too many grapheme segmentation implementations be
> broken in small, different, ways. My new position is that we have to
> just count bytes and figure out a sane behavior in case someone sends us
> an invalid offset in the middle of a codepoint or something. This is
> encoding agnostic (not that it matters for XMPP) and makes it very easy
> to count: go to that byte offset, check if we're on any sort of UTF-8
> boundary, if so call it a day, if not do whatever the fallback is.

Codepoints are preferable:
https://mail.jabber.org/pipermail/standards/2019-October/036589.html

If you're indexing by clusters then you're just asking for trouble.
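
To make the trade-off concrete, here is a minimal Python sketch of the byte-offset check described in the quoted text, next to a code point count. The function names are hypothetical, the input is assumed to be valid UTF-8, and "snap back to the previous boundary" is only one possible fallback; the quoted message does not say which fallback to use.

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """Return True if offset is at the start of a code point or at the end of data.

    Assumes data is valid UTF-8 (an assumption of this sketch).
    """
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data: bytes, offset: int) -> int:
    """Hypothetical fallback: clamp the offset, then move back to a boundary."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_utf8_boundary(data, offset):
        offset -= 1
    return offset


text = "na\u00efve"            # U+00EF takes two bytes in UTF-8
data = text.encode("utf-8")    # b'na\xc3\xafve'

byte_length = len(data)        # counting bytes
codepoint_length = len(text)   # Python str is indexed by code point

combining = "e\u0301"          # 'e' followed by a combining acute accent
```

With this input, offset 3 lands inside the two-byte sequence for U+00EF, so the fallback moves it back to offset 2. The `combining` string is two code points and three bytes, but a grapheme segmenter would normally treat it as one cluster, which is where implementations can start to disagree.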
