On Mon, Oct 21, 2019, at 14:06, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29. ICU has support for this.

This was also my suggestion at a summit a few years ago. However, it
has three downsides:

- It significantly increases the footprint of the code: you either have
  to use a library that supports segmentation and grapheme clusters, or
  write a fairly complicated algorithm yourself.
- It requires a lot more knowledge to implement: if Unicode isn't your
  focus, getting started means learning a lot of new terms and confusing
  concepts, and it's easy to make a mistake even with a good library.
- It generally makes implementations harder to write.

Personally, I go back and forth between grapheme clusters and bytes, but
every option that has been laid out has its downsides.

Sam

-- 
Sam Whited
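
P.S. To make the trade-off concrete, here is a minimal sketch (not a
proposal) that measures the same string in bytes, in code points, and
with a deliberately crude stand-in for grapheme clusters. The function
names are made up for illustration, and the third function is an
assumption-laden approximation, not the Unicode Standard Annex 29
algorithm. It uses only the Python standard library, which has no
grapheme segmentation of its own.

```python
import unicodedata


def length_in_bytes(text: str) -> int:
    """Length of the UTF-8 encoding of text, in bytes."""
    return len(text.encode("utf-8"))


def length_in_code_points(text: str) -> int:
    """Number of Unicode code points (this is what len() reports for str)."""
    return len(text)


def approx_perceived_length(text: str) -> int:
    """Crude approximation only: count code points that are not combining marks.

    This is NOT the UAX #29 grapheme cluster algorithm. It ignores
    zero-width-joiner emoji sequences, Hangul jamo, regional indicator
    pairs, and more, so it miscounts many real-world strings.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
accented = "e\u0301"

# MAN + ZWJ + WOMAN + ZWJ + GIRL: rendered as a single family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for label, sample in (("accented e", accented), ("family emoji", family)):
    print(
        label,
        length_in_bytes(sample),
        length_in_code_points(sample),
        approx_perceived_length(sample),
    )
```

The accented "e" comes out as 3 bytes, 2 code points, and 1 under the
approximation. The family emoji comes out as 18 bytes and 5 code points,
and the approximation also says 5, although it is a single grapheme
cluster. Closing that gap is exactly where a segmentation library (or a
much longer algorithm) comes in.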