On Mon, Oct 21, 2019, at 14:06, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.

This was also my suggestion at a summit a few years ago. The downside
is that it significantly increases the footprint of the code: you
either have to pull in a library that supports text segmentation and
grapheme clusters, or write a fairly complicated algorithm yourself. It
also requires a lot more knowledge to implement. If Unicode isn't your
focus, getting started means learning a lot of new terms and confusing
concepts, and it's easy to make a mistake even with a good library.
All of this makes implementations harder to write.

I go back and forth between using grapheme clusters and bytes
personally, but all the options that have been laid out have their
downsides.
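
To make the trade-off concrete, here is a rough sketch (Python, standard
library only) of how far apart the cheap measurements can be for text a
user would see as a single character. The helper name `lengths` is made
up for illustration. Note that the standard library can count bytes and
code points but has no grapheme cluster segmentation, so that count is
only stated in the comments, not computed.

```python
def lengths(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, code point count) for s."""
    return len(s.encode("utf-8")), len(s)


# "e" followed by U+0301 COMBINING ACUTE ACCENT.
# A reader perceives this as one character (one grapheme cluster).
decomposed = "e\u0301"

# U+00E9, the precomposed form of the same character.
composed = "\u00e9"

# MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL.
# This is a single grapheme cluster under UAX #29.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for label, text in [("decomposed", decomposed),
                    ("composed", composed),
                    ("family", family)]:
    n_bytes, n_code_points = lengths(text)
    print(f"{label}: {n_bytes} bytes, {n_code_points} code points")
```

Counting grapheme clusters for the same strings would need a
segmentation library such as ICU, which is exactly the extra footprint
described above.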

—Sam

-- 
Sam Whited
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________
