On Saturday, October 19 2019, "Sam Whited" wrote to "standards@xmpp.org" saying:

> On Sat, Oct 19, 2019, at 04:57, JC Brand wrote:
> > You might still have an offset in between two codepoints that should
> > ideally be shown together like "EU" making the EU flag, but this seems
> > less of an issue to me.
> 
> I don't know if this is better or not, and I'm still not sure how best
> to handle it. If you end up with text in the middle of a UTF-8 encoding,
> at least that's clearly an error. If it's in between the two letters in
> a flag emoji, that's not necessarily an error and there are tons of
> different ways you could handle it, which seems much more complex.
> Does this break the flag emoji back into the letter glyphs that are
> shown if it doesn't form a flag? What if it's between something and a
> zero-width joiner that would join it to another glyph, does that split
> that and now you have a dangling joiner? From a code perspective does
> this mean that highlighting always has to integrate with the text
> rendering engine? This seems like a *major* downside to me, as it likely
> makes the code much more complicated, and we may or may not even have
> the ability to manipulate how the text rendering engine handles things.

The right concept here is probably "grapheme clusters", as defined in
Unicode Standard Annex #29 (Unicode Text Segmentation). ICU has support
for this.
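
As a rough illustration of the idea, here is a minimal Python sketch that
snaps a codepoint offset back to the start of the cluster it falls in. It is
not ICU's algorithm and not a full UAX #29 implementation: it only covers the
cases raised in this thread (flag pairs, zero-width joiners, combining marks,
skin-tone modifiers, CR LF), and it treats every zero-width joiner as gluing
its neighbours together, which is looser than the real rule. The names
is_boundary and snap_offset are made up for this example.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner


def _is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols; two of them in a row render as a flag.
    return "\U0001F1E6" <= ch <= "\U0001F1FF"


def _is_extender(ch: str) -> bool:
    # Simplified stand-in for the Extend/SpacingMark/ZWJ classes of UAX #29:
    # combining marks (which include variation selectors), the ZWJ itself,
    # and the emoji skin-tone modifiers.
    return (
        unicodedata.category(ch) in ("Mn", "Mc", "Me")
        or ch == ZWJ
        or "\U0001F3FB" <= ch <= "\U0001F3FF"
    )


def is_boundary(text: str, i: int) -> bool:
    """Return True if codepoint offset i may start a new cluster (approximate)."""
    if i <= 0 or i >= len(text):
        return True
    prev, cur = text[i - 1], text[i]
    if prev == "\r" and cur == "\n":
        return False
    if _is_extender(cur):
        return False
    if prev == ZWJ:
        # Assumption: anything after a ZWJ stays attached to what precedes it.
        return False
    if _is_regional_indicator(prev) and _is_regional_indicator(cur):
        # Pair indicators from the start of the run: a break is allowed only
        # after an even number of them.
        run = 0
        j = i - 1
        while j >= 0 and _is_regional_indicator(text[j]):
            run += 1
            j -= 1
        return run % 2 == 0
    return True


def snap_offset(text: str, offset: int) -> int:
    """Move offset back to the nearest approximate cluster boundary."""
    offset = max(0, min(offset, len(text)))
    while not is_boundary(text, offset):
        offset -= 1
    return offset


if __name__ == "__main__":
    eu_flag = "\U0001F1EA\U0001F1FA"  # regional indicators E + U
    text = "a" + eu_flag + "b"
    # Offset 2 falls between the two indicators; it snaps back to offset 1.
    print(snap_offset(text, 2))
```

An implementation that needs to be correct for all scripts would use a real
segmenter (such as ICU's character break iterator) rather than hand-rolled
rules like these.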

-- 
Jonathan Lennox
len...@cs.columbia.edu
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________
