On Thu, Oct 17, 2019 at 01:46:26PM +0000, Sam Whited wrote:
> TL;DR: we should avoid using XEP-0372 until "TODO: define character
> appropriately" is removed and resolved.

XEP-0394 (Message Markup) works similarly to XEP-0372 and defines the
"start" and "end" values in "units of unicode code points in the
character data of the body element".

This seems better than bytes, because an offset can then never fall in
the middle of a code point's UTF-8 encoding. You might still have an
offset between two code points that should ideally be shown together,
like "EU" making the EU flag, but this seems less of an issue to me.

I therefore think we should just do the same for XEP-0372. It would in
any case be crazy to specify one way of doing things in XEP-0394 and
another in XEP-0372.

JC

> On Thu, Oct 17, 2019, at 10:07, JC Brand wrote:
> > Instead, I propose that we use XEP-0372 references to indicate that
> > a particular shortname (e.g. :dancingpanda:) should be replaced with
> > an image.
> >
> > For example:
> >
> > <message type="chat" from="t...@chat.org" to="m...@chat.org">
> >   <body>I feel like dancing! :dancingpanda:</body>
> >   <reference xmlns="urn:xmpp:reference:0" begin="21" end="35"
> >              type="data" uri="https://images.com/dancingpanda"/>
> > </message>
>
> We should avoid using references in the wild until a few things are
> cleared up. We don't want lots of premature implementations popping up
> that aren't compatible with one another.
>
> For example, in the following message:
>
> "> ☃︎ :sadpanda:"
>
> Should the start attribute for ":sadpanda:" be 4 or 5? Unicode snowman
> is 2 bytes, after all.
>
> What about:
>
> "🇪🇺 :sadpanda:"
>
> Which may be rendering as an EU flag or as the separate letters 'E', 'U'
> depending on your rendering?
>
> The easiest way is to probably just say that the offset is in bytes, but
> now what do we do if a buggy or malicious client sends something with
> the offset in the middle of the UTF-8 encoding for the snowman emoji?
> What about in the middle of the two codepoints that will be combined to
> create the EU flag glyph which would still be between valid UTF-8
> encodings?
>
> This is not an easy problem, and while I don't want to tackle trying to
> solve it in this thread, I think references should be avoided until we
> do or we'll never get all the implementations doing one thing later (and
> emojis are exactly the kind of feature that will lead to lots of
> implementations).
>
> - Sam
>
> --
> Sam Whited
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: standards-unsubscr...@xmpp.org
> _______________________________________________
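
To make the difference between the units concrete, here is a rough Python sketch that computes the offset of a shortname in code points and in UTF-8 bytes for the example messages in this thread. It is only an illustration under assumptions: the function names are hypothetical and not defined by either XEP, the snowman is assumed to be U+2603 followed by the text variation selector U+FE0E, and "end" is assumed to be exclusive with offsets starting at 0 (which is what begin="21" end="35" in the example stanza suggests).

```python
# Sketch only: offsets counted in Unicode code points versus UTF-8 bytes.
# All function names are hypothetical; neither XEP-0372 nor XEP-0394 defines them.


def code_point_offset(body: str, needle: str) -> int:
    """Offset of needle in code points (Python str indices are code points)."""
    return body.index(needle)


def utf8_byte_offset(body: str, needle: str) -> int:
    """Offset of needle counted in bytes of the UTF-8 encoded body."""
    return len(body[: body.index(needle)].encode("utf-8"))


def is_code_point_boundary(data: bytes, offset: int) -> bool:
    """True if a byte offset does not land inside a multi-byte UTF-8 sequence."""
    if not 0 <= offset <= len(data):
        return False
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


def apply_reference(body: str, begin: int, end: int, replacement: str) -> str:
    """Replace body[begin:end]; begin/end in code points, end assumed exclusive."""
    if not 0 <= begin <= end <= len(body):
        raise ValueError("reference range outside body")
    return body[:begin] + replacement + body[end:]


# Assumption: snowman (U+2603) followed by text variation selector (U+FE0E).
snowman = "> \u2603\ufe0e :sadpanda:"
# Regional indicator symbols E and U, which may render as the EU flag.
flag = "\U0001F1EA\U0001F1FA :sadpanda:"
panda = "I feel like dancing! :dancingpanda:"

print(code_point_offset(snowman, ":sadpanda:"))  # 5
print(utf8_byte_offset(snowman, ":sadpanda:"))   # 9
print(code_point_offset(flag, ":sadpanda:"))     # 3
print(utf8_byte_offset(flag, ":sadpanda:"))      # 9

# A byte offset can land inside the snowman's encoding...
print(is_code_point_boundary(snowman.encode("utf-8"), 3))  # False
# ...or between the two regional indicators, which is still valid UTF-8.
print(is_code_point_boundary(flag.encode("utf-8"), 4))     # True

print(apply_reference(panda, 21, 35, "[img]"))  # I feel like dancing! [img]
```

With code point offsets, a receiver only has to check that the range lies inside the body; the mid-encoding case cannot occur. Splitting the two regional indicators remains possible in either unit, and avoiding that would need grapheme cluster segmentation, which this sketch does not attempt.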