On Thu, Oct 17, 2019 at 01:46:26PM +0000, Sam Whited wrote:
> TL;DR — we should avoid using XEP-0372 until "TODO: define character
> appropriately" is removed and resolved.

XEP-0394 (Message Markup) works similarly to XEP-0372 and defines the
"start" and "end" values in "units of unicode code points in the
character data of the body element".

This seems better than bytes, because an offset counted in code points can
never land in the middle of a multi-byte UTF-8 sequence.
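
To illustrate, here is a rough Python sketch. The helper names are made up for this mail and don't come from either XEP. Python strings are indexed by code point, so converting a code-point offset to a UTF-8 byte offset always gives a sequence boundary. An arbitrary byte offset such as 3 in Sam's snowman example does not: it lands inside the three-byte encoding of U+2603.

```python
def codepoint_to_byte_offset(text: str, cp_offset: int) -> int:
    """Map an offset counted in Unicode code points to a UTF-8 byte offset."""
    if not 0 <= cp_offset <= len(text):
        raise ValueError("offset out of range")
    # Python str indexing is by code point, so the prefix always encodes to
    # whole UTF-8 sequences and the result is a sequence boundary.
    return len(text[:cp_offset].encode("utf-8"))


def is_utf8_boundary(data: bytes, byte_offset: int) -> bool:
    """True if byte_offset does not fall inside a multi-byte UTF-8 sequence."""
    if not 0 <= byte_offset <= len(data):
        raise ValueError("offset out of range")
    if byte_offset == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[byte_offset] & 0xC0) != 0x80


body = "> \u2603\ufe0e :sadpanda:"  # snowman plus text-style variation selector
encoded = body.encode("utf-8")

cp_begin = body.index(":sadpanda:")
byte_begin = codepoint_to_byte_offset(body, cp_begin)
print(cp_begin, byte_begin)                   # 5 9
print(is_utf8_boundary(encoded, byte_begin))  # True
print(is_utf8_boundary(encoded, 3))           # False: second byte of U+2603
```

One caveat: languages that index strings by UTF-16 code units (JavaScript, Java) would report different offsets for characters outside the Basic Multilingual Plane, such as the regional indicators in 🇪🇺. Implementations there would have to count code points explicitly.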

You might still have an offset between two code points that should ideally be
displayed together, such as the two regional indicator symbols E and U that
combine into the EU flag, but this seems less of an issue to me.
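
Detecting that case is at least cheap. Below is a simplified Python check that only handles regional-indicator flags. It is not full grapheme segmentation as described in Unicode UAX #29, which would need a dedicated library. Regional indicators pair up from the start of a run, so an offset splits a flag when an odd number of regional indicators precede it in that run. The name `splits_flag` is hypothetical.

```python
REGIONAL_INDICATORS = range(0x1F1E6, 0x1F1FF + 1)  # REGIONAL INDICATOR A..Z


def is_regional_indicator(ch: str) -> bool:
    return ord(ch) in REGIONAL_INDICATORS


def splits_flag(text: str, offset: int) -> bool:
    """True if a code-point offset falls between two regional indicators
    that together form one flag."""
    if not 0 < offset < len(text):
        return False
    if not (is_regional_indicator(text[offset - 1])
            and is_regional_indicator(text[offset])):
        return False
    # Count the regional indicators immediately before the offset; an odd
    # count means the offset sits in the middle of a pair.
    run = 0
    i = offset - 1
    while i >= 0 and is_regional_indicator(text[i]):
        run += 1
        i -= 1
    return run % 2 == 1


body = "\U0001F1EA\U0001F1FA :sadpanda:"  # EU flag: indicators E + U
print(splits_flag(body, 1))       # True: between E and U
print(splits_flag(body, 2))       # False: after the whole flag
print(body.index(":sadpanda:"))   # 3
```

A receiving client could reject or clamp such offsets. That would be a policy choice for the spec, not something either XEP currently says.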

I therefore think we should just do the same for XEP-0372. It would in any case
be crazy to specify one way of doing things in XEP-0394 and another in XEP-0372.
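
For a sense of what this would mean for a receiver, here is a sketch in Python that resolves my dancingpanda example quoted below. It assumes two things: that begin/end count code points in the body's character data, as XEP-0394 does, and that end is exclusive. The second assumption matches begin="21" end="35" for the 14-character :dancingpanda:, but neither point is settled in XEP-0372 as discussed in this thread. `referenced_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

REFERENCE_NS = "urn:xmpp:reference:0"

STANZA = """<message type="chat">
  <body>I feel like dancing! :dancingpanda:</body>
  <reference xmlns="urn:xmpp:reference:0" begin="21" end="35"
             type="data" uri="https://images.com/dancingpanda"/>
</message>"""


def referenced_text(stanza: str) -> list[tuple[str, str]]:
    """Return (uri, referenced substring) pairs, treating begin/end as
    code-point offsets into the body with end exclusive (assumption)."""
    message = ET.fromstring(stanza)
    body = message.findtext("body") or ""
    results = []
    for ref in message.iter(f"{{{REFERENCE_NS}}}reference"):
        begin, end = ref.get("begin"), ref.get("end")
        if begin is None or end is None:
            continue  # no offsets given, nothing to slice
        begin, end = int(begin), int(end)
        if not 0 <= begin < end <= len(body):
            raise ValueError(f"reference [{begin}, {end}) outside body")
        # Slicing a Python str counts code points, so this cannot split
        # a UTF-8 sequence.
        results.append((ref.get("uri"), body[begin:end]))
    return results


print(referenced_text(STANZA))
# [('https://images.com/dancingpanda', ':dancingpanda:')]
```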

JC

 
> On Thu, Oct 17, 2019, at 10:07, JC Brand wrote:
> > Instead, I propose that we use XEP-0372 references to indicate that
> > a particular shortname (e.g. :dancingpanda:) should be replaced with
> > an image.
> >
> > For example:
> >
> >  <message type="chat" from="t...@chat.org" to="m...@chat.org">
> >    <body>I feel like dancing! :dancingpanda:</body>
> >    <reference xmlns="urn:xmpp:reference:0" begin="21" end="35"
> >               type="data" uri="https://images.com/dancingpanda"/>
> >  </message>
> 
> We should avoid using references in the wild until a few things are
> cleared up. We don't want lots of premature implementations popping up
> that aren't compatible with one another.
> 
> For example, in the following message:
> 
> "> ☃︎ :sadpanda:"
> 
> Should the begin attribute for ":sadpanda:" be 4 or 5? The snowman here
> is two code points (U+2603 followed by the variation selector U+FE0E)
> but a single visible character, after all.
> 
> What about:
> 
> "🇪🇺 :sadpanda:"
> 
> Which may be rendered as an EU flag or as the separate letters 'E' and
> 'U', depending on the renderer?
> 
> The easiest approach is probably to just say that the offset is in
> bytes, but then what do we do if a buggy or malicious client sends
> something with the offset in the middle of the UTF-8 encoding of the
> snowman emoji? What about an offset between the two code points that
> combine to create the EU flag glyph, which would still fall between
> valid UTF-8 sequences?
> 
> This is not an easy problem, and while I don't want to try to solve it
> in this thread, I think references should be avoided until we do, or
> we'll never get all the implementations doing one thing later (and
> emojis are exactly the kind of feature that will lead to lots of
> implementations).
> 
> —Sam
> 
> 
> -- 
> Sam Whited
> _______________________________________________
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: standards-unsubscr...@xmpp.org
> _______________________________________________
