On 12/4/20 3:03 PM, Andrew Nenakhov wrote:
Upping a year-old email thread for Florian.

Thanks, but I am well aware of the thread and the situation.

I think the approach below mixes aspects of the XML layer with the Unicode layer, which do not have to be mixed when counting "characters". Ultimately, what you get out of the textual representation of the <body/> element is a sequence of grapheme clusters (identified via the extended grapheme clustering algorithm). Those are the entities that should eventually get counted.
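
To illustrate why the unit has to be pinned down after XML decoding, here is a minimal Python sketch. The stanza content and the helper name count_units are made up for illustration. It decodes a <body/> with the standard library parser and counts the resulting text three different ways. Grapheme clusters are not computed, because the Python standard library has no extended grapheme cluster segmentation; that count would need a third-party library.

```python
import xml.etree.ElementTree as ET


def count_units(text: str) -> dict:
    """Count the same decoded text in three common units (hypothetical helper)."""
    return {
        "code_points": len(text),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Illustrative body: "e" + U+0301 (combining acute accent) + U+1F600 (emoji),
# written with character references so the XML layer has something to decode.
stanza = "<body>e&#x301;&#x1F600;</body>"

# Step 1: leave the XML layer behind. Entities and character references
# are resolved by the parser, so counting never sees "&#x301;".
body_text = ET.fromstring(stanza).text

# Step 2: count on the Unicode layer only.
counts = count_units(body_text)
print(counts)
# A user would perceive two "characters" here (two grapheme clusters),
# which matches none of the three counts above.
```

None of the three numbers depends on how the sender or a server chose to escape the text, which is the point of separating the two layers.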

Reply containing a rant about how impractical grapheme cluster counting is in 3, 2, 1… :)

- Florian


Wed, 18 Dec 2019 at 20:41, Marvin W <[email protected]>:


On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
In the end we have settled on counting characters of the escaped string, so

This sounds like a terrible idea. In encoded XML, ">", "&#x3E;", "&gt;"
and "<![CDATA[>]]>" are equivalent. I just tried it out, and servers
do indeed convert all of those to their shortest well-formed variant
(which is "&gt;"), so you cannot rely on the length of the escaped form
at all. Servers may at their discretion convert non-ASCII characters to
their character reference form (starting with &#). I have seen this
happen at least once with emoji.
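
A quick way to see the equivalence is to feed all four spellings to an XML parser. This is only a sketch using Python's standard library parser, not what any particular XMPP server does; the <body> wrapper and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Four spellings of the same single character ">" inside a <body/>.
variants = [
    "<body>></body>",
    "<body>&#x3E;</body>",
    "<body>&gt;</body>",
    "<body><![CDATA[>]]></body>",
]

for raw in variants:
    text = ET.fromstring(raw).text
    # The raw (escaped) length differs, the decoded text does not.
    print(len(raw), repr(text))

# Collect the decoded texts; equivalent encodings collapse to one value.
decoded = {ET.fromstring(raw).text for raw in variants}
raw_lengths = {len(raw) for raw in variants}
```

All four inputs decode to the same one-character text while their serialized lengths differ, so any offset computed on the escaped form changes whenever an intermediary re-serializes the stanza.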

to draw *&&&* in a client we count it as a string with a length of 15,
thus the <bold> reference points to characters 0..14:
<reference xmlns="urn:xmpp:reference:0" begin="0" end="14"
type="markup"><bold /></reference>

Luckily for you, this looks pretty non-standard, so you don't have to
deal with your implementation being incompatible with others. Also, as
soon as XEP-0372 actually becomes more stable, you will technically be
non-compliant with the standard, because there is no <bold /> element
defined for the namespace "urn:xmpp:reference:0". You are apparently
mixing XEP-0372 and XEP-0394.

Also, that's a weird way of counting: usually I would expect end to point
to the position after the last referenced character. At least that's
what you do in most programming languages (e.g. "&amp;&amp;&amp;"[0:14]
will give you "&amp;&amp;&amp" without the last ";").
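
The difference between the two conventions, sketched in Python on the escaped string from the example above. The two helper names are hypothetical and only serve to contrast the readings of begin and end.

```python
escaped = "&amp;&amp;&amp;"  # the 15-character escaped form of "&&&"


def slice_half_open(s: str, begin: int, end: int) -> str:
    """end points one past the last referenced character (Python-style)."""
    return s[begin:end]


def slice_inclusive(s: str, begin: int, end: int) -> str:
    """end points at the last referenced character itself."""
    return s[begin:end + 1]


print(len(escaped))                     # 15
print(slice_half_open(escaped, 0, 14))  # &amp;&amp;&amp   (last ";" missing)
print(slice_inclusive(escaped, 0, 14))  # &amp;&amp;&amp;  (whole string)
print(slice_half_open(escaped, 0, 15))  # &amp;&amp;&amp;  (whole string)
```

With begin="0" end="14", only the inclusive reading covers the whole string, so a receiver that assumes half-open ranges would cut off the final character.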
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: [email protected]
_______________________________________________



