On 12/4/20 3:03 PM, Andrew Nenakhov wrote:
Upping a year-old email thread for Florian.
Thanks, but I am well aware of the thread and the situation.
I think the text below mixes aspects of the XML layer with the Unicode layer, and the two do not have to be mixed when counting "characters". Ultimately, what you get out of the textual representation of the <body/> element is a sequence of grapheme clusters (identified via the extended grapheme clustering algorithm). Those are the entities that should eventually get counted.
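To illustrate why "characters" needs a precise definition, here is a minimal Python sketch comparing three common units on already-decoded text. The helper name `count_units` and the sample strings are made up for illustration. Note that none of the three units is the grapheme cluster count: Python's standard library does not implement extended grapheme clustering (UAX #29), so counting clusters would need a third-party implementation (for instance, the `regex` package's `\X` pattern is one option).

```python
# Illustrative sketch only: three common "length" units disagree with each
# other, and none of them is the grapheme cluster count.

def count_units(text: str) -> dict:
    """Return the length of `text` in several common units (hypothetical helper)."""
    return {
        "code_points": len(text),                           # Python str length
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # UTF-16 code units
        "utf8_bytes": len(text.encode("utf-8")),            # bytes when sent as UTF-8
    }

samples = {
    "plain ASCII": "abc",
    "e + combining acute accent": "e\u0301",                  # displayed as one character
    "thumbs up + skin tone modifier": "\U0001F44D\U0001F3FD",  # displayed as one emoji
}

for label, text in samples.items():
    print(label, count_units(text))
```

Both non-ASCII samples are a single user-perceived character, yet every unit above reports more than one, which is why the unit has to be pinned down before begin/end offsets can be compared between implementations.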
Reply containing a rant about how impractical grapheme cluster counting is in 3, 2, 1… :)
- Florian
Wed, 18 Dec 2019 at 20:41, Marvin W <[email protected]>:
On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
In the end we have settled for counting characters of escaped string, so
This sounds like a terrible idea. In encoded XML, ">", "&gt;", "&#62;" and "<![CDATA[>]]>" are equivalent. I just tried it out, and servers do indeed convert all of those to their shortest well-formed variant (which is ">"), so you cannot rely on their reference length at all. Servers may, at their discretion, convert non-ASCII characters to their character reference form (starting with &#). I have seen this happen at least once with emojis.
to draw *&&&* in a client we count it as a string with a length of 15, thus the <bold> reference points to characters 0..14: <reference xmlns="urn:xmpp:reference:0" begin="0" end="14" type="markup"><bold /></reference>
Luckily for you, this looks pretty non-standard, so you don't have to deal with your implementation being incompatible with others. Also, as soon as XEP-0372 actually becomes more stable, you will technically not be standards-compliant, because there is no <bold /> element defined for the namespace "urn:xmpp:reference:0". You are apparently mixing XEP-0372 and XEP-0394.
Also, that's weird counting: usually I would expect end to point to the position after the last referenced character. At least that's what you do in most programming languages (e.g. "&amp;&amp;&amp;"[0:14] will give you "&amp;&amp;&amp" without the last ";").
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: [email protected]
_______________________________________________
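The two objections above can be checked with a small Python sketch using only the standard library. The four spellings of ">" and the variable names are chosen here purely for illustration; this is not how any particular client or server implements it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Four illustrative ways to write the same body text, ">", in serialized XML.
variants = [
    "<body>></body>",
    "<body>&gt;</body>",
    "<body>&#62;</body>",
    "<body><![CDATA[>]]></body>",
]

# After parsing, every variant carries the same one-character text...
decoded = [ET.fromstring(v).text for v in variants]

# ...but the serialized content between the tags differs in length, so offsets
# computed on the escaped form depend on how a server re-serializes it.
wrapper = len("<body></body>")
raw_lengths = [len(v) - wrapper for v in variants]

print(decoded)      # the same text four times
print(raw_lengths)  # four different lengths

# The "&&&" example: its escaped form is 15 characters long.
escaped = escape("&&&")
print(escaped, len(escaped))

# With a half-open range, as in Python slicing, end=14 drops the final ";".
print(escaped[0:14])
# An exclusive end of 15 covers the whole escaped string.
print(escaped[0:15])
```

Counting on the decoded text (the `decoded` values) avoids the dependence on serialization entirely, which is the point made in the reply.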
