Re: [Standards] Proposed XMPP Extension: Character counting in message bodies

Jonas Schäfer Sat, 21 Dec 2019 10:24:51 -0800

On Wednesday, 18 December 2019 17:27:04 CET Jonas Schäfer wrote:
> On Wednesday, 18 December 2019 16:40:42 CET Marvin W wrote:
> > [inline]
> > 
> > On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > > In the end we have settled for counting characters of escaped string, so
> > 
> > This sounds like a terrible idea. In encoded XML, ">", "&#x3E;", "&gt;"
> > and "<![CDATA[>]]>" are equivalent. I just tried it out and servers
> > indeed do convert all of those to their shortest well-formed variant
> > (which is "&gt;") so you cannot rely on their reference length at all.
> > Servers may, at their discretion, convert non-ASCII characters to their
> > character reference form (starting with &#). I have seen this happen
> > at least once with emojis.
> 
> I’m 100% with Marvin (and Ralph) here. Counting before escaping makes no
> sense, because the character data of XML is codepoints after escaping, not
> before, on a theoretical level, and for the reasons noted by Marvin on a
> practical level.


Sorry, this statement was confusing. I was thinking of the *receiving* end, 
where counting before the escape handling would mean counting the codepoint 
U+0026 (&) as five codepoints (since it would still be encoded as "&amp;").
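To make the receiving-side point concrete, here is a minimal sketch (not from the thread or the proposed XEP) that assumes Python's standard `xml.etree.ElementTree` as the parser. It takes the equivalent serializations Marvin listed, plus an emoji sent both literally and as a character reference, and counts codepoints in the parsed character data. The sample strings and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <body> serializations carrying the same character data.
# Servers may rewrite one form into another in transit.
EQUIVALENT_GT = [
    "<body>a > b</body>",
    "<body>a &gt; b</body>",
    "<body>a &#x3E; b</body>",
    "<body>a <![CDATA[>]]> b</body>",
]
EQUIVALENT_EMOJI = [
    "<body>\U0001F600</body>",  # emoji sent as a literal character
    "<body>&#x1F600;</body>",   # same emoji as a character reference
]


def parsed_body_length(xml: str) -> int:
    """Count Unicode codepoints in the body's character data *after* the
    parser has resolved entities, character references and CDATA sections."""
    element = ET.fromstring(xml)
    return len(element.text or "")


if __name__ == "__main__":
    for xml in EQUIVALENT_GT + EQUIVALENT_EMOJI:
        # The raw length depends on the serialization; the parsed length does not.
        print(f"raw={len(xml):2d} parsed={parsed_body_length(xml)}  {xml!r}")
```

The `raw` length differs for every serialization, while the `parsed` count is the same within each group, which is the only quantity a receiver can rely on.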

On the sending side, you most definitely want to count *before* escaping.
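The sending side can be sketched the same way, again assuming Python's standard library (`xml.sax.saxutils.escape` for escaping, `xml.etree.ElementTree` for parsing); the helper names are hypothetical. The sender counts codepoints of the text before escaping, and that count matches what a receiver gets by counting after parsing, whereas counting the escaped form would treat `&` as five codepoints.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def sender_count(text: str) -> int:
    """Sending side: count codepoints of the raw text *before* XML escaping."""
    return len(text)


def serialize_body(text: str) -> str:
    """Escape &, < and > and wrap the text in a <body> element."""
    return "<body>" + escape(text) + "</body>"


def receiver_count(xml: str) -> int:
    """Receiving side: count codepoints *after* the parser has unescaped the data."""
    return len(ET.fromstring(xml).text or "")


if __name__ == "__main__":
    text = "Tom & Jerry <3"
    wire = serialize_body(text)
    print(wire)                  # escaped form carried on the wire
    print(sender_count(text))    # counted before escaping
    print(len(escape(text)))     # naive count of the escaped form is larger
    print(receiver_count(wire))  # counted after parsing; matches sender_count
```

Because both ends count the same unescaped character data, the result survives any server-side rewriting between `&gt;`, `&#x3E;` and similar forms.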

kind regards,
Jonas

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________
