On Wednesday, 18 December 2019 17:27:04 CET Jonas Schäfer wrote:
> On Wednesday, 18 December 2019 16:40:42 CET Marvin W wrote:
> > [inline]
> >
> > On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > > In the end we have settled for counting characters of escaped string, so
> >
> > This sounds like a terrible idea. In encoded XML, "&gt;", "&#62;", "&#x3E;"
> > and "<![CDATA[>]]>" are equivalent. I just tried it out and servers
> > indeed do convert all of those to their shortest well-formed variant
> > (which is "&gt;") so you cannot rely on their reference length at all.
> > Servers may at their discretion convert non-ascii characters to their
> > character reference form (starting with &#). I have seen this at least
> > once happening with emojis.
>
> I’m 100% with Marvin (and Ralph) here. Counting before escaping makes no
> sense, because the character data of XML is codepoints after escaping, not
> before on a theoretical level and for the reasons noted by Marvin on a
> practical level.

Sorry, this statement was confusing. I was thinking of the *receiving* end, where counting before the escaping is handled (that is, before unescaping) would mean counting the codepoint U+0026 (&) as five codepoints (since it would still be encoded as "&amp;"). On the sending side, you most definitely want to count *before* escaping.

kind regards,
Jonas
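
To make the sending/receiving distinction concrete, here is a minimal Python sketch. The helper names `sent_length` and `received_length` and the bare `<body>` wrapper element are made up for illustration and are not taken from any XEP or library; the only assumption is that lengths are counted in Unicode codepoints, which is what Python's `len()` does for `str`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def sent_length(text: str) -> int:
    """Sending side: count codepoints of the raw text, before escaping."""
    return len(text)


def received_length(serialized_body: str) -> int:
    """Receiving side: count codepoints after the parser has resolved
    entity references, character references and CDATA sections."""
    text = ET.fromstring(serialized_body).text or ""
    return len(text)


# Four serializations of the same character data, a single ">".
variants = [
    "<body>&gt;</body>",
    "<body>&#62;</body>",
    "<body>&#x3E;</body>",
    "<body><![CDATA[>]]></body>",
]
lengths = [received_length(v) for v in variants]

# Round trip: the sender counts "a & b" before escaping it for the wire.
raw = "a & b"
wire = "<body>" + escape(raw) + "</body>"  # contains "a &amp; b"
```

All four variants yield the same length on the receiving side, and `sent_length(raw)` matches `received_length(wire)` even though the escaped form on the wire is longer. Counting the serialized string instead would depend on whichever form a server happened to re-encode the data into.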
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: [email protected]
_______________________________________________
