> CDATA is purely XML level and doesn't carry any semantic meaning.
> And yes, the normal compliant XML parser doesn't even bother to tell
> you how the data was encoded in the byte stream.
> You are seriously confusing layers here.

Fair enough: I shouldn't have used spaces as the example. You're right that it's invalid; I simply grabbed it as a sample because of the JID escaping discussion.

JID escaping has, however, been put forward as a method of escaping characters to get them across the wire. I think I am failing to get my point across clearly, so I will try one last time. What I've been trying to address is, for instance:

> So, we are talking about way to escape characters in an XML stream. My
> view is that all way to escape characters are good, especially when they
> are defined in XML, cited in XMPP RFC and are simple to implement (and
> implemented by all parsers I know).

Read that carefully. "All way to escape characters are good."

If we are viewing CDATA as 'one more way to escape characters,' then we need to think about the implications. Because I will /guarantee/ you that if we recommend CDATA as an escaping method, then someone will do a <![CDATA[john&[EMAIL PROTECTED]> in an <item/> value, or whatever.

My point is that we need to /define/ things like this, rather than leaving them vague. Or else someone WILL go, 'Oh, well, when I send down john&[EMAIL PROTECTED] it disconnects me with a stream error saying there's an unescaped character there. I'll just make sure anything with unescaped characters goes into a CDATA block.' And if they do that, it will be valid XML across the wire, too! It should not pop them off with a stream error, right?
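To make that scenario concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element name and the address are hypothetical, chosen only for illustration. It suggests that a raw & is rejected as malformed, while the entity-escaped and CDATA-wrapped forms both parse and hand the application the same string, with nothing to say which form was on the wire.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads: element name and address are made up for illustration.
raw = "<item>john&doe@example.com</item>"
escaped = "<item>john&amp;doe@example.com</item>"
cdata = "<item><![CDATA[john&doe@example.com]]></item>"

# The bare ampersand is not well-formed XML, so the parser refuses it.
try:
    ET.fromstring(raw)
    raw_rejected = False
except ET.ParseError:
    raw_rejected = True

# Both escaped forms are well-formed and yield identical text.
text_escaped = ET.fromstring(escaped).text
text_cdata = ET.fromstring(cdata).text

print(raw_rejected)
print(text_escaped == text_cdata)
```

So whether that string is acceptable in a given field cannot be decided at the XML layer; it has to be spelled out at the layer above.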

If we proclaim that all JIDs must adhere to the current rules, and that the characters we've discussed (visually useful, but not valid in the node part of a JID: namely & and ' and so on) must be represented using JID escaping, that's *fine*. Explicitly saying that JIDs cannot contain those characters except as represented in JID escaping is a *valid and viable solution* to my concern.
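For reference, a rough sketch of what that rule would mean in code, assuming the \XX mapping from the JID Escaping proposal as I understand it. The function name is hypothetical, and the sketch deliberately leaves out the special handling of literal backslashes and of leading or trailing spaces.

```python
# Mapping as I understand it from the JID Escaping proposal; treat it as an
# assumption. Backslash handling is intentionally omitted from this sketch.
ESCAPES = {
    " ": "\\20",
    '"': "\\22",
    "&": "\\26",
    "'": "\\27",
    "/": "\\2f",
    ":": "\\3a",
    "<": "\\3c",
    ">": "\\3e",
    "@": "\\40",
}


def escape_node(node: str) -> str:
    """Escape the node part of a JID (hypothetical helper, not a full implementation)."""
    return "".join(ESCAPES.get(ch, ch) for ch in node)


print(escape_node("john&doe"))
```

The escaped form contains no characters that need any XML-level escaping, so the question of CDATA never comes up for it.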

If, however, we want to just leave it vague and make CDATA 'one more way to escape characters,' then people will most likely make assumptions about how things interact. Based on past experience, I suspect at least some of those assumptions will be wrong. My point is that if we want to include CDATA, then we need to make it clear where CDATA is /not/ an appropriate solution for escaping.

I hope that makes my concern clearer, but I will leave it alone at this point. I have realized I am arguing this point utterly alone, which may mean that I am failing to communicate my concern clearly, or that I am seeing a problem where one does not exist. I will hope that it is the latter, and that my concerns are just motivated by my generally squidgy feelings about hazily-defined edges to standards, rather than by an actual problem. :)

--
Rachel Blackman <[EMAIL PROTECTED]>
Trillian Messenger - http://www.trillianastra.com/

