[Standards] Namespace well-formed data.

Dave Cridland Thu, 23 Oct 2008 16:33:33 -0700

Let's quickly remind ourselves of the key issues. I apologise for this message being so damn long, but for people who read all the way, there's some suggested RFC text as a prize.


1) Protecting Implementations in the current environment:

Whether or not we want this to happen, servers can currently, and do, handle XML for forwarding purposes without performing full namespace checks on the data. In particular, they do not always check for undeclared namespaces, and this has a very variable effect on the receiver.

In some cases, the receiver terminates the session - this has proven highly undesirable. In other cases, the undeclared prefix is ignored. Finally, in principle, the stanza could be rejected somehow, although as Jack pointed out last night, this would require an error without including the original stanza if the receiver doesn't want to retransmit an undeclared prefix.

We need to decide on desirable behaviour when faced with XML which is well-formed, but has undeclared namespaces present. More than one behaviour may be acceptable.

My personal view is that stream termination is not really acceptable, but sometimes implementors have little choice, and that the next easiest to implement is to essentially ignore undeclared prefixes, treating them as if they were an unknown URI.
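To make "treating them as if they were an unknown URI" concrete, here is a minimal sketch of what a receiver might do. The names expand_name, dispatch, HANDLERS and the placeholder URI are all hypothetical, and the default namespace is assumed to be jabber:client; this is one possible reading of the idea, not any particular client's behaviour.

```python
# Hypothetical placeholder standing in for "some URI we do not recognise".
UNKNOWN_NS = "urn:example:undeclared-prefix"


def expand_name(qname, declared, default_ns="jabber:client"):
    """Map a qualified name to (namespace URI, local name).

    `declared` maps the prefixes currently in scope to their URIs.
    An undeclared prefix is not an error here: it resolves to UNKNOWN_NS.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No prefix at all: the element is in the default namespace.
        return (default_ns, qname)
    return (declared.get(prefix, UNKNOWN_NS), local)


# Illustrative handler table keyed on expanded names.
HANDLERS = {("jabber:client", "message"): "handle_message"}


def dispatch(qname, declared):
    """Pick a handler; anything unrecognised is ignored, not fatal."""
    return HANDLERS.get(expand_name(qname, declared), "ignore_unknown")
```

With this, an element using an undeclared prefix falls through to the same path as any payload in a namespace the client has never heard of, and the stream stays up.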



2) Why we might want to let servers forward "Bad XMLNS":

As stated above, the status quo is that some servers do forward XML without checking that every prefix used is declared.

There are advantages in doing so, indeed, since an XML stream can be split into stanzas, and the outer element of the stanza examined, without doing more than a relatively simple, non-destructive, lex of a buffer.

This seems to give people some confusion, so forgive me while I explain this in detail.


In other words, a server starts off with a string:

"<message to='[EMAIL PROTECTED]' type='headline'><![CDATA[ A CDATA section, perhaps with < or > in]]><foo/></message>"

By iterating through the string, character by character (or, more likely, octet by octet, since it's equivalent in UTF-8 for these purposes), and maintaining a very small amount of fixed state irrespective of the "depth" of the XML, we can find the end of the element.

No allocations are involved, no new buffers, and the original buffer can be left intact, for copying directly to the write buffer of the outgoing stream. This is more or less exactly what Isode's implementation does, and this, like Artur has said as I type this, is a key reason for good performance. (And if anyone wants to measure it, please feel free to do so and publish the results).

The ability to pass buffers around for copying is obviously more efficient than creating DOMs and reserializing. The number of allocations is significant to a long running application for two key reasons - note that this applies equally to both relatively low level languages like C, and higher level languages such as Python or Java.

Firstly, memory allocations typically involve a lock, and do not sit well with multithreaded applications. This is typically reduced by pools, and per-thread allocators, but is easily disrupted simply by handing ownership of memory between threads.

Secondly, rapid allocation patterns cause memory fragmentation, which increases memory load, causing performance to degrade substantially over time. (Bad memory fragmentation sometimes appears to be a memory leak, even though checking every allocation and deallocation won't find one). It's fair to say that modern allocators have reduced the effects somewhat, but no allocations are still considerably better than lots.

If servers were unconditionally mandated to check each and every prefix in incoming streams, then servers would be forced to build an allocated lookup table for prefixes. This lookup table needs to be adjusted potentially on every element - with every opening tag, new prefixes can be defined - and these affect the tag they're declared in, so we must handle cases such as:
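The scan described above can be sketched roughly as follows. It is written in Python for brevity, so it illustrates the fixed-state lex rather than the allocation behaviour, and it is not a description of any real server's code. find_stanza_end is a hypothetical name; the sketch assumes the buffer starts at a stanza's opening tag and contains no comments, processing instructions or DTD, which XMPP forbids in any case.

```python
def find_stanza_end(buf: bytes, start: int = 0) -> int:
    """Return the index just past the first complete element that
    begins at `start`, or -1 if the buffer ends before it closes.

    The only state kept is a depth counter, a position and a quote flag.
    """
    depth = 0
    i = start
    n = len(buf)
    while i < n:
        if buf[i] != 0x3C:  # not '<': character data, skip it
            i += 1
            continue
        if buf.startswith(b"<![CDATA[", i):
            # Anything may appear inside CDATA, including '<' and '>'.
            end = buf.find(b"]]>", i + 9)
            if end < 0:
                return -1
            i = end + 3
            continue
        closing = buf.startswith(b"</", i)
        # Find the '>' ending this tag, skipping quoted attribute values.
        quote = 0
        j = i + 1
        while j < n:
            c = buf[j]
            if quote:
                if c == quote:
                    quote = 0
            elif c in (0x22, 0x27):  # '"' or "'"
                quote = c
            elif c == 0x3E:  # '>'
                break
            j += 1
        if j >= n:
            return -1  # tag not complete yet, wait for more data
        self_closing = buf[j - 1] == 0x2F  # tag ends in '/>'
        if closing:
            depth -= 1
        elif not self_closing:
            depth += 1
        i = j + 1
        if depth == 0:
            return i  # outermost element has just closed
    return -1
```

The slice buf[start:end] is then the whole stanza, untouched, ready to be copied to the outgoing stream. Note that no prefix is ever looked up, which is exactly why undeclared ones pass through.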


"<foo:element bar:attr='false' xmlns:foo='l' xmlns:bar='k'>"

Which essentially necessitates a three-phase parse, first finding the possibly prefixed names and attributes, then gathering XML namespace declarations, then finally resolving each prefix. At the closing tag, of course, we need to locate every prefix declaration now leaving scope, and remove it from our lookup.
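For comparison with the simple lex, the bookkeeping this implies might look something like the sketch below. PrefixScopes and is_declaration are hypothetical names, the start tag is assumed to have been tokenised already into a name and a list of (attribute, value) pairs, and only prefixes are tracked (the default namespace is left out to keep it short).

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def is_declaration(attr_name):
    """True for xmlns='...' and xmlns:prefix='...' attributes."""
    return attr_name == "xmlns" or attr_name.startswith("xmlns:")


class PrefixScopes:
    def __init__(self):
        # One dict per open element: this is the allocated lookup table
        # that grows and shrinks as elements open and close.
        self.stack = [{"xml": XML_NS}]  # 'xml' is always bound

    def resolve(self, prefix):
        """Search from the innermost scope outwards."""
        for scope in reversed(self.stack):
            if prefix in scope:
                return scope[prefix]
        return None

    def open_element(self, name, attrs):
        """Process a start tag; return the undeclared prefixes it uses."""
        # Phase 1: find the prefixes used by element and attribute names.
        qnames = [name] + [a for a, _ in attrs if not is_declaration(a)]
        used = [q.split(":", 1)[0] for q in qnames if ":" in q]
        # Phase 2: gather declarations, which apply to this very tag.
        scope = {}
        for a, value in attrs:
            if a.startswith("xmlns:"):
                scope[a[len("xmlns:"):]] = value
        self.stack.append(scope)
        # Phase 3: resolve each used prefix against the updated table.
        return [p for p in used if self.resolve(p) is None]

    def close_element(self):
        # Declarations made on the closing element leave scope.
        self.stack.pop()
```

Fed the start tag above, open_element returns an empty list, since both "foo" and "bar" are declared on the tag that uses them. Every element costs at least one new table entry, though, which is the overhead being weighed here.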


And what for?

Servers do not care about these for themselves, please note. If they did, they'd be doing this anyway - and for cases like roster manipulation, and many other things, that's exactly what we'll do. Hence you can start off a stream with "<s:stream xmlns:s='http://etherx.jabber.org/streams' xmlns='jabber:client' xmlns:a='urn:ietf:params:xml:ns:xmpp-sasl' to='example.com' id='asd'>", and the server will note that you're using "s" as your stream prefix, and "a" as your SASL prefix, and be quite happy for you to later send <a:auth/> in that same stream.

So the sole reason for doing this extra work is to protect clients. But, go back to issue 1, up there, for a second - some choices there can eliminate the damage, too, leaving the clients perfectly well able to protect themselves, and this is a desirable thing for a client to do. So is this really needed?

I'd expect, incidentally, that some servers would always check namespaces, for security reasons, and be chosen because of it - it's a marketable feature. I'd also expect that many servers might perform more stringent checking on certain traffic - MUC springs to mind - to avoid being a DoS amplifier (We might even recommend or mandate this in XEP-0045).



3) And what's this RFC 2119 thing, anyway?

Finally, a reminder, to folks that haven't read RFC 2119 - its keywords are not statements of opinion.

"MUST" = "If you do not follow this, there is serious damage to interoperability."

"SHOULD" = "If you do not follow this, then your implementation may have or cause problems in some cases. Consider the consequences carefully."

"MAY" = "It doesn't matter whether or not you choose to do this, but be warned that someone else might choose either way too."

"SHOULD NOT" = "Don't do this unless you know what you're doing and fully understand the consequences."

"MUST NOT" = "Don't do this and expect things to work."

So I'm not really convinced that, given the status quo, mandating full namespace checking on servers is really a MUST anyway. Most clients cope perfectly well, after all. On that basis, this feels like a SHOULD.

It's been discussed on the IETF lists before that "SHOULD" and "SHOULD NOT" ought to give guidance, so here is some, in xml2rfc form. I've reworded and adjusted PSA's original text, to tighten the text a bit, provide suggested answers to the above (which I don't claim to be consensus), and to pander to my phrasing preferences.

<t>An XMPP entity MUST NOT generate data that is not XML-well-formed. An XMPP entity which receives data that is not XML-well-formed SHALL reject it by terminating the stream over which the data was received with an <xml-not-well-formed/> stream error.</t>

<t>Any elements within the XML stream other than XML stanzas, such as TLS or SASL elements, MUST be namespace-well-formed, and XMPP entities SHALL reject XML streams which fail to comply by terminating the stream over which the data was received with an <not-well-formed/> stream error. XMPP entities MUST generate namespace-well-formed stanzas.</t>

<t>It is known that deployed servers do not always exhaustively check for undeclared namespaces in particular, before forwarding a stanza. Therefore XMPP entities SHOULD NOT terminate a stream over which a stanza has been received that is not namespace-well-formed, as otherwise there is a potential Denial of Service attack, see <xref target='sec-xmlns'/>. Servers SHOULD check that forwarded stanzas are namespace-well-formed in order to protect clients from such data, as many existing XML parsers generate unrecoverable errors in this case and therefore some clients are forced into terminating the stream.</t>

<t>XMPP entities which receive stanzas which are not namespace-well-formed SHOULD reject them with a stanza error <not-acceptable/>.</t>
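Read as a decision procedure for a receiving entity, the suggested text above comes out roughly like this. It is only a sketch of one reading of the proposal: Action and classify are hypothetical names, and the three booleans are assumed to have been worked out by whatever parser the entity uses.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "process normally"
    STREAM_ERROR = "terminate the stream with a stream error"
    STANZA_ERROR = "reject with a <not-acceptable/> stanza error"


def classify(xml_well_formed, ns_well_formed, is_stanza):
    """Choose a reaction to one incoming top-level element."""
    if not xml_well_formed:
        # SHALL: broken XML always ends the stream.
        return Action.STREAM_ERROR
    if ns_well_formed:
        return Action.ACCEPT
    if is_stanza:
        # SHOULD NOT terminate; SHOULD bounce the stanza instead.
        return Action.STANZA_ERROR
    # TLS, SASL and other stream-level elements: SHALL terminate.
    return Action.STREAM_ERROR
```

The asymmetry is deliberate in this reading: stanzas may have been forwarded from elsewhere, so killing the stream over them punishes the wrong party, while stream-level elements can only have come from the peer itself.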


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
