Re: [Standards] well-formedness

Dave Cridland Tue, 21 Oct 2008 08:03:53 -0700

On Tue Oct 21 00:53:35 2008, Waqas wrote:

The expat parser (as an example) in namespace-aware mode reports a
fatal error on undeclared prefixes. This was added in response to this bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127
which references this section of XML Names:
http://www.w3.org/TR/REC-xml-names/#ns-qualnames

Which doesn't say anything about mandatory fatal errors.

If you're parsing a static document, it's quite reasonable to generate a fatal error, but I don't think that's the right thing at all with an XML stream.
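For anyone who wants to see the expat behaviour under discussion, here is a small sketch using Python's standard xml.parsers.expat binding. The sample stanza and the helper name try_parse are made up for illustration and are not taken from any client.

```python
import xml.parsers.expat

# Hypothetical input whose prefix "a" is never declared.
DOC = b"<a:message>hi</a:message>"

def try_parse(namespace_aware):
    """Parse DOC; return the element names seen, or expat's error text."""
    names = []
    if namespace_aware:
        # Namespace mode: names are reported as "URI<separator>local".
        parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    else:
        # Plain mode: names are reported verbatim, colon included.
        parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    try:
        parser.Parse(DOC, True)
    except xml.parsers.expat.ExpatError as err:
        # In namespace mode the undeclared prefix aborts the whole parse.
        return xml.parsers.expat.ErrorString(err.code)
    return names
```

With namespace awareness off, the same input parses and the caller is left to interpret the "a:message" name itself.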

Ah yes, a namespace aware parser (expat) is indeed being used with
namespace awareness disabled...

Right - and then namespaces are handled, so the overall result is that a namespace aware parser is used. If you're mandating that all XMPP implementations MUST use somebody else's parser, then I don't know quite what to say.

I looked at the Gajim sources, and using
'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
undeclared prefixes clearly does not conform with [XML-NAMES].
See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance

Nonsense.

"A processor MUST report violations of namespace well-formedness" - Gajim is doing so, signalling this condition using a specific namespace URI, so it clearly *does* conform. You may argue that I should have used some special non-string object instead, if you like, and that how Gajim handles this signal - by treating it as the unknown namespace it (kind of) is - is sufficiently simple and neat as to warrant being maligned as a hack, but it's a damn sight better than terminating the connection.
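To make "treating it as the unknown namespace" concrete, here is a rough sketch of the consuming side. It is not Gajim's actual code: the handler table and the dispatch function are hypothetical, and only the sentinel URI is the one quoted in this thread.

```python
# Sentinel URI quoted earlier in this thread for undeclared prefixes.
UNDECLARED_NS = "http://www.gajim.org/xmlns/undeclared-root"

# Hypothetical handler table keyed on (namespace URI, local name).
HANDLERS = {
    ("jabber:client", "message"): lambda: "handle message",
    ("jabber:client", "presence"): lambda: "handle presence",
}

def dispatch(ns_uri, local):
    """Route an element by expanded name; unknown names are skipped."""
    handler = HANDLERS.get((ns_uri, local))
    if handler is None:
        # The sentinel URI has no handlers, so an undeclared prefix ends up
        # on the same path as any other unrecognised namespace. The element
        # is ignored and the stream stays open.
        return "ignored"
    return handler()
```

No special case is needed for the sentinel: it simply never matches anything.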

Gajim does not conform to XML-NAMES. I reviewed the code, and it
appears to act correctly for most XML. But it does not act correctly
for prefixes on attributes

Not that it did when expat was used to handle the namespaces, either. Making it handle these properly would involve quite a bit more rewriting. (Possible and desirable rewriting, to be sure, but nothing to do with the issue at hand, sorry).

. And it does not have a single one of all
those required checks for non-conforming XML (except the undeclared
prefix check on tag names). XML-NAMES requires a number of checks for
conformance, some of which are in
http://www.w3.org/TR/REC-xml-names/#Conformance while others are
sprinkled throughout the spec.

I'll accept that - I didn't make it check for multiple colons, etc, and I might well allow a redefinition of xml: and xmlns:, which'd be confusing. I ought to fix these at some point.
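The missing checks mentioned here are small. The following is only a sketch of how they might look, based on the constraints in Namespaces in XML 1.0; the function names split_qname and check_declaration are invented for this example and do not come from Gajim.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

def split_qname(name):
    """Split 'prefix:local' into (prefix, local); reject malformed QNames."""
    if name.count(":") > 1:
        raise ValueError("more than one colon in %r" % name)
    prefix, sep, local = name.rpartition(":")
    if sep and (not prefix or not local):
        raise ValueError("empty prefix or local part in %r" % name)
    return (prefix or None, local)

def check_declaration(prefix, uri):
    """Validate one namespace declaration (prefix None means default)."""
    if prefix == "xmlns":
        raise ValueError("the xmlns prefix must not be declared")
    if prefix == "xml" and uri != XML_NS:
        raise ValueError("xml must stay bound to its reserved URI")
    if prefix != "xml" and uri == XML_NS:
        raise ValueError("only the xml prefix may use the reserved XML URI")
    if uri == XMLNS_NS:
        raise ValueError("nothing may be bound to the xmlns URI")
    if prefix is not None and uri == "":
        raise ValueError("Namespaces 1.0 does not allow undeclaring a prefix")
```

Whether a failure here should raise, as in this sketch, or be signalled more gently is exactly the policy question of this thread; the checks themselves are the same either way.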

Incidentally, by stating "except the undeclared prefix check", aren't you arguing that the code *is* following XML-NAMES in this regard?

Dave, I don't think you want to conform to XML-NAMES. I think you'd
prefer to sanitize the XML instead to make it conform to XML-NAMES.
One step closer to HTML ;)

The mechanism by which I happen to have chosen to report undeclared namespaces is merely a convenient mechanism which happens to have the result I desired with minimal programming. I happen to think the code is less hacky than Expat's rather bizarre API, which has namespace handling hacked on via character delimiters, especially given how Gajim then used this API. (Either Expat looks up namespaces and then leaves you a non-standard notation to parse, or else you parse the standard notation and look up namespaces yourself, in a more resilient manner - not a hard choice, really).

What I'm trying to do is look at where we are now, and describe the best option for developers wishing to deploy now, especially bearing in mind we need to obtain the best result, where "best" is in terms of interoperability and potential efficiency. If you disagree with those goals, please say so - I don't think your goals are all that different.

You appear to be arguing that the best interoperability (presumably) is achieved by producing only XML-NAMES conforming XML. I can agree with you there.

I also think this doesn't always happen right now, and that therefore clients are best advised to handle "Bad XMLNS" in a graceful manner, in particular, not generating a fatal stream-level error.

Furthermore, I note that if clients do this, the requirement to produce only "Good XMLNS" can be relaxed slightly, since no serious damage results. That is, avoid if possible - bad things may result, rather than avoid at all costs - bad things will result. SHOULD instead of MUST in RFC 2119 terms.

Finally, I note that the costs can be, in fact, remarkably high for a server in the case of forwarding stanzas, since in order to merely forward stanzas, a simple lexing pass is sufficient, whereas to check - in particular - for undeclared prefixes requires a full parse and lookup. These are expensive operations involving allocations, string compares, and other primitives that have a detrimental effect on short-term and long-term server performance.
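As a rough illustration of what "a simple lexing pass" could mean, the sketch below finds stanza boundaries by counting tag depth only, with no namespace lookup at all. It is not taken from any server: split_stanzas is a hypothetical name, and the code assumes the buffer holds only the children of the stream root, with no comments, processing instructions, or CDATA sections.

```python
def split_stanzas(buf):
    """Return (complete top-level elements, unconsumed remainder) of buf."""
    stanzas = []
    depth = 0        # nesting depth relative to the stream root's children
    start = 0        # index where the current top-level element began
    consumed = 0     # index just past the last complete stanza
    i = 0
    while True:
        i = buf.find("<", i)
        if i == -1:
            break
        # Find the closing '>' of this tag, skipping quoted attribute values.
        j, quote = i + 1, None
        while j < len(buf):
            c = buf[j]
            if quote:
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif c == ">":
                break
            j += 1
        if j == len(buf):
            break    # incomplete tag: wait for more data
        if depth == 0:
            start = i
        if buf[i + 1] == "/":
            depth -= 1           # end tag
        elif buf[j - 1] != "/":
            depth += 1           # start tag (not self-closing)
        if depth == 0:
            stanzas.append(buf[start:j + 1])
            consumed = j + 1
        i = j + 1
    return stanzas, buf[consumed:]
```

Note that a stanza with an undeclared prefix passes straight through this pass, since prefixes are never looked at; detecting it would require the fuller parse described above.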

Which step in my chain of thought here is so offensive that it requires attack by the HTML bogeyman argument? :-)

 Something which needed to be done to cope with xmlns-unaware
servers. All client developers should roll their own XMLNS processing
code? They do, but they shouldn't have had to.

I agree entirely - as I say, nothing in XML-NAMES mandates processors generating fatal errors for the Prefix Declared constraint, and whilst I can see this is a reasonable thing for handling the case in a static document that the user has control over, it's utterly unsuitable for most other cases - what is the user supposed to do, after all?

This is the kind of thing that XML processors should offer, signalling that an element (or attribute) is unbound, rather than generating a fatal error. It's unfortunate that they do not, and my personal hope for the developers of 2015 (which, I hope, will include myself) is that XML processors themselves will improve, providing what developers actually need.

Finally, I should note that XMLNS processing is not really very hard. The code to do so in Gajim consists of around ten lines, from memory - it's slightly more in our server, perhaps as many as 20.
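To give a feel for the scale involved, here is a minimal sketch of prefix resolution on top of a parser that is not namespace aware. It is not the Gajim or server code referred to above: the NamespaceScope class is hypothetical, and undeclared prefixes are mapped to the sentinel URI quoted earlier in this thread instead of raising an error.

```python
UNDECLARED_NS = "http://www.gajim.org/xmlns/undeclared-root"
XML_NS = "http://www.w3.org/XML/1998/namespace"

class NamespaceScope:
    """Stack of prefix-to-URI maps, one entry per open element."""

    def __init__(self):
        # The xml prefix is bound by definition.
        self.stack = [{"xml": XML_NS}]

    def push(self, attrs):
        """Enter an element: inherit the parent scope, add declarations."""
        scope = dict(self.stack[-1])   # a copy per element; simple, not fast
        for name, value in attrs.items():
            if name == "xmlns":
                scope[None] = value or None   # xmlns="" removes the default
            elif name.startswith("xmlns:"):
                scope[name[6:]] = value
        self.stack.append(scope)

    def pop(self):
        """Leave an element."""
        self.stack.pop()

    def resolve(self, qname, is_attribute=False):
        """Return (namespace URI or None, local name) for a raw name."""
        prefix, sep, local = qname.rpartition(":")
        scope = self.stack[-1]
        if not sep:
            # Unprefixed attributes are in no namespace; unprefixed
            # elements take the default namespace, if any.
            return (None if is_attribute else scope.get(None)), local
        # Undeclared prefix: signal with the sentinel, do not abort.
        return scope.get(prefix, UNDECLARED_NS), local
```

A caller would push() in its start-element handler, resolve() the element and attribute names, and pop() in its end-element handler.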


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
