On Tue Oct 21 00:53:35 2008, Waqas wrote:
The expat parser (as an example) in namespace-aware mode reports a
fatal error on undeclared prefixes. This was added in response to this
bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127
which references this section of XML Names:
http://www.w3.org/TR/REC-xml-names/#ns-qualnames
Which doesn't say anything about mandatory fatal errors.
If you're parsing a static document, it's quite reasonable to
generate a fatal error, but I don't think that's the right thing at
all with an XML stream.
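To make the behaviour concrete, here is a rough sketch using Python's
binding to expat (xml.parsers.expat). The parse() helper and the sample
document are made up for illustration; only the expat calls are real.
It feeds the same data to a parser with namespace awareness off and
then on.

```python
import xml.parsers.expat as expat

def parse(data, namespace_aware):
    # namespace_separator=None leaves names as raw "prefix:local" strings;
    # passing a one-character separator turns on namespace-aware mode.
    separator = " " if namespace_aware else None
    parser = expat.ParserCreate(namespace_separator=separator)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names

# Hypothetical input: the "foo" prefix is never declared.
doc = b"<stream><foo:bar/></stream>"

# Without namespace awareness the prefixed name is passed through as-is.
print(parse(doc, namespace_aware=False))

# With namespace awareness the same input is a fatal parse error.
try:
    parse(doc, namespace_aware=True)
except expat.ExpatError as err:
    print("fatal:", expat.ErrorString(err.code))
```

On a stream, that exception means the parser is finished with the
connection, which is the outcome I'm arguing against.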
Ah yes, a namespace aware parser (expat) is indeed being used with
namespace awareness disabled...
Right - and then namespaces are handled, so the overall result is
that a namespace aware parser is used. If you're mandating that all
XMPP implementations MUST use somebody else's parser, then I don't
know quite what to say.
I looked at the Gajim sources, and using
'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
undeclared prefixes clearly does not conform with [XML-NAMES].
See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance
Nonsense.
"A processor MUST report violations of namespace well-formedness" -
Gajim is doing so, signalling this condition using a specific
namespace URI, so it clearly *does* conform. You may argue that I
should have used some special non-string object instead, if you like,
and that how Gajim handles this signal - by treating it as the
unknown namespace it (kind of) is - is sufficiently simple and neat
as to warrant being maligned as a hack, but it's a damn sight better
than terminating the connection.
Gajim does not conform to XML-NAMES. I reviewed the code, and it
appears to act correctly for most XML. But it does not act correctly
for prefixes on attributes.
Not that it did when expat was used to handle the namespaces, either.
Making it handle these properly would involve quite a bit more
rewriting. (Possible and desirable rewriting, to be sure, but nothing
to do with the issue at hand, sorry).
And it does not have a single one of all
those required checks for non-conforming XML (except the undeclared
prefix check on tag names). XML-NAMES requires a number of checks for
conformance, some of which are in
http://www.w3.org/TR/REC-xml-names/#Conformance while others are
sprinkled throughout the spec.
I'll accept that - I didn't make it check for multiple colons, etc,
and I might well allow a redefinition of xml: and xmlns:, which'd be
confusing. I ought to fix these at some point.
Incidentally, by stating "except the undeclared prefix check", aren't
you arguing that the code *is* following XML-NAMES in this regard?
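For reference, the missing checks mentioned above (multiple colons,
redefinition of xml: and xmlns:) could look something like the sketch
below. This is not Gajim's code; the function names are invented here,
and the rules are my reading of the reserved-prefix and QName
constraints in XML-NAMES. Each function returns a description of the
problem, or None, so the caller can decide how to signal it.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

def qname_problem(name):
    """Return why `name` is not a usable QName, or None if it looks fine."""
    if name.count(":") > 1:
        return "more than one colon"
    if name.startswith(":") or name.endswith(":"):
        return "empty prefix or local part"
    return None

def declaration_problem(prefix, uri):
    """Check one namespace declaration; prefix is None for plain xmlns=."""
    if prefix == "xmlns":
        return "the xmlns prefix must not be declared"
    if prefix == "xml" and uri != XML_NS:
        return "xml must not be bound to another namespace"
    if prefix != "xml" and uri == XML_NS:
        return "only the xml prefix may be bound to the xml namespace"
    if uri == XMLNS_NS:
        return "the xmlns namespace must not be bound"
    if prefix is not None and uri == "":
        return "a prefix cannot be bound to the empty string"
    return None

# Hypothetical inputs for illustration.
print(qname_problem("a:b:c"))
print(declaration_problem("xml", "urn:example:other"))
print(declaration_problem("foo", "urn:example:foo"))
```

Returning a description instead of raising keeps the policy (report,
substitute, or drop the stream) out of the checking code.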
Dave, I don't think you want to conform to XML-NAMES. I think you'd
prefer to sanitize the XML instead to make it conform to XML-NAMES.
One step closer to HTML ;)
The mechanism by which I happen to have chosen to report undeclared
namespaces is merely a convenient mechanism which happens to have the
result I desired with minimal programming. I happen to think the code
is less hacky than Expat's rather bizarre API, which has namespace
handling hacked on via character delimiters, especially given how
Gajim then used this API. (Either Expat looks up namespaces and then
leaves you a non-standard notation to parse, or else you parse the
standard notation and look up namespaces yourself, in a more resilient
manner - not a hard choice, really).
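A minimal sketch of the second option, for illustration only: expat
runs with namespace awareness off, and a small scope stack does the
lookup. This is not the actual Gajim code; the Resolver class and its
members are invented here, and only the sentinel URI is the one quoted
earlier in this thread. Like the code under discussion, it resolves
element names only and ignores prefixes on attributes.

```python
import xml.parsers.expat as expat

# Sentinel URI quoted earlier in this thread for undeclared prefixes.
UNDECLARED = "http://www.gajim.org/xmlns/undeclared-root"
XML_NS = "http://www.w3.org/XML/1998/namespace"

class Resolver:
    """Resolve element names by hand; attribute prefixes are not handled."""

    def __init__(self):
        # Each scope maps prefix -> URI; "" is the default namespace.
        self.scopes = [{"xml": XML_NS, "": ""}]
        self.elements = []
        self.parser = expat.ParserCreate()  # namespace awareness off
        self.parser.StartElementHandler = self.start
        self.parser.EndElementHandler = self.end

    def start(self, name, attrs):
        scope = dict(self.scopes[-1])  # inherit the parent's bindings
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        self.scopes.append(scope)
        prefix, _, local = name.rpartition(":")
        # An unknown prefix becomes the sentinel instead of a fatal error.
        self.elements.append((scope.get(prefix, UNDECLARED), local))

    def end(self, name):
        self.scopes.pop()

    def feed(self, data, final=False):
        self.parser.Parse(data, final)

# Hypothetical stream fragment with one undeclared prefix ("foo").
resolver = Resolver()
resolver.feed(b"<stream xmlns='jabber:client' xmlns:s='urn:x'>"
              b"<s:a/><foo:bar/><b/></stream>", True)
for uri, local in resolver.elements:
    print(uri, local)
```

The element with the undeclared prefix simply ends up in a namespace
nothing else will recognise, and the stream carries on.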
What I'm trying to do is look at where we are now, and describe the
best option for developers wishing to deploy now, especially bearing
in mind we need to obtain the best result, where "best" is in terms
of interoperability and potential efficiency. If you disagree with
those goals, please say so - I don't think your goals are all that
different.
You appear to be arguing that the best interoperability (presumably)
is achieved by producing only XML-NAMES conforming XML. I can agree
with you there.
I also think this doesn't always happen right now, and that therefore
clients are best advised to handle "Bad XMLNS" in a graceful manner
and, in particular, not to generate a fatal stream-level error.
Furthermore, I note that if clients do this, the requirement to
produce only "Good XMLNS" can be relaxed slightly, since no serious
damage results. That is, "avoid if possible, bad things may result"
rather than "avoid at all costs, bad things will result": SHOULD
instead of MUST in RFC 2119 terms.
Finally, I note that the costs can be, in fact, remarkably high for a
server in the case of forwarding stanzas, since in order to merely
forward stanzas, a simple lexing pass is sufficient, whereas to check
- in particular - for undeclared prefixes requires a full parse and
lookup. These are expensive operations involving allocations, string
compares, and other primitives that have a detrimental effect on
short-term and long-term server performance.
Which step in my chain of thought here is so offensive that it
requires attack by the HTML bogeyman argument? :-)
Something which needed to be done to cope with xmlns-unaware
servers. All client developers should roll their own XMLNS processing
code? They do, but they shouldn't have had to.
I agree entirely - as I say, nothing in XML-NAMES mandates processors
generating fatal errors for the Prefix Declared constraint, and
whilst I can see this is a reasonable thing for handling the case in
a static document that the user has control over, it's utterly
unsuitable for most other cases - what is the user supposed to do,
after all?
This is the kind of thing that XML processors should offer,
signalling that an element (or attribute) is unbound, rather than
generating a fatal error. It's unfortunate that they do not, and my
personal hope for the developers of 2015 (which, I hope, will include
myself) is that XML processors themselves will improve, providing
what developers actually need.
Finally, I should note that XMLNS processing is not really very hard.
The code to do so in Gajim consists of around ten lines, from memory
- it's slightly more in our server, perhaps as many as 20.
Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade