Re: [Standards] well-formedness

Peter Saint-Andre Wed, 22 Oct 2008 08:38:25 -0700

Dave, what text would you propose?

As a reminder, the provisional text in version -08 of rfc3920bis is:


***

12.3.  Well-Formedness

   There are two varieties of well-formedness:

   o  "XML-well-formedness" in accordance with the definition of "well-
      formed" in Section 2.1 of [XML].
   o  "Namespace-well-formedness" in accordance with the definition of
      "namespace-well-formed" in Section 7 of [XML-NAMES].

   The following rules apply.

   An XMPP entity MUST NOT generate data that is not XML-well-formed.
   An XMPP entity MUST NOT accept data that is not XML-well-formed;
   instead it MUST return an <xml-not-well-formed/> stream error and
   close the stream over which the data was received.

   An XMPP entity MUST NOT generate data that is not namespace-well-
   formed.  An XMPP server SHOULD NOT route or deliver data that is not
   namespace-well-formed, and SHOULD return a stanza error of <not-
   acceptable/> or a stream error of <xml-not-well-formed/> in
   response to the receipt of such data.

      Note: Because these restrictions were underspecified in an earlier
      revision of this specification, it is possible that
      implementations based on that revision will send data that does
      not comply with the restrictions; an entity SHOULD be liberal in
      accepting such data.

***
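
To make the two varieties concrete, here is a rough sketch (not part of the draft text) that uses the expat bindings in Python's standard library. It assumes that expat's namespace-aware mode is a reasonable approximation of the [XML-NAMES] checks; the classify() helper and its return strings are hypothetical names chosen for illustration only.

```python
import xml.parsers.expat as expat


def classify(data: bytes) -> str:
    """Report which variety of well-formedness `data` fails, if any.

    Illustrative only: expat's namespace mode is assumed to stand in
    for the namespace-well-formedness checks of XML-NAMES.
    """
    try:
        # Plain mode: colons in names are not interpreted.
        expat.ParserCreate().Parse(data, True)
    except expat.ExpatError:
        return "not XML-well-formed"
    try:
        # Namespace-aware mode: undeclared prefixes are a fatal error.
        expat.ParserCreate(namespace_separator=" ").Parse(data, True)
    except expat.ExpatError:
        return "not namespace-well-formed"
    return "well-formed"
```

With this helper, a fragment with mismatched tags fails the first check, while something like <x:a/> with no declaration for "x" passes the first check and fails only the second.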

Dave Cridland wrote:
> On Tue Oct 21 00:53:35 2008, Waqas wrote:
>> The expat parser (as an example) in namespace-aware mode reports a
>> fatal error on undeclared prefixes. This was added in response to this
>> bug report:
>> http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127
>>
>> which references this section of XML Names:
>> http://www.w3.org/TR/REC-xml-names/#ns-qualnames
>>
>>
> Which doesn't say anything about mandatory fatal errors.
> 
> If you're parsing a static document, it's quite reasonable to generate a
> fatal error, but I don't think that's the right thing at all with an XML
> stream.
> 
>> Ah yes, a namespace aware parser (expat) is indeed being used with
>> namespace awareness disabled...
> 
> Right - and then namespaces are handled, so the overall result is that a
> namespace aware parser is used. If you're mandating that all XMPP
> implementations MUST use somebody else's parser, then I don't know quite
> what to say.
> 
> 
>> I looked at the Gajim sources, and using
>> 'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
>> undeclared prefixes clearly does not conform with [XML-NAMES].
>> See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance
>>
>>
> Nonsense.
> 
> "A processor MUST report violations of namespace well-formedness" -
> Gajim is doing so, signalling this condition using a specific namespace
> URI, so it clearly *does* conform. You may argue that I should have used
> some special non-string object instead, if you like, and that how Gajim
> handles this signal - by treating it as the unknown namespace it (kind
> of) is - is sufficiently simple and neat as to warrant being maligned as
> a hack, but it's a damn sight better than terminating the connection.
> 
>> Gajim does not conform to XML-NAMES. I reviewed the code, and it
>> appears to act correctly for most XML. But it does not act correctly
>> for prefixes on attributes
> 
> Not that it did when expat was used to handle the namespaces, either.
> Making it handle these properly would involve quite a bit more
> rewriting. (Possible and desirable rewriting, to be sure, but nothing to
> do with the issue at hand, sorry).
> 
>> . And it does not have a single one of all
>> those required checks for non-conforming XML (except the undeclared
>> prefix check on tag names). XML-NAMES requires a number of checks for
>> conformance, some of which are in
>> http://www.w3.org/TR/REC-xml-names/#Conformance while others are
>> sprinkled throughout the spec.
>>
>>
> I'll accept that - I didn't make it check for multiple colons, etc, and
> I might well allow a redefinition of xml: and xmlns:, which'd be
> confusing. I ought to fix these at some point.
> 
> Incidentally, by stating "except the undeclared prefix check", aren't
> you arguing that the code *is* following XML-NAMES in this regard?
> 
>> Dave, I don't think you want to conform to XML-NAMES. I think you'd
>> prefer to sanitize the XML instead to make it conform to XML-NAMES.
>> One step closer to HTML ;)
> 
> The mechanism by which I happen to have chosen to report undeclared
> namespaces is merely a convenient mechanism which happens to have the result
> I desired with minimal programming. I happen to think the code is less
> hacky than Expat's rather bizarre API, which has namespace handling
> hacked on via character delimiters, especially given how Gajim then used
> this API. (Either Expat looks up namespaces and then leaves you a
> non-standard notation to parse, or else you parse the standard notation
> and look up namespaces yourself, in a more resilient manner - not a hard
> choice, really).
> 
> What I'm trying to do is look at where we are now, and describe the best
> option for developers wishing to deploy now, especially bearing in mind
> we need to obtain the best result, where "best" is in terms of
> interoperability and potential efficiency. If you disagree with those
> goals, please say so - I don't think your goals are all that different.
> 
> You appear to be arguing that the best interoperability (presumably) is
> achieved by producing only XML-NAMES conforming XML. I can agree with
> you there.
> 
> I also think this doesn't always happen right now, and that therefore
> clients are best advised to handle "Bad XMLNS" in a graceful manner, in
> particular, not generating a fatal stream-level error.
> 
> Furthermore, I note that if clients do this, the requirement to produce
> only "Good XMLNS" can be relaxed slightly, since no serious damage
> results. That is, avoid if possible - bad things may result, rather than
> avoid at all costs - bad things will result. SHOULD instead of MUST in
> RFC 2119 terms.
> 
> Finally, I note that the costs can be, in fact, remarkably high for a
> server in the case of forwarding stanzas, since in order to merely
> forward stanzas, a simple lexing pass is sufficient, whereas to check -
> in particular - for undeclared prefixes requires a full parse and
> lookup. These are expensive operations involving allocations, string
> compares, and other primitives that have a detrimental effect on
> short-term and long-term server performance.
> 
> Which step in my chain of thought here is so offensive that it requires
> attack by the HTML bogeyman argument? :-)
> 
>>  Something which needed to be done to cope with xmlns-unaware
>> servers. All client developers should roll their own XMLNS processing
>> code? They do, but they shouldn't have had to.
> 
> I agree entirely - as I say, nothing in XML-NAMES mandates processors
> generating fatal errors for the Prefix Declared constraint, and whilst I
> can see this is a reasonable thing for handling the case in a static
> document that the user has control over, it's utterly unsuitable for
> most other cases - what is the user supposed to do, after all?
> 
> This is the kind of thing that XML processors should offer, signalling
> that an element (or attribute) is unbound, rather than generating a
> fatal error. It's unfortunate that they do not, and my personal hope for
> the developers of 2015 (which, I hope, will include myself) is that XML
> processors themselves will improve, providing what developers actually
> need.
> 
> Finally, I should note that XMLNS processing is not really very hard.
> The code to do so in Gajim consists of around ten lines, from memory -
> it's slightly more in our server, perhaps as many as 20.
> 
> Dave.
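
For readers following along, the approach described above (resolving prefixes by hand and flagging an unbound prefix instead of raising a fatal error) might look roughly like the sketch below. This is not Gajim's code: the resolve() helper, the scopes list, and the UNBOUND sentinel URI are all hypothetical, and the sketch covers element names only (no prefixed attributes, no check for multiple colons or for rebinding xml: or xmlns:).

```python
# Hypothetical sentinel namespace used to flag an undeclared prefix.
UNBOUND = "urn:example:unbound-prefix"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def resolve(qname, attrs, scopes):
    """Resolve an element's qualified name to (namespace URI, local name).

    `scopes` is a list of prefix-to-URI dicts, innermost last. This
    function pushes one scope per element; the caller is expected to
    pop it when the element closes.
    """
    # Inherit the bindings of the enclosing element, if there is one.
    bindings = dict(scopes[-1]) if scopes else {"xml": XML_NS}
    for name, value in attrs.items():
        if name == "xmlns":
            bindings[""] = value          # default namespace
        elif name.startswith("xmlns:"):
            bindings[name[6:]] = value    # prefixed declaration
    scopes.append(bindings)

    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed name: default namespace, or None if there is none.
        return bindings.get(""), qname
    # Prefixed name: signal an unbound prefix rather than failing.
    return bindings.get(prefix, UNBOUND), local
```

A caller would then treat any element whose namespace equals UNBOUND as belonging to an unknown namespace, rather than closing the stream.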
