Re: [Standards] well-formedness

Waqas Mon, 20 Oct 2008 16:53:44 -0700

On Mon, Oct 20, 2008 at 9:01 PM, Dave Cridland <[EMAIL PROTECTED]> wrote:
> On Mon Oct 20 04:40:56 2008, Waqas wrote:
>>
>> On Tue, Oct 14, 2008 at 3:28 AM, Peter Saint-Andre <[EMAIL PROTECTED]>
>> wrote:
>> > "an entity SHOULD be liberal in accepting such data."
>>
>> This translates to:
>>
>>  "an entity SHOULD NOT use a namespace-validating parser (as defined
>> in [XML-NAMES])"
>>
>>
> No, I disagree, it translates as "an entity SHOULD NOT use a parser that
> produces an unrecoverable fatal error on an undeclared namespace prefix". I
> disagree that's the same thing at all.
>


The expat parser (as an example) in namespace-aware mode reports a
fatal error on undeclared prefixes. This was added in response to this
bug report: 
http://sourceforge.net/tracker/index.php?func=detail&aid=695401&group_id=10127&atid=110127
which references this section of XML Names:
http://www.w3.org/TR/REC-xml-names/#ns-qualnames
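
For anyone who wants to reproduce this, here is a minimal sketch using Python's standard expat binding (xml.parsers.expat). The sample document and the parse() helper are made up for illustration; only ParserCreate, Parse, ExpatError and ErrorString come from the library. It parses the same undeclared-prefix input with namespace processing off and then on.

```python
import xml.parsers.expat as expat

# Sample input made up for this sketch: the 'stream' prefix is never declared.
DOC = b"<stream:features/>"

def parse(namespace_aware):
    """Return None if expat accepts DOC, else expat's error string."""
    # Passing a namespace separator switches expat into namespace-aware mode.
    separator = " " if namespace_aware else None
    parser = expat.ParserCreate(namespace_separator=separator)
    try:
        parser.Parse(DOC, True)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None

# Plain XML 1.0 parsing: a colon in a name is legal, so this is accepted.
print(parse(namespace_aware=False))
# Namespace-aware parsing: the undeclared prefix is a fatal error.
print(parse(namespace_aware=True))
```

Once the error is raised the parser instance cannot continue, which is why a client that parses the whole stream in this mode ends up dropping the connection.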

>> This is indeed the case. Entities in the XMPP world tend not to use
>> namespace aware parsers.
>
> Um, wait...
>
>>  In
>> fact most do not care about namespaces at all (aside from a few
>> specific cases where the XEPs use
>> a namespace prefix in the examples, the implementations are often
>> coded to look for that prefix).
>>
>>
> ... because...
>
>
>> Testing with ejabberd and gajim (quite a popular combination), it was
>> quickly clear that both did
>> not deal with valid <message>s where a prefix was used, and both did
>> deal with <message>s with a
>> namespace other than jabber:client.
>>
>>
> ... Gajim was using, and still does use, a namespace aware parser.
>
>

Ah yes, a namespace aware parser (expat) is indeed being used with
namespace awareness disabled...
I looked at the Gajim sources, and using
'http://www.gajim.org/xmlns/undeclared-root' as the namespace of all
undeclared prefixes clearly does not conform with [XML-NAMES].
See: http://www.w3.org/TR/REC-xml-names/#ProcessorConformance
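
To make the approach concrete, here is a rough sketch of this kind of hand-rolled resolution. It is not Gajim's code: the placeholder URI, the resolve_elements() name and the data structures are all assumptions of mine. It runs expat with namespace processing off, tracks xmlns declarations itself, and maps any undeclared prefix to a placeholder namespace instead of failing.

```python
import xml.parsers.expat as expat

# Hypothetical placeholder URI, chosen for this sketch only.
UNKNOWN_NS = "urn:example:undeclared-prefix"

def resolve_elements(data):
    """Parse without expat's namespace processing and resolve element
    names by hand, mapping undeclared prefixes to UNKNOWN_NS."""
    scopes = [{}]   # stack of prefix -> URI maps; "" is the default namespace
    resolved = []   # (namespace URI or None, local name) for each start tag

    def start(name, attrs):
        # Inherit the parent's bindings, then apply this element's declarations.
        scope = dict(scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        scopes.append(scope)
        prefix, sep, local = name.rpartition(":")
        if sep:
            # Lenient step: an undeclared prefix gets the placeholder URI.
            uri = scope.get(prefix, UNKNOWN_NS)
        else:
            # Unprefixed element: default namespace, or none if undeclared.
            uri = scope.get("") or None
        resolved.append((uri, local))

    def end(name):
        scopes.pop()

    parser = expat.ParserCreate()  # no namespace separator: namespaces off
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return resolved

print(resolve_elements(
    b'<message xmlns="jabber:client"><x:body>hi</x:body></message>'
))
```

The sketch never reports the violation, which is exactly why code like this cannot claim conformance under the ProcessorConformance section linked above.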

>> All implementations must be namespace non-aware if they don't wish to
>> have the disconnection bug
>> that gajim had. I would like to argue that it was not a bug at all.
>>
>>
> And Gajim is most certainly namespace aware. Please, review the code and
> tell me where it doesn't conform to XML-NAMES.
>

Gajim does not conform to XML-NAMES. I reviewed the code, and it
appears to act correctly for most XML. But it does not act correctly
for prefixes on attributes, and it performs none of the checks
required for non-conforming XML (except the undeclared prefix check
on tag names). XML-NAMES requires a number of checks for conformance,
some of which are in
http://www.w3.org/TR/REC-xml-names/#Conformance while others are
sprinkled throughout the spec.
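
As one illustration of the attribute rules, here is what a namespace-aware parser does with prefixed and unprefixed attributes, sketched with Python's standard xml.etree.ElementTree. The sample stanza and the urn:example:p namespace are made up, and I am not claiming this is the exact case Gajim gets wrong. Under XML-NAMES the default namespace applies to element names but not to unprefixed attributes.

```python
import xml.etree.ElementTree as ET

# Made-up stanza: one unprefixed attribute and one prefixed attribute.
elem = ET.fromstring(
    '<message xmlns="jabber:client" xmlns:p="urn:example:p" '
    'type="chat" p:flag="1"/>'
)

# The element picks up the default namespace.
print(elem.tag)
# The unprefixed attribute stays in no namespace; only the prefixed
# attribute is namespace-qualified.
print(sorted(elem.attrib))
```

Hand-rolled namespace code that applies the default namespace to attributes, or that matches on the literal "p:flag" string, behaves differently from this.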

Dave, I don't think you want to conform to XML-NAMES. I think you'd
prefer to sanitize the XML instead to make it conform to XML-NAMES.
One step closer to HTML ;)
You might wish to view the expat sources. Some of what it does is here:
http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmlparse.c?revision=1.162&view=markup#l_2942

> There are a few cases in its higher layers where it ignores the namespace,
> and switches only based on the local-name of the element - this is, indeed,
> an error, but it's one that has nothing at all to do with this. The fact
> that these existed when the namespace handling in Expat was used somewhat
> defeats your argument.
>
>> The behavior when a server receives a badly-namespaced stanza needs to
>> be clarified. I have
>> been working with Matthew Wild on a not-yet-released server. We are
>> wondering whether we should
>> discard the stanza, the element, or raise a stream error. After all,
>> there really is no reason
>> that any (non-malicious) entity should be sending invalid namespaces.
>> If they do then it is a bug,
>> just the same as if they sent invalid XML.
>>
>>
> Wrong. It's impossible, in the current infrastructure, to receive invalid
> XML except via an error in the entity you're actually connected to.
>
> If you receive invalid XML, there is no way to handle it in any useful
> manner.
>
> If you receive invalid XMLNS, however, it might have come from anywhere, and
> merely been forwarded on to you. And there are at least three ways of handling
> it.
>
> a) Assume that undeclared prefixes are bound to an arbitrary "unknown"
> namespace. (This is what Gajim does, now). From there, process the stanza as
> much as is possible, which might include doing nothing at all, or rejecting
> it with <service-unavailable/>, just as with any other unrecognized
> namespace.
>
> b) Detect unbound namespaces as a special case, and bounce the stanza.
>
> c) Emit a stream error. This is what Gajim did previously, and what you're
> recommending everyone does.
>

I am NOT suggesting everyone does that, or should do that. I'm saying
everyone tends to do that because server implementations (e.g.,
ejabberd) do not conform to [XML-NAMES].

[XML-NAMES] does say "To conform to this specification, a processor
MUST report violations of namespace well-formedness", and parsers I
have tested with all seem to interpret that as a required fatal error.

>> Just discarding it has a problem. Someone could send a message with
>> invalid namespaces to a
>> conference.jabber.org room. Everyone (human) would see that, except
>> entities which care about
>> namespaces. From the protocol's perspective this would be "correct",
>> but not from a normal user's
>> perspective.
>
> And this will have exactly the same effect with any of the above solutions,
> unless one is mandated. And the interesting thing is if a server passes
> through - as existing servers do, essentially doing (a) - and the clients
> all do (c).
>
> Because then, sending invalid XMLNS via a chat room kicks out all the users,
> and this doesn't seem to be appreciated.
>

Yep, and while servers should indeed support this until the majority
of servers are namespace aware, this should be considered a bug, and
not legitimized by the spec.

>> Sorry if this sounds like a rant. I just don't like where we are headed.
>
> I don't like where we are. I don't like where some people want us to go,
> because they seem to want to send us off into a fantasy land, where servers
> are redeployed in seconds.
>

I understand this. But I do think that we should be stricter in the
long term. What you are suggesting (and did in Gajim) is basically a
hack, something which needed to be done to cope with xmlns-unaware
servers. Should all client developers have to roll their own XMLNS
processing code? They do, but they shouldn't have had to. I just don't
think we should legitimize a hack.

What I'm saying is that yes, we do need to support the existing
deployments, but their (IMHO) incorrect behavior should be declared as
non-conforming.

> Dave.
> --
> Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
>  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
>  - http://dave.cridland.net/
> Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
>

I understand developers today simply have to accept this non-XML-NAMES
conforming XML. But let's not force the developers writing clients in
2015 to face the same problems. I think most would agree that that
would be pretty sad.

Please understand that even if we use MUST instead of SHOULD with
respect to namespace-awareness, the existing servers are not going to
be left behind. Newer servers and server versions are still going to
continue to support their legacy counterparts. The benefit of course
would be that eventually we will have a sterilized network, where
clients wouldn't need to worry about rolling their own
(non-conforming) namespace handling. In my opinion this is a better
long term direction.

Waqas.
