On Tue, 2005-12-13 at 02:17 +0100, Paul Boddie wrote:
> Hello!
> 
> > Don't worry, I don't think this is trivia. I'm happy that this issue
> > has surfaced once again, since it still seems to me to be
> > underestimated by DOM users. I guess most people using DOM think
> > that there's no way the serialized representation of a DOM tree
> > might break.
> 
> It shocks me to see how complicated the standards make this, yet I feel 
> somewhat embarrassed that I didn't know that createElementNS didn't guarantee 
> the presence of namespace declarations in a serialised document. Seeing the 
> thread on comp.lang.python, I suppose I'm not alone in that respect, however.
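
A minimal sketch of the effect described above, assuming Python's standard
xml.dom.minidom as a stand-in (it is not one of the implementations discussed
in this thread, and other DOM implementations may behave differently):

```python
from xml.dom.minidom import Document, parseString

# Assumption: minidom stands in for "a plain DOM" here.
doc = Document()
href = doc.createElementNS("DAV:", "href")
doc.appendChild(href)

# In memory, the element carries its namespace...
in_memory_ns = href.namespaceURI

# ...but minidom serializes without adding any xmlns declaration,
# so the namespace does not survive a round trip.
serialized = doc.toxml()
reparsed_ns = parseString(serialized).documentElement.namespaceURI

print(serialized)
print(in_memory_ns, reparsed_ns)
```

Comparing in_memory_ns with reparsed_ns shows whether the namespace survived
serialisation.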

Warning: the following is wrong; an LSSerializer (DOM Level 3 Load and
Save module) will normalize namespaces by _default_.

> > A plain DOM serializer will just close its eyes and won't try to
> > change anything that's in the DOM tree. That's fine and wanted
> > for e.g. editing applications.
> >
> > If one wants a semantically safe serialization, then one needs a
> > namespace-normalization mechanism; although you risk breaking
> > QNames in element/attribute content on the other hand.
> >
> > The options here would then be:
> > 1) Close your eyes and serialize the tree
> >   a) if you know for certain that you didn't create a mess in the
> >      tree then this is OK
> >   b) be aware that your serialized tree might be broken
> > 2) Normalize namespaces and then serialize
> >   a) the normalization will try to change prefixes,
> >      remove/add ns-declarations, in a way that a serialization is
> >      possible without altering the semantics of the DOM tree
> >   b) if the DOM is not serializable then the normalization should raise
> >      an error
> >   c) be aware that the normalization might break your QNames
> >
> > If we apply namespace-normalization to your example, then the outcome
> > would look like:
> > <href xmlns:ns1='DAV:'/>
> > i.e. the namespace declaration of 'DAV:' would get a different prefix,
> > in order to not interfere with the <href> element in no namespace.
> 
> But the href element was created with a namespace specified, but with no 
> prefix in its qualified name. A subsequent discussion touched upon default 
> namespace pollution where href is created as follows...
> 
> href = document.createElementNS("DAV:", "href")
> 
> ...and where a child of href is created as follows...
> 
> no_ns = document.createElementNS(None, "no_ns")
> 
> ...where None is Python's equivalent of JavaScript's null. For this I 
> proposed the following serialisation:
> 
> <?xml version="1.0"?>
> <href xmlns="DAV:">
>   <no_ns xmlns=""/>
> </href>

Looks OK; I would expect the same result.
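
As a hedged sketch of how that serialisation could be produced by hand,
assuming Python's standard xml.dom.minidom (which does not normalize
namespaces itself, so the xmlns attributes are added explicitly here):

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import Document, parseString

doc = Document()

# <href> in the "DAV:" namespace, with an explicit default-namespace
# declaration so the serialized form keeps that namespace.
href = doc.createElementNS("DAV:", "href")
href.setAttributeNS(XMLNS_NAMESPACE, "xmlns", "DAV:")
doc.appendChild(href)

# <no_ns> in no namespace; xmlns="" undeclares the inherited default.
no_ns = doc.createElementNS(None, "no_ns")
no_ns.setAttributeNS(XMLNS_NAMESPACE, "xmlns", "")
href.appendChild(no_ns)

serialized = doc.toxml()

# Round trip: check which namespaces a parser assigns to the elements.
root = parseString(serialized).documentElement
child = root.firstChild
print(serialized)
print(root.namespaceURI, child.namespaceURI)
```

A namespace-normalizing serializer would be expected to add these
declarations on its own; this only mimics the result manually.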

Just to be sure we are talking about the same thing: your first example
didn't put the <href> element in the "DAV:" namespace. So you have
described a different scenario here, right?
I.e. ns = libxml2mod.xmlNewNs(element, "DAV:", None) only creates a
ns-declaration attribute on the element, but does not assign any
namespace to the element.
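
A rough DOM-level analogue of that situation, sketched with Python's standard
xml.dom.minidom rather than the libxml2 bindings (an assumption made only so
the example is self-contained; libxml2's internal representation differs):

```python
from xml.dom.minidom import Document, parseString

doc = Document()

# Element created in no namespace...
href = doc.createElement("href")
# ...with a declaration attribute added afterwards. This does not
# change the namespace of the in-memory node.
href.setAttribute("xmlns", "DAV:")
doc.appendChild(href)

in_memory_ns = href.namespaceURI

# After serializing and reparsing, the declaration takes effect and
# the element ends up in the "DAV:" namespace.
reparsed_ns = parseString(doc.toxml()).documentElement.namespaceURI

print(in_memory_ns, reparsed_ns)
```

The in-memory tree and its serialized form disagree about the namespace of
<href>, which is the kind of mismatch namespace normalization has to resolve.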

> > On the one hand I use namespace-normalization for small DOM trees,
> > where the overhead of a normalization doesn't matter; on the other hand,
> > I just try to be careful and keep the serialized form in the back of my
> > head when working on a huge DOM tree, where I want to avoid
> > ns-normalization.
> 
> My objectives include using libxml2's serialisation wherever possible - 
> traversing the tree in Python is typically a very slow operation, and having 
> to fix up the tree is also likely to incur substantial performance costs. 

Maybe we should implement a namespace-normalization function in Libxml2.
Have a look at xmlDOMWrapReconcileNamespaces (in tree.c); it does
something similar, but not exactly the same, since namespaces are
handled differently in Libxml2 than in DOM; i.e. we cannot simply remove
a ns-declaration, since it could be referenced by node->ns fields. I no
longer remember whether it handles the xmlns="" case, so you might want
to test this.

> Since you're not the first person to suggest namespace normalisation (and the 
> related DOM standards), I had a look at the pxdom module for Python which is 
> much more standards-compliant than virtually any other Python DOM 
> implementation, and it would appear that pxdom does "automagically" (as 
> someone said) emit xmlns declarations at least in its default configuration, 
> which I would assume has something to do with the normalisation process or 
> some related aspect of DOM Level 3.

I now see that I was even wrong in telling you that a plain DOM
serializer won't try to normalize namespaces. It will normalize by
default, according to:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-namespaces

"namespaces"
true
        [required] (default)
        Perform the namespace processing as defined in Namespace
        Normalization. 
false
        [optional]
        Do not perform the namespace processing.
        
So that's the reason why pxdom does it automagically.

> Anyway, I'd like to thank you for the kind words and helpful advice. It's a 
> longer journey of enlightenment than I thought. ;-)

Yeah, obviously the same for me.

Regards,

Kasimier
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml