On Tue, 2005-12-13 at 02:17 +0100, Paul Boddie wrote:
> Hello!
>
> > Don't worry, I don't think this is trivia. I'm happy that this issue
> > reached the surface once again, since it seems to me still
> > underestimated by DOM users. I guess the most people using DOM think
> > that there's no way of how the serialized representation of a DOM tree
> > might break.
>
> It shocks me to see how complicated the standards make this, yet I feel
> somewhat embarrassed that I didn't know that createElementNS didn't guarantee
> the presence of namespace declarations in a serialised document. Seeing the
> thread on comp.lang.python, I suppose I'm not alone in that respect, however.

Warning: the following is wrong; an LSSerializer (DOM Level 3 Load and Save
module) will normalize namespaces by _default_.

> > A plain DOM serializer will just close its eyes and won't try to
> > change anything that's in the DOM tree. That's fine and wanted
> > for e.g. editing applications.
> >
> > If one wants a semantic-safe serialization, then one needs a
> > namespace-normalization mechanism; although you risk breaking
> > QNames in element/attribute content on the other hand.
> >
> > The options here would then be:
> > 1) Close your eyes and serialize the tree
> >    a) if you know exactly that you didn't create mess in the tree then
> >       this is OK
> >    b) be aware that your serialized tree might be broken
> > 2) Normalize namespaces and then serialize
> >    a) the normalization will try to change prefixes,
> >       remove/add ns-declarations, in a way that a serialization is
> >       possible without altering the semantics of the DOM tree
> >    b) if the DOM is not serializable then the normalization should raise
> >       an error
> >    c) be aware that the normalization might break your QNames
> >
> > If we apply namespace-normalization to your example, then the outcome
> > would look like:
> > <href xmlns:ns1='DAV:'/>
> > i.e. the namespace declaration of 'DAV:' would get a different prefix,
> > in order to not interfere with the <href> element in no namespace.
>
> But the href element was created with a namespace specified, but with no
> prefix in its qualified name. A subsequent discussion touched upon default
> namespace pollution where href is created as follows...
>
> href = document.createElementNS("DAV:", "href")
>
> ...and where a child of href is created as follows...
>
> no_ns = document.createElementNS(None, "no_ns")
>
> ...where None is Python's equivalent of JavaScript's null. For this I
> proposed the following serialisation:
>
> <?xml version="1.0"?>
> <href xmlns="DAV:">
>   <no_ns xmlns=""/>
> </href>

Looks OK; I would expect the same result.
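
For readers who want to reproduce the scenario above, here is a minimal
sketch. It assumes Python's standard xml.dom.minidom, which is not the
libxml2 binding discussed in this thread, and it relies on minidom
serializing only the attributes that are actually present in the tree.
The helper declare_default_namespaces is hypothetical and handles
unprefixed elements only; it is not the DOM Level 3 namespace
normalization algorithm.

```python
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import getDOMImplementation, parseString

# Build the tree from the example: <href> in "DAV:", child in no namespace.
doc = getDOMImplementation().createDocument("DAV:", "href", None)
href = doc.documentElement
no_ns = doc.createElementNS(None, "no_ns")
href.appendChild(no_ns)

# Serialize as-is: the namespace lives on the nodes, but no xmlns
# attribute exists, so nothing declares it in the output.
plain = href.toxml()


def declare_default_namespaces(element, inherited=None):
    """Hypothetical helper: add xmlns="..." where an unprefixed element's
    namespace differs from the default namespace currently in scope."""
    uri = element.namespaceURI
    if element.prefix is None and uri != inherited:
        # xmlns="" undeclares the default namespace for no-namespace elements.
        element.setAttributeNS(XMLNS_NAMESPACE, "xmlns", uri or "")
        inherited = uri
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            declare_default_namespaces(child, inherited)


declare_default_namespaces(href)
fixed = href.toxml()

# Round trip: a namespace-aware parse of the fixed output should put
# both elements back where they were created.
reparsed = parseString(fixed).documentElement

print(plain)
print(fixed)
```

The first print should show no xmlns declaration at all, the second
should match the serialisation proposed above (without the XML
declaration and indentation). Prefixed elements and QNames in content
are deliberately left alone here.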

Just to be sure we are talking about the same thing: your first example
didn't put the <href> element in the "DAV:" namespace. So you provided
here a different scenario, right?
I.e.
  ns = libxml2mod.xmlNewNs(element, "DAV:", None)
only creates a ns-declaration attribute on the element, but does
not assign any namespace to the element.

> > On the one hand I use namespace-normalization for small DOM trees,
> > where the overhead of a normalization doesn't matter; on the other hand,
> > I just try to be careful and keep the serialized form in the back of my
> > head when working on a huge DOM tree, where I want to avoid
> > ns-normalization.
>
> My objectives include using libxml2's serialisation wherever possible -
> traversing the tree in Python is typically a very slow operation, and having
> to fix up the tree is also likely to incur substantial performance costs.

Maybe we should implement a namespace-normalization function in Libxml2.
Have a look at xmlDOMWrapReconcileNamespaces (in tree.c); it does
something similar, but not exactly, since namespaces are handled
differently in Libxml2 than in DOM; i.e. we cannot simply remove a
ns-declaration, since it could be referenced by node->ns fields.
I don't remember whether it does the xmlns="" thingy, so you might want
to test this.

> Since you're not the first person to suggest namespace normalisation (and the
> related DOM standards), I had a look at the pxdom module for Python which is
> much more standards-compliant than virtually any other Python DOM
> implementation, and it would appear that pxdom does "automagically" (as
> someone said) emit xmlns declarations at least in its default configuration,
> which I would assume has something to do with the normalisation process or
> some related aspect of DOM Level 3.

I now even see that I was wrong telling you that a plain DOM serializer
won't try to normalize namespaces.
It will normalize by default according to:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-namespaces

"namespaces"
  true
    [required] (default)
    Perform the namespace processing as defined in Namespace
    Normalization.
  false
    [optional]
    Do not perform the namespace processing.

So that's the reason why pxdom does it automagically.

> Anyway, I'd like to thank you for the kind words and helpful advice. It's a
> longer journey of enlightenment than I thought. ;-)

Yeah, obviously the same for me.

Regards,

Kasimier
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml