Hi,

On Thu, 2005-05-19 at 17:16 +0200, Martijn Faassen wrote:
> Kasimier Buchcik wrote:
> [snip stuff that goes over his head without a lot of further study]
>
> This is just a cheerleading note; I'm really glad you guys are taking
> this up, as I can already see there are many subtle issues involved I
> would not have understood without significant study. Thanks!
>
> Anyway, anything I can do now to help? I will of course be testing this
> facility at some stage within lxml, and give feedback then if necessary.
You could describe how you intend to manage namespaces in your wrapper. Will you go the W3C way or the Libxml2 namespace way? Both have pros and cons. The relevant drawback of the Libxml2 way is that it's hard, if not impossible, to implement a DOM wrapper in a programming language where the time of destruction of an object is not under the control of the programmer.

Let me try to give some background information, possibly too detailed. I hope to be corrected if something's wrong:

Libxml2 handles the corresponding DOM Node methods namespaceURI() and prefix() in the following way:

  node->ns->prefix == result of node.prefix()
  node->ns->href   == result of node.namespaceURI()

The node->ns field is a pointer to an xmlNs struct, which is held in the elem->nsDef field of element nodes. Such elem->nsDef entries correspond to namespace declaration attributes in DOM (e.g. xmlns:foo="urn:test:foo"). Libxml2's way demands that an elem->nsDef entry, and thus a namespace declaration attribute, be present on the node itself or on an ancestor node; this exactly reflects the serialized form (as written to an XML file).

This creates the following problem: if you remove an attribute node that is bound to a namespace from its parent, the attr->ns field still points to an elem->nsDef entry. This is OK as long as that element node is not itself freed, which would free its elem->nsDef entries as well. Destroying this element would leave attr->ns pointing to freed memory. There's no automatic mechanism to avoid this, since no reference counting is involved. In C this is under the user's control: you just have to know what you are freeing and when. Not so in other programming languages like Python, Delphi, Java, etc., where the destruction time of objects is not always, if ever, predictable.

Safe removal of nodes:

So we obviously need a mechanism that lets the node->ns reference point to an xmlNs entry which is not in danger of being freed unpredictably.
A possible location would be a list of xmlNs entries internally managed by the DOM document wrapper. Another would be to use the "oldNs" field of xmlDoc, or even to add a new field to xmlDoc for such purposes. In Libxml2 this can currently be worked around by reconciling the node, which re-creates such "stale" declarations on the node. This is quite impractical, since it could end up creating a vast number of redundant ns-declarations. Additionally, it does not work if an attribute is removed.

Namespace reconciliation:

When working with Libxml2 in C (adding, cutting and pasting nodes), one could end up with a tree where some of the node->ns entries point to nsDef entries located in the wrong position (think of shadowing a namespace prefix). Serializing such a document would produce an XML document that is not namespace-well-formed. For this reason there's a namespace reconciliation function in Libxml2 (xmlReconciliateNs()); it adds namespace declaration attributes where needed, so that the document becomes ns-well-formed again. This partly corresponds to the W3C namespace normalization method. The function we try to create here should support both: a way to safely move nodes out of the tree, and a way to reconcile namespaces.

The way of our wrapper implementation:

We use the node->nsDef entries only for serialization purposes. So when working with DOM, node->ns references internally stored entries, not node->nsDef entries. Thus we separate namespace declaration attributes from the Node.namespaceURI() and Node.prefix values, which is the W3C way. With DOM you can remove or add an ns-declaration attribute wherever you want; it does not change any node's namespace. The ns-declaration attributes are only there to give the user the ability to explicitly define locations where the XML processor should create ns-declarations when serializing. This is important for QNames, which can appear in the text content of a node.
Example: in XML Schema's <xs:element ref="foo:someName"/>, "foo:someName" is a QName which needs a namespace to be declared beforehand, with the _same_ prefix.

As with Libxml2's way, we need to normalize the namespaces before serializing. This could be optimized to normalize only the branches that were changed through the API.

Now a part that may be surprising: _neither_ Libxml2's reconciliation function _nor_ W3C's namespace normalization can avoid breaking a QName in some special cases, since both may change the ns-prefix. Libxml2's current behaviour is more lax here: it might break QNames in element content and in attribute values, while W3C normalization might break them in attribute values only.

I do not encourage people to use our company's way of handling namespaces, since it might cause problems in the future: it's not Libxml2's way and thus not handled, so it may end up 100% against some internal expectation in the future. I hope to die before this time comes ;-)

Greetings,

Kasimier
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
