Re: [xml] redicting parts of trees

cazic Sat, 21 May 2005 06:15:54 -0700

Hi,

> Von: Martijn Faassen <[EMAIL PROTECTED]>
> Datum: Fri, 20 May 2005 23:28:04 +0200
> 
> Kasimier Buchcik wrote:
> > Hi,
> > 
> > On Fri, 2005-05-20 at 16:59 +0200, Martijn Faassen wrote:
> > 
> >>Kasimier Buchcik wrote:
> > 
> > 
> > [...]
> > 
> > 
> >>>Yeah, I read some of the message on your lxml list about your mechanism
> >>>to keep detached nodes alive if they are referenced by multiple wrapper
> >>>proxies. We took a sometimes memory-consuming but simple approach: we
> >>>never free any removed Libxml2 nodes from the document, they are moved
> >>>into an internal list of "garbage" nodes in the document wrapper and
> >>>freed when the document is freed. A "flush" method can be used to
> >>>cleanup such "garbage" nodes, if the user is sure that it's safe.
> >>
> >>Right, since lxml aims at being as "Pythonic" as possible I don't want 
> >>the user to worry about these issues at all. I think I've accomplished 
> >>this fairly well, though I'm still mopping up bugs here and there once 
> >>every while (plus some fundamental stuff I hope to solve for good when 
> >>we have an adoptNode()) and I'm sure some performance issues could be 
> >>improved somewhat still.
> > 
> > OK. We wanted to be it as quick as possible. The "flushGarbage" is
> > normally not called, so only after massive removals of nodes. Do you
> > handle XPath results as well? I.e. is the reference counter (if that's
> > what you do) increased for every node in such a list as well?
> 
> There's in fact no reference counter in the lxml code itself; if a node 
> has a proxy in Python, the same proxy is reused always. The proxy *is* 
> reference counted by Python. :) When the proxy is deallocated, I 
> therefore know nothing points directly at that node. At this point that 
> node is a candidate for removal.


Cool, that's exactly the mechanism I tried 3 years ago! It gave me a lot of
head-ache to implement it, since it was my first contact with XML and
Lixml2. We heavily used the "private" field on Libxml2-nodes at this time,
to hold some wrapper information. But at the end we reverted to the old
behaviour, since we were somehow fightened as we realized that XPath results
needed to be handled as well. So for every XPath-result node, we had to
create wrappers.

> 
> At that point I do not know yet whether there isn't something pointing 
> at a node *below* this node or at a node above this node in the same 
> tree. So, whenever a proxy is deallocated, the following checks are 
> performed:
> 
> * tree walk upwards to see whether the node is still in a document or 
> has nodes with proxies pointing to it below. If so, don't deallocate the 
> tree.
> 
> * if tree is not in doc, tree walk throughout the tree bto see whether 
> any proxy exists that still points to it. If so, don't deallocate.

:-) I know what you've been through. Cool that you succeeded. I hope you
take care of text-nodes as well, which are freed automatically by Libxml2 if
at adjacent position.

> In addition, whole documents are removed if no more proxies exist that 
> refer to this document. This takes advantage of Python refcounting 

The same mechanism as in Delphi.

> again; the proxies have a reference to the document proxy and the 
> underlying document gets deallocated only if no more proxy exists for 
> the whole tree.

Yes.

> The worst case behavior of this algorithm is that a whole tree walk is 
> required. I believe however that this is relatively rare, as most of the 
> time the algorithm can stop when the document node is reached (most 
> nodes after all are in a tree).
> 
> > [...]
> > 
> > 
> >>>Yes, in your case, if single attributes are not expected to be adopted,
> >>>and potentially many auto-created namespace declarations don't bother
> >>>you, the mechanism of xmlReconciliateNs seems best fitting: it just
> >>>re-creates the missing declarations on the adopted element. OK, good to
> >>>know that!
> >>
> >>Yes, indeed. I am a bit concerned the namespace declarations will be 
> >>polluted somewhat when serializing, but I can live with that for now, as
> >>long as the infoset is still okay.
> > 
> > I'll try to store the ns-declarations on the doc, as I have Rob on this
> > side now. So we'll get less redundant ns-declarations.
> 
> That'd be good!

Regards,

Kasimier

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] redicting parts of trees

Reply via email to