I looked around for an ElementTree-specific mailing list, but found none -- my apologies if this is too broad a forum for this question.
I've been using the lxml variant of the ElementTree API, which I understand works in much the same way (with some significant additions). In particular, it shares the use of a .tail attribute. I ran headlong into this aspect of the API while doing some DOM manipulations, and it's got me pretty confused.

Example:

>>> from lxml import etree as ET
>>> frag = ET.XML('<a>head<b>inside</b>tail</a>')
>>> b = frag.xpath('//b')[0]
>>> b
<Element b at 71cbe8>
>>> b.text
'inside'
>>> b.tail
'tail'
>>> frag.remove(b)
>>> ET.tostring(frag)
'<a>head</a>'

As you can see, the .tail text is removed as part of the <b> element -- but it IS NOT part of the <b> element. I understand the use of the .tail attribute given the desire to simplify the API by avoiding pure text nodes, but it seems entirely inappropriate for the tail text to disappear into the ether when what is technically a sibling node is removed.

Performing the same operations with the Java DOM API (crimson, in this case, it turns out) yields what I would expect (here I'm using JPype to access a v1.4.2 JVM through Python -- which makes things somewhat less painful):

>>> from jpype import *
>>> startJVM(getDefaultJVMPath())
>>> builder = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()
>>> xml = java.io.ByteArrayInputStream(java.lang.String('<a>head<b>inside</b>tail</a>').getBytes())
>>> doc = builder.parse(xml)
>>> a = doc.documentElement
>>> a.toString()
u'<a>head<b>inside</b>tail</a>'
>>> b = a.getElementsByTagName('b').item(0)
>>> a.removeChild(b)
>>> a.toString()
u'<a>headtail</a>'

(Sorry for the Java comparison, but that's where I first cut my teeth on XML, and that's where my expectations were formed.)

That's a pretty significant mismatch in functionality. I certainly understand the motivations of Mr. Lundh to make the ET API as pythonic as possible, but ET's behaviour in this specific context is flatly wrong as far as I can see.
I would have expected that a removal operation would have appended <b>'s tail text to the text of <a> (or perhaps to the tail text of <b>'s closest preceding sibling) -- something that I think I'm going to have to do in order to continue using lxml / ElementTree.

I ran this issue past a few people I know who've worked with and written about ElementTree, and their response to this apparent divergence between the ET DOM API and "standard" DOM APIs was roughly: "that's just the way it is".

Comments, thoughts?

Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
[EMAIL PROTECTED]
http://snowtide.com | +1 413.519.6365
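
P.S. A minimal sketch of the workaround described above, in case it helps frame the discussion. The helper name remove_keep_tail is hypothetical (it is not part of lxml or ElementTree), and the sketch uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; it relies only on .text, .tail, find() and remove(), which I would expect to behave the same way under lxml.etree. It moves the removed element's tail onto the preceding sibling's tail if there is one, and onto the parent's text otherwise.

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree could be swapped in here


def remove_keep_tail(parent, child):
    """Remove child from parent, keeping child's tail text in the tree.

    Hypothetical helper, not part of the ElementTree or lxml API.
    """
    tail = child.tail
    if tail:
        siblings = list(parent)
        index = siblings.index(child)
        if index > 0:
            # Text that follows a preceding sibling is stored in that sibling's .tail.
            previous = siblings[index - 1]
            previous.tail = (previous.tail or "") + tail
        else:
            # No preceding sibling: the text becomes part of the parent's .text.
            parent.text = (parent.text or "") + tail
    parent.remove(child)


frag = ET.XML("<a>head<b>inside</b>tail</a>")
remove_keep_tail(frag, frag.find("b"))
result = ET.tostring(frag, encoding="unicode")
print(result)  # <a>headtail</a>
```

With this, the example from the top of the message ends up as '<a>headtail</a>', which matches what the Java DOM produced. With lxml specifically, getparent() and getprevious() would probably let the helper drop the explicit parent argument, but I have left that out to keep the sketch portable.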