On Thu, 9 Dec 2004 20:47:41 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> if you add this to the inner loop,
>
>     print titleNode.childNodes
>     print titleNode.firstChild.wholeText
>
> you get this output (under 2.3.3):
>
>     [<DOM Text node "\n">, <DOM CDATASection node "Plone: rem...">]
>
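Given that output, firstChild can be just the leading newline, with the real title sitting in a following CDATASection node. A minimal sketch (not from the thread) that gathers all character data of an element instead of relying on firstChild follows. It assumes the stdlib xml.dom.minidom in current Python 3; the sample document and the element_text helper are hypothetical.

```python
from xml.dom import minidom
from xml.dom import Node

# Hypothetical sample: element names follow the thread, the title is invented.
SAMPLE = """<archive>
<blog id="1">
<text>
<blogtitle>
<![CDATA[Hypothetical sample title]]></blogtitle>
</text>
</blog>
</archive>"""


def element_text(element):
    """Concatenate the character data directly inside element, treating
    Text and CDATASection children the same way."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


doc = minidom.parseString(SAMPLE)
for title in doc.getElementsByTagName("blogtitle"):
    # firstChild.data may be only the leading "\n", as in the output above
    print(repr(title.firstChild.data))
    # element_text() sees the whole title regardless of how it was split
    print(element_text(title).strip())
```

The helper does not care whether the parser delivers one text node, several, or a mix with CDATA sections, which is the behaviour the reply below argues code should have.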
Thanks, Fredrik.

> http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549725&group_id=5470
>
> this bug report complains that the DOM represents the CDATA section as
> four text nodes, which is also perfectly valid (see Martin's explanation).
> code that depends on being able to identify a CDATA section in the source
> file is broken; character data, character references, entities, and CDATA
> sections should all be treated as text.

That makes sense.

> btw, here's the corresponding ElementTree version:
>
>     from elementtree import ElementTree
>
>     tree = ElementTree.parse("foo.xml")
>
>     for node in tree.findall(".//blog"):
>         print node.get("id")
>         for content_node in node.findall("text"):
>             print content_node.findtext("blogtitle")
>
> or, shorter:
>
>     for node in tree.findall(".//blog"):
>         print node.get("id")
>         print node.findtext("text/blogtitle")

Wow, that looks like a more concise way to do it. Thanks, I'll take a look
at that.

FWIW, I had some success using Sax2 (from PyXML) last night:

    from xml.dom.ext.reader import Sax2

    # create a Reader object
    reader = Sax2.Reader()
    # parse the document into a DOM tree
    dom1 = reader.fromStream('200406archive010.xml')

    for node in dom1.getElementsByTagName("blog"):
        blog_id = node.getAttribute("id")  # avoid shadowing the id() builtin
        print int(blog_id)
        for contentNode in node.getElementsByTagName("text"):
            # note: firstChild.data is only the first child node's text,
            # which may be just whitespace (see Fredrik's output above)
            for titleNode in contentNode.getElementsByTagName("blogtitle"):
                print titleNode.firstChild.data
            for bodyNode in contentNode.getElementsByTagName("blogbody"):
                print bodyNode.firstChild.data

-- 
Rick Hurst
http://hypothecate.co.uk
_______________________________________________
XML-SIG maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/xml-sig
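For comparison, here is a minimal sketch of Fredrik's ElementTree approach for current Python 3, where ElementTree ships in the standard library as xml.etree.ElementTree (since Python 2.5). The sample archive and the read_posts helper are hypothetical. The sketch also assumes that each blog element holds one text/blogtitle and one text/blogbody.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample archive: element names (blog, text, blogtitle,
# blogbody) follow the thread, the content is invented for illustration.
SAMPLE = """<archive>
  <blog id="10">
    <text>
      <blogtitle><![CDATA[First <b>title</b>]]></blogtitle>
      <blogbody>First body</blogbody>
    </text>
  </blog>
  <blog id="11">
    <text>
      <blogtitle>Second title</blogtitle>
      <blogbody><![CDATA[Second body]]></blogbody>
    </text>
  </blog>
</archive>"""


def read_posts(root):
    """Return (id, title, body) tuples for every blog element under root.

    ElementTree reports CDATA content as ordinary text, so no special
    handling is needed for CDATA sections."""
    posts = []
    for blog in root.findall(".//blog"):
        posts.append((
            int(blog.get("id")),
            # default="" avoids None when the element is missing
            blog.findtext("text/blogtitle", default=""),
            blog.findtext("text/blogbody", default=""),
        ))
    return posts


root = ET.fromstring(SAMPLE)
for blog_id, title, body in read_posts(root):
    print(blog_id, title, body)
```

To read the actual archive file instead of the sample string, pass `ET.parse('200406archive010.xml').getroot()` to read_posts.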