Rick Hurst wrote:

> i'm trying the following (i'm a python newbie BTW):-
>
> from xml.dom.minidom import parse, parseString
> dom1 = parse('foo.xml')
>
> for node in dom1.getElementsByTagName("blog"):
>     id = node.getAttribute("id")
>     print id
>     for contentNode in node.getElementsByTagName("text"):
>         for titleNode in node.getElementsByTagName("blogtitle"):
>             print titleNode.nodeName #returns "blogtitle"
>             print titleNode.nodeType #returns 1
>             #print titleNode.data #AttributeError: Element instance has no attribute 'data'
>             print titleNode.nodeValue #returns "None"
>
> is there a way of doing this with minidom or do I need to be using a
> different parser? Any advice appreciated!

if you add this to the inner loop,

    print titleNode.childNodes
    print titleNode.firstChild.wholeText

you get this output (under 2.3.3):

    [<DOM Text node "\n">, <DOM CDATASection node "Plone: rem...">]
    Plone: remove member self registration

> > http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549725&group_id=5470

this bug report complains that the DOM represents the CDATA section as
four text nodes, which is also perfectly valid (see Martin's explanation).

code that depends on being able to identify a CDATA section in the source
file is broken; character data, character references, entities, and CDATA
sections should all be treated as text.

btw, here's the corresponding ElementTree version:

    from elementtree import ElementTree

    tree = ElementTree.parse("foo.xml")

    for node in tree.findall(".//blog"):
        print node.get("id")
        for content_node in node.findall("text"):
            print content_node.findtext("blogtitle")

or, shorter:

    for node in tree.findall(".//blog"):
        print node.get("id")
        print node.findtext("text/blogtitle")

</F>
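
The point about treating character data and CDATA sections alike can be sketched in current Python 3, using the standard-library xml.dom.minidom and xml.etree.ElementTree (the stdlib descendant of the elementtree package used above). This is only an illustration: the SAMPLE document is a made-up stand-in for foo.xml, which is not shown in the thread, and the helper names node_text, titles_minidom and titles_etree are hypothetical.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical stand-in for foo.xml; the real file is not shown in the thread.
SAMPLE = """<blogs>
  <blog id="1">
    <text>
      <blogtitle>
<![CDATA[Plone: remove member self registration]]></blogtitle>
    </text>
  </blog>
</blogs>"""


def node_text(element):
    """Join the data of all direct text and CDATA children of a DOM element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def titles_minidom(xml_source):
    """Return (blog id, title) pairs using minidom."""
    dom = minidom.parseString(xml_source)
    result = []
    for blog in dom.getElementsByTagName("blog"):
        for title in blog.getElementsByTagName("blogtitle"):
            result.append((blog.getAttribute("id"), node_text(title).strip()))
    return result


def titles_etree(xml_source):
    """Return (blog id, title) pairs using ElementTree."""
    root = ET.fromstring(xml_source)
    return [
        (blog.get("id"), (blog.findtext("text/blogtitle") or "").strip())
        for blog in root.findall(".//blog")
    ]


if __name__ == "__main__":
    print(titles_minidom(SAMPLE))
    print(titles_etree(SAMPLE))
```

Neither function cares whether the title arrives as plain character data, a CDATA section, or a mix of both, so the result stays the same however the parser chooses to split the text into nodes.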