On Fri, 2005-03-11 at 18:16 +0900, Grant Morganryuuguu wrote:
> I solved the problem and am responding to myself for the benefit of future
> googlers.
> The sax parsers may split nodes of type CHARACTERS into multiple nodes so they
> have to be joined back together. Since pulldom depends on a sax parser it
> also may do this. My method to find and join together the next CHARACTERS
> node is below. It assumes that
> self.event,self.node = iter.next()
> was executed previously.
>
> def getCharacterNode(self,iter):
>     while self.event != 'CHARACTERS':
>         self.event,self.node = iter.next()
>     chars=[]
>     chars.append(self.node.nodeValue)
>     self.event,self.node = iter.next()
>     while self.event == 'CHARACTERS':
>         chars.append(self.node.nodeValue)
>         self.event,self.node = iter.next()
>     return ''.join(chars)
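For readers on current Python, here is a minimal sketch of the same joining technique, assuming only the standard-library xml.dom.pulldom module. The class name TextReader and its get_text method are hypothetical, not part of pulldom or Amara. It uses the next() builtin instead of iter.next(), keeps the last-read pair in self.event / self.node as the quoted method does, and returns None at the end of the stream rather than letting StopIteration escape.

```python
from xml.dom import pulldom


class TextReader:
    """Hypothetical wrapper around a pulldom event stream.

    Like getCharacterNode in the quoted message, it keeps the most recently
    read (event, node) pair in self.event / self.node, so the event that ends
    a run of text is not lost.
    """

    def __init__(self, events):
        self._events = iter(events)
        self._advance()

    def _advance(self):
        # At the end of the stream, store (None, None) instead of raising
        # StopIteration.
        self.event, self.node = next(self._events, (None, None))

    def get_text(self):
        """Return the next run of character data as one string, or None."""
        # Skip ahead to the first CHARACTERS event.
        while self.event is not None and self.event != pulldom.CHARACTERS:
            self._advance()
        if self.event is None:
            return None
        # The parser may deliver one run of text as several CHARACTERS
        # events, so collect the consecutive ones and join them.
        chunks = []
        while self.event == pulldom.CHARACTERS:
            chunks.append(self.node.nodeValue)
            self._advance()
        return "".join(chunks)


if __name__ == "__main__":
    reader = TextReader(pulldom.parseString("<doc><a>fish &amp; chips</a></doc>"))
    print(reader.get_text())  # fish & chips
```

Whether a particular parser actually splits "fish &amp; chips" into several events is up to the parser; the joined string comes out the same either way, which is the point of collecting the run.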
Or see:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881

and the updated version that is part of Amara:

http://www.xml.com/pub/a/2005/01/19/amara.html
http://cvs.4suite.org/viewcvs/Amara/lib/saxtools.py?rev=1.9&view=markup

(class normalize_text_filter, which you should be able to copy to your code
if you don't want to install Amara).

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
Introducing the Amara XML Toolkit - http://www.xml.com/pub/a/2005/01/19/amara.html
Gems from the Mines: 2002 to 2003 - http://www.xml.com/pub/a/2005/03/02/pyxml.html
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
Querying WordNet as XML - http://www.ibm.com/developerworks/xml/library/x-think29.html
Packaging XSLT lookup tables as EXSLT functions - http://www.ibm.com/developerworks/xml/library/x-tiplook2.html

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig