Fredrik Lundh wrote:
>> Sorry if this should go to a list, I couldn't find one...
>> (please send me that way if there is one...)
>
> python-list/comp.lang.python or xml-sig are good choices.

OK, let's go with xml-sig :)

>> I've bumped into an annoying problem, which I actually think is a
>> problem with expat:
>>
>> >>> from xml.parsers import expat
>> >>> parser = expat.ParserCreate()
>> >>> def handle(data): print repr(data)
>> ...
>> >>> parser.CharacterDataHandler = handle
>> >>> parser.Parse('<xml>&lt;node/&gt;</xml>',0)
>> u'<'
>> u'node/'
>> u'>'
>> 1
>>
>> Now, why is expat unquoting those two entities?
>
> in an XML file, the characters < and & *must* be escaped (either as
> entity references or character references) when appearing in normal
> text:

Yes indeed.

> the following entities are predefined: &amp; (&) &lt; (<) &gt; (>)
> &quot; (") &apos; (').

Okay, so in the above, if I really mean &lt;, the xml should be:

'<xml>&amp;lt;/&amp;gt;</xml>'

Seems a little clunky, but okay...

I guess this was causing me problems as I'm working on a bug in Twiddler
(http://www.simplistix.co.uk/software/python/twiddler) where quoted html
was ending up unquoted after processing:

>>> from twiddler import Twiddler
>>> t = Twiddler('<span>&lt;b&gt;</span>')
>>> t.render()
u'<span><b></span>'

Now, I see how you fixed this in ElementTree by re-escaping all the
predefined entities (out of interest, why is the function called
_escape_cdata rather than _escape_data?) but I can't do that because I
want users to be able to insert chunks of html and choose whether or not
they are escaped:

>>> t = Twiddler('<span id="something"/>')

escaping:

>>> t['something'].replace('<b>')
>>> t.render()
u'<span id="something">&lt;b&gt;</span>'

no escaping:

>>> t['something'].replace('<b>',filters=())
>>> t.render()
u'<span id="something"><b></span>'

I guess in my use of ElementTree, I need to make sure character data is
re-escaped at the tree building stage?

> other names give an error unless they've been
> explicitly defined.

So I see:

>>> from xml.parsers import expat
>>> parser = expat.ParserCreate()
>>> parser.Parse('<xml>&foo;</xml>',0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
xml.parsers.expat.ExpatError: undefined entity: line 1, column 5

But why does calling UseForeignDTD suddenly make everything ok?

>>> parser = expat.ParserCreate()
>>> parser.UseForeignDTD()
>>> parser.Parse('<xml>&foo;</xml>',0)
1

What extra hooks get called as a result of calling UseForeignDTD?

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
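
For anyone reading this in the archive, here is a minimal sketch of the unescape/re-escape round trip discussed above. It assumes Python 3 rather than the Python 2 sessions in the message. The helper name collect_text is made up for illustration, and xml.sax.saxutils.escape from the standard library stands in for ElementTree's private _escape_cdata, so treat it as an approximation of the idea and not as what ElementTree or Twiddler actually do.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape


def collect_text(document):
    """Return all character data in `document` as one string.

    Hypothetical helper: expat may deliver character data in several
    pieces, so the pieces are gathered and joined at the end.
    """
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)  # True marks this as the final chunk
    return "".join(chunks)


# expat hands back the text with the predefined entities already resolved.
text = collect_text("<xml>&lt;node/&gt;</xml>")
print(repr(text))

# Re-escaping before serialising turns &, < and > back into entities.
print(escape(text))
```

Whether to call escape() when writing the text back out is then a per-call decision, which is roughly the choice the filters argument is meant to give Twiddler users.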