This one must have come up several times before, but neither Google nor the Cookbook has given me an answer. I'm doing this:

    import sys
    import xml.dom.minidom

    data = sys.stdin.read()
    doc = xml.dom.minidom.parseString(data)
    root = doc.documentElement
    # ...add and modify some nodes...
    sys.stdout.write(root.toxml('utf-8'))

A typical input looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE lec SYSTEM "swc.dtd">
    <lec title="Introduction">
      <topic title="Motivation" summary="motivation for course">
        <slide>
          <b1>blah
            <b2>blah &amp; blah</b2>
            <b2>blah&emdash;blah</b2>
          </b1>
        </slide>
      </topic>
    </lec>

and my DTD, in its entirety, is:

    <!ENTITY emdash "&#8212;"> <!-- em dash -->
    <!ENTITY lceil "&#8968;"> <!-- left ceiling -->
    <!ENTITY ldots "&#8230;"> <!-- horizontal ellipsis -->
    <!ENTITY lfloor "&#8970;"> <!-- left floor -->
    <!ENTITY lquot "&#8220;"> <!-- left double quotes -->
    <!ENTITY plusmn "&#177;"> <!-- plus or minus -->
    <!ENTITY nbsp "&#160;"> <!-- non-breaking space -->
    <!ENTITY rceil "&#8969;"> <!-- right ceiling -->
    <!ENTITY rfloor "&#8971;"> <!-- right floor -->
    <!ENTITY rquot "&#8221;"> <!-- right double quotes -->
    <!ENTITY space "&#32;"> <!-- normal space -->
    <!ENTITY squot "&#34;"> <!-- straight double quotes -->
    <!ENTITY times "&#215;"> <!-- multiplication sign -->
    <!ENTITY vdots "&#8942;"> <!-- vertical ellipsis -->

The problem is that all of the character entities are missing from my output: &amp; and &emdash; disappear. Hunting around the web, it appears that I'm supposed to mess with ExternalEntityRefHandler, but I can't find any examples of how the pieces fit together. If anyone has one, I'd be grateful for a pointer...

Thanks,
Greg (gvwilson _a_t_ cs _dot_ utoronto _dot_ ca)
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig