This one must have come up several times before, but neither Google nor 
the Cookbook has given me an answer.  I'm doing this:

import sys
import xml.dom.minidom

data = sys.stdin.read()
doc = xml.dom.minidom.parseString(data)
root = doc.documentElement
# ...add and modify some nodes...
sys.stdout.write(root.toxml('utf-8'))

A typical input looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE lec SYSTEM "swc.dtd">
<lec title="Introduction">
   <topic title="Motivation" summary="motivation for course">
     <slide>
       <b1>blah
         <b2>blah &amp; blah</b2>
         <b2>blah&emdash;blah</b2>
       </b1>
     </slide>
   </topic>
</lec>

and my DTD, in its entirety, is:

<!ENTITY emdash "&#x2014;">     <!-- em dash -->
<!ENTITY lceil  "&#x2308;">     <!-- left ceiling -->
<!ENTITY ldots  "&#x2026;">     <!-- horizontal ellipsis -->
<!ENTITY lfloor "&#x230A;">     <!-- left floor -->
<!ENTITY lquot  "&#x201C;">     <!-- left double quotes -->
<!ENTITY plusmn "&#x00B1;">     <!-- plus or minus -->
<!ENTITY nbsp   "&#x00A0;">     <!-- non-breaking space -->
<!ENTITY rceil  "&#x2309;">     <!-- right ceiling -->
<!ENTITY rfloor "&#x230B;">     <!-- right floor -->
<!ENTITY rquot  "&#x201D;">     <!-- right double quotes -->
<!ENTITY space  "&#x0020;">     <!-- normal space -->
<!ENTITY squot  "&#x0022;">     <!-- straight double quotes -->
<!ENTITY times  "&#x00D7;">     <!-- multiplication sign -->
<!ENTITY vdots  "&#x22EE;">     <!-- vertical ellipsis -->

The problem is that all of the entity references are missing from my 
output: both &amp; and &emdash; disappear.  Hunting around the web, it 
appears that I'm supposed to mess with ExternalEntityRefHandler, but I 
can't find any examples of how the pieces fit together.  If anyone has 
one, I'd be grateful for a pointer...
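
One way the pieces might fit together, sketched with pyexpat directly rather than through minidom: turn on parameter-entity parsing so that expat asks for the external DTD subset, then feed the DTD text to a sub-parser from inside ExternalEntityRefHandler. The names expand_entities, DOC_TEXT and DTD_TEXT are made up for illustration, the DTD is assumed to be in memory already instead of being read from the file named by the system id, and only character data is collected. It is a rough sketch under those assumptions and does not show how to get the same effect out of minidom.

```python
import xml.parsers.expat as expat

# Hypothetical stand-ins for the real files: one entity from the DTD and a
# cut-down document that refers to it.
DTD_TEXT = '<!ENTITY emdash "&#x2014;">'
DOC_TEXT = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE lec SYSTEM "swc.dtd">\n'
    '<lec><b2>blah &amp; blah</b2><b2>blah&emdash;blah</b2></lec>'
)


def expand_entities(doc_text, dtd_text):
    """Parse doc_text and return its character data with entities expanded."""
    chunks = []
    parser = expat.ParserCreate()

    # Without this, expat never asks for the external DTD subset at all.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_entity_ref(context, base, system_id, public_id):
        # Assumption: every external reference resolves to dtd_text.
        # Real code would open the resource named by system_id instead.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(dtd_text, True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc_text, True)
    return ''.join(chunks)


if __name__ == '__main__':
    print(repr(expand_entities(DOC_TEXT, DTD_TEXT)))
```

Here the entity declared in the DTD is expanded by expat itself once the sub-parser has seen the declaration, so the character data handler receives the replacement text rather than a skipped reference.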

Thanks,
Greg (gvwilson _a_t_ cs _dot_ utoronto _dot_ ca)
