In a moment of inter-antihistamine lucidity tonight, it occurred to me that
XML::Parser normally doesn't go get the DTD that an XML document refers to,
and normally doesn't have to.  But, this train of thought continued,
suppose one had this bit of code in a PXML document:

 <item><c>'dumb' => 1</c></item>
 <p>Enables na&iuml;ve parsing.</p>

To resolve the &iuml;, XML::Parser would have to get and process the PXML
DTD, see that it pulls in the W3C's XHTML entity declarations, and then go
get and process those too.  This is all fine for my local copy of nsgmls,
since I've set up my local catalog file to redirect lookups for both the
PXML DTD and the W3C files to local copies.  But I figure that XML::Parser
would have to hit perl.com (or wherever else I keep the DTD) and w3.org to
get all the files necessary to know that &iuml; is an "ï" character.  I
don't know whether XML::Parser uses a CATALOG file, but even if it did,
that's just one more thing people have to bother with.


So I'm thinking of bypassing this problem by banishing /all/ character
entity definitions from the DTD.  (That still leaves &amp;, &lt;, &gt;,
&apos;, and &quot;, which XML predefines.)

This would mean that if you wanted a "ï" character in PXML, you'd have
three alternatives:

1) Just use a "ï".  Make sure that the XML document's declared encoding
agrees with the one your editor is using, and make a point of not putting
your POD thru 8-bit-impure lines.  As we're not living in 1985, the
latter requirement is presumably not problematic.  (Anyone planning to
transmit their POD to 1985 and back might consider UTF-7, God help us all.)

2) Use a numeric character reference, either hexadecimal (&#xEF;) or
decimal (&#239;).

3) Brave souls can declare the entity themselves in the document's
internal DTD subset: <!ENTITY iuml "&#239;">
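For concreteness, here's a small document using all three alternatives side
by side.  It's only a sketch: the root element name <pod> is a hypothetical
placeholder (the real PXML root element isn't named here), and the DOCTYPE
deliberately carries no reference to the external PXML DTD.  Saved as UTF-8,
all four <p> elements should come out as the same text, "Enables naïve
parsing."

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pod [
  <!-- Alternative 3: declare the entity yourself in the internal subset.
       The character reference is expanded when the entity is declared,
       so the replacement text of iuml is the single character U+00EF. -->
  <!ENTITY iuml "&#239;">
]>
<pod>
  <!-- Alternative 1: the literal character, stored in the declared encoding -->
  <p>Enables naïve parsing.</p>
  <!-- Alternative 2: numeric character references, hex and decimal -->
  <p>Enables na&#xEF;ve parsing.</p>
  <p>Enables na&#239;ve parsing.</p>
  <!-- Alternative 3, in use -->
  <p>Enables na&iuml;ve parsing.</p>
</pod>
```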

If anyone would be put out by any of this, or has some other suggestion,
then SPEAK NOW, or forever hold your Reese's Pieces!


Anyhoo, I figure that if I take all the character entity declarations out
of the PXML DTD, XML parsers will have no need to go snare the DTD, and one
could even sensibly label PXML documents as standalone!
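A prolog for such a document might look something like the following.
Again a sketch: the root element <pod> and the system identifier "pxml.dtd"
are hypothetical placeholders, since the DTD's real location isn't given
here.  The point is that the body uses only a numeric character reference
and a predefined entity, so nothing in it depends on an entity declared in
the external DTD.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Hypothetical root element name and system identifier. -->
<!DOCTYPE pod SYSTEM "pxml.dtd">
<!-- &#239; is a character reference and &lt; / &gt; are predefined,
     so a parser needs no external entity declarations to resolve them. -->
<pod><p>Enables na&#239;ve parsing of &lt;item&gt; elements.</p></pod>
```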


(BTW, did I mention that Pod::PXML's xml2pod is a validating parser?  The
DTD is hardwired in -- well, the element content models, at least.  I
suppose it'd be trivial to add attribute validity checking.)



(BTW, Reese's Pieces is a trademark of Hershey Foods Corporation.  But I
prefer Droste dark chocolate anyway.)


--
Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/
