PHPDoc uses the DocBook setup suggested by the DocBook manual [1]: split the book up into fragments, and then mush them back together using external entities. These external entities are declared within the internal subset of a document, like so:

    <!DOCTYPE book [
      <!ENTITY subset SYSTEM "subset.xml">
    ]>

PHPDoc takes this system up the wazoo; there are entities for literally everything, and we only use XIncludes for very specific post-processing that cannot be done with entities.

The practical consequence is that virtually *none* of the XML files found in phpdoc/ are well-formed. The mere use of &reftitle.description; invalidates the document, since /that/ particular document doesn't define this entity (it doesn't have a DTD at all, because parsed entities choke on them [2]).

A further consequence is that tool-based validators, even when given the DocBook DTD, are not able to perform validation. I came across this when I booted up an XML file in Komodo Edit 4, and the very first entity was red-underlined. Tools like xmllint can't be applied to single files; the entire build-system must be invoked to test a single file change.

I think PHPDoc can do better.

THE SOLUTION
------------

The internal subset we use to define our entities is merely an extension of the DTD. Thus, they can be *moved* to an external DTD, something that looks like:

    <!-- We use xhtml entities all over the place -->
    <!ENTITY % xhtml-lat1 SYSTEM "@srcdir@/entities/ISO/xhtml1-lat1.ent">
    [snip all the other entity inclusions]
    <!ENTITY % docbook-dtd PUBLIC "-//OASIS//DTD DocBook XML V5.0//EN"
        "@srcdir@/docbook/docbook-xml/docbook.dtd">
    %docbook-dtd;

The Doctype in manual.xml(.in) becomes the short and sweet:

    <!DOCTYPE set SYSTEM "@srcdir@/phpdoc.dtd"> [3]

And, although we can't put a DOCTYPE in every XML document in phpdoc/, we can specify it directly with xmllint --dtdvalid, and presto, instant validation.

There's nothing earth-shattering about this change; we've simply factored out the necessary DTD definitions so that they can be tacked on to an arbitrary file. [4] Our documents still aren't well-formed, but with a little coaxing they can be made to be so.
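
To make "tacked on to an arbitrary file" concrete, here is a minimal Python sketch of the coaxing step. It is only an illustration, not part of PHPDoc's build system: the function name add_doctype and the default "phpdoc.dtd" path are hypothetical, and it assumes the fragment carries no DOCTYPE of its own (which is the case described above). It finds the root element with a regular expression rather than an XML parser, since a parser would reject the undefined entities before we ever got that far.

```python
import re

# Optional XML declaration at the very start of the file.
XML_DECL = re.compile(r'\A<\?xml\s[^>]*\?>')
# Comments and processing instructions, which may precede the root element.
COMMENT_OR_PI = re.compile(r'<!--.*?-->|<\?.*?\?>', re.S)
# First start tag: "<" followed by an XML-ish element name.
START_TAG = re.compile(r'<([A-Za-z_][\w.:-]*)')


def add_doctype(xml_text, dtd_path="phpdoc.dtd"):
    """Return xml_text with a DOCTYPE for its root element inserted.

    dtd_path is a hypothetical location for the factored-out DTD.
    Assumes xml_text does not already contain a DOCTYPE.
    """
    decl = XML_DECL.match(xml_text)
    head = decl.group(0) if decl else ""
    body = xml_text[len(head):]

    # Ignore comments/PIs so a commented-out tag is not mistaken for the root.
    root = START_TAG.search(COMMENT_OR_PI.sub("", body))
    if root is None:
        raise ValueError("no root element found")

    doctype = '<!DOCTYPE %s SYSTEM "%s">' % (root.group(1), dtd_path)
    # The DOCTYPE must come after the XML declaration, if there is one.
    prefix = head + "\n" if head else ""
    return prefix + doctype + "\n" + body.lstrip("\n")


if __name__ == "__main__":
    fragment = '<?xml version="1.0"?>\n<refentry>&reftitle.description;</refentry>'
    print(add_doctype(fragment))
```

The output could be written to a temporary file and handed to a validating parser (for instance xmllint --valid --noout), which would then resolve the entities through the DTD named in the inserted DOCTYPE.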

If we wanted to get fancy, it would be trivial to create a "wrapper" script that takes an XML source file, inserts the proper doctype, and outputs those contents for validation (remember: <!DOCTYPE $element defines the root-level element, so we can use anything we want and the DTD will still validate it fine).

Oh, and there's one minor implementation detail: we need LIBXML_DTDLOAD when we load the document (it gets loaded anyway when we call validate(), so there shouldn't be any harm to performance).

Comments?

ENDNOTES
--------

[1] http://www.docbook.org/tdg5/en/html/ch02.html

[2] This limitation could be worked around using XInclude, but as discussed in phd/RFC/Buildsystem-proposal.rtf, it's too large and unwieldy to be useful

[3] Public identifier pending; also, the path could point anywhere, probably in docbook

[4] In theory, it should be possible to make a catalog that maps to our new DTD (although, in such a case, it would be a good idea to redefine xmlns to something else). I haven't tested this for xmllint, but it turns out that Komodo's XML syntax checking doesn't use catalog files, so, without setting a DOCTYPE, Komodo refuses to recognize entities and syntax checking remains out of my reach. I have filed a bug/feature request accordingly: http://bugs.activestate.com/show_bug.cgi?id=75287

-- 
Edward Z. Yang                        GnuPG: 0x869C48DA
HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
[[ 3FA8 E9A9 7385 B691 A6FC  B3CB A933 BE7D 869C 48DA ]]