Jon <jon.for...@gmail.com> writes:

>> I am new to the libxml2 api and am looking to use it to create a simple tool
>> that can validate large xml files via external DTDs, and eventually XSDs.
>> I've successfully built libxml2 on win7 using a mingw toolchain and plan to
>> build the tool as a statically linked exe for windows.
>>
>> I've found http://mail.gnome.org/archives/xml/2004-July/msg00055.html and
>> http://mail.gnome.org/archives/xml/2009-November/msg00039.html and would
>> appreciate pointers in the right direction, either sections in xmllint.c to
>> review or ideas on how to use the Reader api to do this.
XMLStarlet does this too; maybe it will be useful for you:
http://xmlstar.git.sourceforge.net/git/gitweb.cgi?p=xmlstar/xmlstar;a=blob;f=src/xml_validate.c;hb=HEAD

>>
>> I'm more concerned about memory usage and speed and have no preference
>> between using the SAX2 or Reader apis.
>
>
> After skimming xmllint.c I want to confirm that my understanding of the
> following is correct.
>
> 1) The only way to use xmllint to validate against an external DTD file is
>
> xmllint --dtdvalid luddite.dtd file1.xml file2.xml ...
>
> and the following will not work as neither `testSAX()` nor `streamFile()`
> validate against an external DTD file:
>
> xmllint --sax --dtdvalid luddite.dtd file1.xml ...
> xmllint --stream --dtdvalid luddite.dtd file1.xml ...

Yes, as a consequence of 4).

>
> 2) Does the following mean that when using libxml2's SAX functionality a
> document representation of the entire input XML is created in memory?
>
> http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1711

No, it depends on the handler in use. The code you reference there is
checking for unexpected creation of a DOM tree: unexpected because neither
the emptySAXHandler nor the debugSAXHandler creates a DOM tree.

>
> 3) As of v2.7.8 and using the Reader API, there is no way to validate using
> an external DTD similar to
>
> http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1881
> http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1896
>

Yes, see https://bugzilla.gnome.org/show_bug.cgi?id=169375

>
> 4) As of v2.7.8 and using the Reader API, there is no way to a posteriori
> validate using an external DTD similar to the following. A posteriori DTD
> validation is only available after parsing a full DOM into memory.
>
> http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2759

Yes, which, in addition to the memory usage, also has the problem that the
DOM structure uses 2 bytes to hold line numbers, so error messages don't
have the right line number after line 65535.
https://bugzilla.gnome.org/show_bug.cgi?id=143739

>
>
> If the above are correct, what do you suggest to people who want to use
> libxml2 to validate large XMLs with external DTD files? Re-write the input
> XML file?

Pretty much, yeah. It's not so bad: just a tiny DOCTYPE referring to the DTD.

Noam

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml