Jon <jon.for...@gmail.com> writes:

>> I am new to the libxml2 api and am looking to use it to create a simple tool 
>> that can validate large xml files via external DTDs, and eventually XSDs. 
>> I've successfully built libxml2 on win7 using a mingw toolchain and plan to 
>> build the tool as a statically linked exe for windows.
>> 
>> I've found http://mail.gnome.org/archives/xml/2004-July/msg00055.html and 
>> http://mail.gnome.org/archives/xml/2009-November/msg00039.html and would 
>> appreciate pointers in the right direction, either sections in xmllint.c to 
>> review or ideas on how to use the Reader api to do this.

XMLStarlet does this too, maybe it will be useful for you:
http://xmlstar.git.sourceforge.net/git/gitweb.cgi?p=xmlstar/xmlstar;a=blob;f=src/xml_validate.c;hb=HEAD

>> 
>> I'm more concerned about memory usage and speed and have no preference 
>> between using the SAX2 or Reader apis.
>
>
> After skimming xmllint.c I want to confirm that my understanding of the 
> following is correct.
>
> 1) The only way to use xmllint to validate against an external DTD file is
>
>    xmllint --dtdvalid luddite.dtd file1.xml file2.xml ...
>
> and the following will not work as neither `testSAX()` nor `streamFile()` 
> validate against an external DTD file:
>
>    xmllint --sax --dtdvalid luddite.dtd file1.xml ...
>    xmllint --stream --dtdvalid luddite.dtd file1.xml ...

Yes, as a consequence of 4).

>
> 2) Does the following mean that when using libxml2's SAX functionality a 
> document representation of the entire input XML is created in memory?
>
>    http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1711

No, it depends on the handler in use. The code you reference there is
checking for the unexpected creation of a DOM tree: unexpected because
neither the emptySAXHandler nor the debugSAXHandler creates one.

>
> 3) As of v2.7.8 and using the Reader API, there is no way to validate using 
> an external DTD similar to
>
>    http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1881
>    http://git.gnome.org/browse/libxml2/tree/xmllint.c#n1896
>

Yes, see https://bugzilla.gnome.org/show_bug.cgi?id=169375

>
> 4) As of v2.7.8 and using the Reader API, there is no way to a posteriori 
> validate using an external DTD similar to the following. A posteriori DTD 
> validation is only available after parsing a full DOM into memory.
>
>    http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2759

Yes. In addition to the memory usage, that approach has the problem that
the DOM node structure uses 2 bytes to hold line numbers, so error messages
don't report the right line number past line 65535.

https://bugzilla.gnome.org/show_bug.cgi?id=143739
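
To illustrate why a 2-byte field cannot report lines past 65535, here is a
minimal sketch. The struct and function names are hypothetical, not libxml2's,
and the choice to clamp (rather than wrap around) is an assumption made for
the illustration; either way the true line number is lost.

```c
#include <stdio.h>

/* Hypothetical stand-in for a DOM node that stores its line in 2 bytes. */
struct node16 {
    unsigned short line;
};

/* Record the parser's line number in the node.
 * Assumption for illustration: values that do not fit are clamped to 65535. */
void set_line(struct node16 *n, long parser_line)
{
    n->line = (parser_line < 65535) ? (unsigned short)parser_line : 65535;
}

/* Print what an error message built from the node would show. */
void report(const struct node16 *n, long real_line)
{
    printf("real line %ld, reported line %u\n", real_line, (unsigned)n->line);
}
```

Calling set_line() with 1234 keeps the value, while calling it with 70000
leaves 65535 in the node, so any message built from the node points at the
wrong line.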

>
>
> If the above are correct, what do you suggest to people who want to use 
> libxml2 to validate large XMLs with external DTD files?  Re-write the input 
> XML file?

Pretty much, yeah. It's not so bad: you only need to add a tiny DOCTYPE
referring to the DTD.
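
A rough sketch of such a rewrite, in plain C with no libxml2 calls, might look
like the following. The function name add_doctype() and its parameters are
made up for this example. It assumes the input has no DOCTYPE of its own, no
byte order mark, and that an XML declaration, if present, starts at the very
first byte. It streams the file, so memory use stays constant regardless of
input size.

```c
#include <stdio.h>
#include <string.h>

/* Copy XML from `in` to `out`, inserting
 *     <!DOCTYPE root SYSTEM "dtd_path">
 * after the XML declaration (or at the very start if there is none).
 * Hypothetical helper; assumes no existing DOCTYPE and no BOM.
 * Returns 0 on success, -1 on an I/O error. */
int add_doctype(FILE *in, FILE *out, const char *root, const char *dtd_path)
{
    char buf[8192];
    size_t n;
    int c, prev = 0;

    /* Peek at the first 6 bytes: an XML declaration is "<?xml" + whitespace
     * (this avoids mistaking e.g. "<?xml-stylesheet" for a declaration). */
    n = fread(buf, 1, 6, in);
    if (n == 6 && memcmp(buf, "<?xml", 5) == 0 &&
        (buf[5] == ' ' || buf[5] == '\t' || buf[5] == '\r' || buf[5] == '\n')) {
        fwrite(buf, 1, n, out);
        /* Copy the rest of the declaration, up to and including "?>". */
        while ((c = fgetc(in)) != EOF) {
            fputc(c, out);
            if (prev == '?' && c == '>')
                break;
            prev = c;
        }
        fputc('\n', out);
        n = 0; /* the peeked bytes have already been written */
    }

    fprintf(out, "<!DOCTYPE %s SYSTEM \"%s\">\n", root, dtd_path);

    /* Write any peeked bytes that were not part of a declaration. */
    fwrite(buf, 1, n, out);

    /* Stream the remainder of the document unchanged. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);

    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

The root name passed in has to match the document's root element. The
rewritten file can then be checked against the DTD named in its DOCTYPE, for
example with xmllint's --valid option instead of --dtdvalid.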

Noam
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
