On Sun, Jul 10, 2011 at 06:26:58PM -0400, Noam Postavsky wrote:
> Jon <jon.for...@gmail.com> writes:
>
> >> In many cases you don't even need that. Write a shell XML file,
> >>
> >> <!DOCTYPE wrapper SYSTEM "the-dtd-file.dtd" [
> >> <!ELEMENT wrapper the-real-root-element>
> >> <!ENTITY the-real-document SYSTEM "bigfile.xml">
> >> ]>
> >> <wrapper>&the-real-document;</wrapper>
> >
> > Will the libxml2 implementation try to bring the entire &the-real-document;
> > entity into memory, or will it stream it if I use the SAX2 or Reader API?
> > My gut tells me both the dtd and the bigfile.xml will be completely parsed
> > into memory. This is fine for the dtd but not for the bigfile.xml.
>
> A reading of xmlParseReference suggests your gut is wrong. :)
>
> http://git.gnome.org/browse/libxml2/tree/parser.c#n6823

  Yeah, I would think that for external parsed entities we create a new
input stream and feed it to the parser, hence progressively. This may work
in constant memory for SAX, but unfortunately I'm afraid that for the reader
we still build a tree for the entity content (stored in ent->children). So
yes, we do it progressively, but no, unfortunately we still accumulate the
tree in memory :-\

  The real solution would be to allow DTD validation from a preparsed DTD
directly at the xmlreader level. In my defense, validating against a DTD
that is not referenced from the document is not a scenario actually
described by XML 1.0, and the way it would be implemented would diverge
slightly from the behavior you get when the DTD is referenced with a DOCTYPE.

  Which is why I think the cleanest approach is to use a custom I/O layer
that automatically adds the DOCTYPE at the beginning of the document. In my
opinion that's the safest and fastest option at this point.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine  http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
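A minimal sketch of the custom I/O idea from the reply above: a read callback that first serves a DOCTYPE prefix and then streams the real file, so the reader sees an ordinary document with a DOCTYPE and never needs the entity-wrapper trick. The names `prefix_reader`, `prefix_read` and `prefix_close` are hypothetical; only the callback shapes are meant to match libxml2's `xmlInputReadCallback` (`int (*)(void *context, char *buffer, int len)`) and `xmlInputCloseCallback` (`int (*)(void *context)`). It assumes bigfile.xml does not start with an `<?xml ...?>` declaration, since a DOCTYPE must come after that declaration; a file that has one would need it skipped or re-emitted ahead of the prefix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical context: serves a fixed prefix (e.g. a DOCTYPE line)
   first, then the bytes of the underlying file. */
typedef struct {
    const char *prefix;  /* text injected before the document body */
    size_t prefix_len;   /* length of prefix in bytes */
    size_t prefix_pos;   /* how much of the prefix has been served */
    FILE *fp;            /* the real document, e.g. bigfile.xml */
} prefix_reader;

/* Shaped like xmlInputReadCallback: fills up to len bytes and returns
   the count written, 0 at end of input, or -1 on a read error. */
int prefix_read(void *context, char *buffer, int len) {
    prefix_reader *r = context;
    if (len <= 0)
        return 0;
    size_t want = (size_t)len;
    size_t n = 0;

    /* Serve whatever part of the prefix has not been handed out yet. */
    if (r->prefix_pos < r->prefix_len) {
        size_t left = r->prefix_len - r->prefix_pos;
        n = left < want ? left : want;
        memcpy(buffer, r->prefix + r->prefix_pos, n);
        r->prefix_pos += n;
        if (n == want)
            return (int)n;
    }

    /* Fill the rest of the buffer from the file, chunk by chunk. */
    size_t got = fread(buffer + n, 1, want - n, r->fp);
    if (got == 0 && n == 0 && ferror(r->fp))
        return -1;
    return (int)(n + got);
}

/* Shaped like xmlInputCloseCallback: returns 0 on success, -1 on error. */
int prefix_close(void *context) {
    prefix_reader *r = context;
    return fclose(r->fp) == 0 ? 0 : -1;
}
```

With a context whose prefix is something like `<!DOCTYPE the-real-root-element SYSTEM "the-dtd-file.dtd">\n`, the callbacks could then be handed to the reader roughly as `xmlReaderForIO(prefix_read, prefix_close, &ctx, "bigfile.xml", NULL, XML_PARSE_DTDVALID)` and driven with `xmlTextReaderRead()`. Because the document body arrives through the callbacks and is not an external entity, the reader should not have to build an entity subtree in memory, though that is worth confirming against the libxml2 version in use.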