> > >> In many cases you don't even need that. Write a shell XML file,
> > >>
> > >> <!DOCTYPE wrapper SYSTEM "the-dtd-file.dtd" [
> > >> <!ELEMENT wrapper (the-real-root-element)>
> > >> <!ENTITY the-real-document SYSTEM "bigfile.xml">
> > >> ]>
> > >> <wrapper>&the-real-document;</wrapper>
> > >
> > > Will the libxml2 implementation try to bring the entire
> > > &the-real-document; entity into memory, or will it stream it if I use the
> > > SAX2 or Reader API? My gut tells me both the dtd and the bigfile.xml
> > > will be completely parsed into memory. This is fine for the dtd but not
> > > for the bigfile.xml.
> >
> > A reading of xmlParseReference suggests your gut is wrong. :)
> >
> > http://git.gnome.org/browse/libxml2/tree/parser.c#n6823
>
> Yeah I would think that for external parsed entities we create a
> new input stream and feed it to the parser, hence progressively.
> This may work in constant memory for SAX but unfortunately I'm afraid
> that for the reader we still build a tree for the entity content
> (stored in ent->children), so yes we do it progressively, but no
> unfortunately we accumulate the tree in memory :-\

OK, I'll catch up and learn what xmlParseReference is doing. Good to know
it may work in constant memory with SAX; I'll focus my testing of the
wrapping idea on SAX.

> The real solution would be to allow DTD validation from a preparsed
> DTD at the xmlreader level directly. For my excuse, validating from
> a DTD not referenced from the document is not a scenario actually
> described by XML-1.0, and the way it's implemented will diverge slightly
> from when you reference with a DOCTYPE. Which is why I think the
> cleanest is to use a custom I/O which will automatically add the DOCTYPE
> at the beginning of the document, that's the safest and fastest at this
> point in my opinion.

That sounds very interesting. If I understand you correctly, you think
custom I/O would handle both the case in which a DOCTYPE needs to be
injected at the beginning of the document and the case in which an
existing DOCTYPE in the document needs to be replaced by a DOCTYPE like
<!DOCTYPE real-root SYSTEM "my_dtd_file.dtd">?

What area of the code do I need to start learning in order to understand
your custom I/O idea?

From the Reader API perspective, do you think just a single function like

    /* parse/compile the DTD at the given location `uri` */
    int xmlTextReaderDtdValidate(xmlTextReaderPtr reader, const char *uri);

in combination with behavioral updates to `xmlTextReaderIsValid`,
`xmlFreeTextReader`, `xmlCleanupParser`, `xmlTextReaderRead`, and
`struct _xmlTextReader` is what's needed? I'm not yet libxml2-literate
enough to make specific suggestions, but I am curious about the scope of
the work you think is needed.

FWIW, I dug up this old C#/.NET code in which I'd been experimenting with
similar ideas, but it's not smart enough to replace an existing DOCTYPE.
I think it still works, but I'm not sure whether any of the APIs it uses
have been deprecated.

https://gist.github.com/1075878

Jon

---
blog: http://jonforums.github.com/
twitter: @jonforums

"Anyone who can only think of one way to spell a word obviously lacks
imagination." - Mark Twain
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
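
To check my understanding of the custom I/O idea above, here is a rough
sketch of a read callback that hands out a DOCTYPE first and then the bytes
of the real document. It uses only stdio, and the names doctype_io_ctx,
doctype_io_init, doctype_io_read and doctype_io_close are hypothetical ones
made up for this sketch. The callbacks are written in the shape of
libxml2's xmlInputReadCallback and xmlInputCloseCallback. The sketch assumes
the document has no XML declaration and no DOCTYPE of its own, so it only
covers the "inject" case, not the "replace" case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical context for this sketch: a prolog to emit first, then a file. */
typedef struct {
    const char *prolog;     /* e.g. a DOCTYPE declaration */
    size_t      prolog_len; /* length of prolog in bytes */
    size_t      prolog_pos; /* bytes of prolog already handed out */
    FILE       *body;       /* the real document (assumed: no XML declaration) */
} doctype_io_ctx;

static void doctype_io_init(doctype_io_ctx *ctx, const char *prolog, FILE *body)
{
    ctx->prolog = prolog;
    ctx->prolog_len = strlen(prolog);
    ctx->prolog_pos = 0;
    ctx->body = body;
}

/* Same shape as xmlInputReadCallback:
 * returns the number of bytes stored in buffer, 0 at end of input, -1 on error. */
static int doctype_io_read(void *context, char *buffer, int len)
{
    doctype_io_ctx *ctx = context;
    size_t n;

    if (ctx == NULL || buffer == NULL || len < 0)
        return -1;
    if (len == 0)
        return 0;

    /* Phase 1: drain whatever is left of the injected prolog. */
    if (ctx->prolog_pos < ctx->prolog_len) {
        n = ctx->prolog_len - ctx->prolog_pos;
        if (n > (size_t)len)
            n = (size_t)len;
        memcpy(buffer, ctx->prolog + ctx->prolog_pos, n);
        ctx->prolog_pos += n;
        return (int)n;
    }

    /* Phase 2: pass the real document through unchanged, one chunk at a time. */
    n = fread(buffer, 1, (size_t)len, ctx->body);
    if (n == 0 && ferror(ctx->body))
        return -1;
    return (int)n;
}

/* Same shape as xmlInputCloseCallback: returns 0 on success, -1 on error. */
static int doctype_io_close(void *context)
{
    doctype_io_ctx *ctx = context;
    int rc;

    if (ctx == NULL || ctx->body == NULL)
        return 0;
    rc = fclose(ctx->body);
    ctx->body = NULL;
    return rc == 0 ? 0 : -1;
}
```

If I read the headers correctly, these two callbacks and the context could
be handed to xmlReaderForIO() together with the XML_PARSE_DTDVALID option,
so the reader would see the DOCTYPE as if it were part of the file. I have
not tried that yet, so please treat it as an assumption on my part.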