> > >> In many cases you don't even need that. Write a shell XML file,
> > >> 
> > >> <!DOCTYPE wrapper SYSTEM "the-dtd-file.dtd" [
> > >>   <!ELEMENT wrapper (the-real-root-element)>
> > >>   <!ENTITY the-real-document SYSTEM "bigfile.xml">
> > >> ]>
> > >> <wrapper>&the-real-document;</wrapper>
> > >
> > > Will the libxml2 implementation try to bring the entire 
> > > &the-real-document; entity into memory, or will it stream it if I use the 
> > > SAX2 or Reader API?  My gut tells me both the dtd and the bigfile.xml 
> > > will be completely parsed into memory. This is fine for the dtd but not 
> > > for the bigfile.xml.
> > 
> > A reading of xmlParseReference suggests your gut is wrong. :)
> > 
> > http://git.gnome.org/browse/libxml2/tree/parser.c#n6823
> 
>   Yeah, I would think that for external parsed entities we create a
> new input stream and feed it to the parser, hence progressively.
> This may work in constant memory for SAX, but unfortunately I'm afraid
> that for the reader we still build a tree for the entity content
> (stored in ent->children), so yes, we do it progressively, but
> unfortunately we accumulate the tree in memory :-\

OK, I'll catch up and learn what xmlParseReference is doing. Good to know it's 
constant memory with SAX, so I'll focus my testing of the wrapping idea on SAX. 
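For that SAX testing, a small helper that writes the shell file from the quoted suggestion might look like the sketch below. It uses only the C standard library; `write_wrapper` and its parameters are hypothetical names, not libxml2 API. The `<!ELEMENT>` content model is parenthesized, as XML 1.0 requires, and `bigfile.xml` must not carry its own DOCTYPE, since an external parsed entity may begin with at most a text declaration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a shell document that validates against
 * `dtd_file` and pulls `big_file` in as an external parsed entity.
 * Returns 0 on success, -1 on bad arguments or I/O error. */
static int write_wrapper(const char *out_path, const char *dtd_file,
                         const char *root_elem, const char *big_file)
{
    FILE *out;
    int rc;

    /* A '"' inside a path would terminate the SYSTEM literal early. */
    if (strchr(dtd_file, '"') != NULL || strchr(big_file, '"') != NULL)
        return -1;
    out = fopen(out_path, "w");
    if (out == NULL)
        return -1;
    /* XML 1.0 requires parentheses around an element content model. */
    rc = fprintf(out,
                 "<!DOCTYPE wrapper SYSTEM \"%s\" [\n"
                 "  <!ELEMENT wrapper (%s)>\n"
                 "  <!ENTITY the-real-document SYSTEM \"%s\">\n"
                 "]>\n"
                 "<wrapper>&the-real-document;</wrapper>\n",
                 dtd_file, root_elem, big_file) < 0 ? -1 : 0;
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

The resulting file would then go through a SAX2 parse with entity substitution and DTD validation turned on (for example via the `XML_PARSE_NOENT | XML_PARSE_DTDVALID` parser options). Whether memory stays flat on a really large `bigfile.xml` is exactly what that test should confirm.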


>   The real solution would be to allow DTD validation from a preparsed
> DTD at the xmlreader level directly. In my defense, validating against
> a DTD not referenced from the document is not a scenario actually
> described by XML 1.0, and the way it's implemented will diverge slightly
> from when you reference it with a DOCTYPE. Which is why I think the
> cleanest approach is to use a custom I/O which will automatically add the DOCTYPE
> at the beginning of the document; that's the safest and fastest option at this
> point in my opinion.

That sounds very interesting.

If I understand you correctly, you think custom I/O would handle the case in 
which a DOCTYPE needs to be injected at the beginning of the document as well 
as the case in which an existing DOCTYPE in the document needs to be replaced 
by a DOCTYPE like <!DOCTYPE real-root SYSTEM "my_dtd_file.dtd">?

What area of the code do I need to start learning in order to understand your 
custom I/O idea?
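To make the custom I/O idea concrete, here is a minimal sketch of a read callback that splices a DOCTYPE into the byte stream. It assumes the callback shapes of libxml2's `xmlInputReadCallback` (`int (*)(void *context, char *buffer, int len)`) and `xmlInputCloseCallback` (`int (*)(void *context)`), so the pair could be handed to `xmlReaderForIO()`; the `inject_*` names and the context struct are hypothetical. It keeps any leading XML declaration first, as XML 1.0 requires, and assumes an ASCII-compatible encoding with no byte-order mark and no existing DOCTYPE. Replacing an existing DOCTYPE, the second case asked about above, would need a small scanner over the prolog that this sketch does not attempt.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical context for a DOCTYPE-injecting input stream. */
typedef struct {
    FILE  *fp;        /* the real document, read after `pre` is drained */
    char   pre[512];  /* XML declaration + injected DOCTYPE + peeked bytes */
    size_t pre_len;   /* number of valid bytes in `pre` */
    size_t pre_off;   /* bytes of `pre` already handed to the parser */
} inject_ctx;

/* Open `path` and queue `doctype` right after any leading "<?xml ...?>".
 * Assumes an ASCII-compatible encoding, no BOM, and no existing DOCTYPE.
 * Returns 0 on success, -1 on failure. */
static int inject_open(inject_ctx *ctx, const char *path, const char *doctype)
{
    char peek[256];
    size_t n, decl_len = 0, dt_len = strlen(doctype);

    ctx->fp = fopen(path, "rb");
    if (ctx->fp == NULL)
        return -1;
    n = fread(peek, 1, sizeof peek - 1, ctx->fp);
    peek[n] = '\0';

    /* The XML declaration must stay first, so the DOCTYPE goes after it. */
    if (n >= 5 && memcmp(peek, "<?xml", 5) == 0) {
        const char *end = strstr(peek, "?>");
        if (end == NULL) {
            fclose(ctx->fp);
            return -1;
        }
        decl_len = (size_t)(end - peek) + 2;
    }
    if (n + dt_len > sizeof ctx->pre) {
        fclose(ctx->fp);
        return -1;
    }
    memcpy(ctx->pre, peek, decl_len);
    memcpy(ctx->pre + decl_len, doctype, dt_len);
    memcpy(ctx->pre + decl_len + dt_len, peek + decl_len, n - decl_len);
    ctx->pre_len = n + dt_len;
    ctx->pre_off = 0;
    return 0;
}

/* Same shape as xmlInputReadCallback: fills up to `len` bytes and
 * returns the count, 0 at end of input, or -1 on error. */
static int inject_read(void *context, char *buffer, int len)
{
    inject_ctx *ctx = context;
    size_t want, got = 0;

    if (len <= 0)
        return 0;
    want = (size_t)len;
    if (ctx->pre_off < ctx->pre_len) {          /* drain the prefix first */
        size_t avail = ctx->pre_len - ctx->pre_off;
        got = avail < want ? avail : want;
        memcpy(buffer, ctx->pre + ctx->pre_off, got);
        ctx->pre_off += got;
    }
    if (got < want)                              /* then stream the file */
        got += fread(buffer + got, 1, want - got, ctx->fp);
    if (got == 0 && ferror(ctx->fp))
        return -1;
    return (int)got;
}

/* Same shape as xmlInputCloseCallback. */
static int inject_close(void *context)
{
    inject_ctx *ctx = context;
    return fclose(ctx->fp) == 0 ? 0 : -1;
}
```

Hooking it up would look roughly like `xmlReaderForIO(inject_read, inject_close, &ctx, "bigfile.xml", NULL, XML_PARSE_DTDVALID)`, passing the real file name as the base URL so a relative DTD system identifier resolves next to it, and then checking `xmlTextReaderIsValid()` after reading. Only a fixed-size prefix is buffered; the rest of the file is streamed as the parser asks for it.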

From the Reader API perspective, do you think just a single function like

  /* parse/compile the DTD at the given location `uri` */
  int xmlTextReaderDtdValidate(xmlTextReaderPtr reader, const char *uri);

in combination with behavioral updates to `xmlTextReaderIsValid`, 
`xmlFreeTextReader`, `xmlCleanupParser`, `xmlTextReaderRead`, and `struct 
_xmlTextReader` is what's needed? I'm not yet libxml2-literate enough to make 
specific suggestions, but I am curious about the scope of the work you think is 
needed.

FWIW, I dug up this old C#/.NET code in which I'd been experimenting with 
similar ideas, but it's not smart enough to replace an existing DOCTYPE. I think 
it still works, but I'm not sure whether any of the APIs it uses have been deprecated.

  https://gist.github.com/1075878


Jon

---
blog: http://jonforums.github.com/
twitter: @jonforums

"Anyone who can only think of one way to spell a word obviously lacks 
imagination." - Mark Twain
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
