On Fri, Jul 27, 2007 at 11:06:36AM +0200, Stefan Behnel wrote:
> Hi,
>
> one of the lxml users noticed that libxml2 changes behaviour when you set the
> NONET option for xmlCtxtReadFile() and then call it twice on a network URL.
> The first time, it parses the external document. The second time, it refuses
> to parse it.
>
> The problem lies in the handling of the parser options, which are only set
> *after* the first call to xmlLoadExternalEntity(), in the following call to
> xmlDoRead(). I think this is ok in general as it allows users to parse from a
> URL by passing it in but to avoid additional network access when loading
> external entities transitively (DTDs etc.) - is this the intended semantics of
> the NONET option?

Hum, no. The NONET semantics are that any access outside the local filesystem should generate an error. Note that if you have a catalog remapping external resources to local ones, then those loads should proceed without failure.

> Now, the thing is, when you reuse the parser context, then the options *stay*
> in the context when you use it the second time, so they will be picked up by
> the xmlLoadExternalEntity() call when running xmlCtxtReadFile() a second time.

That's weird.

> Depending on how contexts are reused in an application, this can lead to
> unpredictable behaviour. In lxml, we can work around this by resetting the
> context options after parsing, but I would like to see the intended semantics
> of the NONET options cleared up and see reliable behaviour here.

In general you should always reset the parsing context, like the xmlCtxtRead* functions do.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
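The option-ordering problem discussed in this thread can be illustrated with a small self-contained Python model. It is a sketch under stated assumptions, not libxml2 or lxml code: `ModelCtxt`, `reset`, `resolve`, `may_load`, `CATALOG` and the `NONET` constant are hypothetical stand-ins for a parser context, a context reset, catalog resolution, the external entity loader and the `XML_PARSE_NONET` option. The model treats only an empty or `file` URL scheme as "local filesystem", and compares applying options after the first load (the behaviour reported above) with resetting the context and applying options before loading (the approach recommended in the reply).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

NONET = 0x1  # hypothetical stand-in flag, not libxml2's numeric value

# Hypothetical one-entry catalog remapping a remote resource to a local copy.
CATALOG = {"http://example.org/doc.dtd": "file:///usr/share/xml/doc.dtd"}


@dataclass
class ModelCtxt:
    options: int = 0                # survives between reads unless reset
    last_url: Optional[str] = None  # stands in for other per-parse state


def reset(ctxt: ModelCtxt) -> None:
    """Clear per-parse state, including options left over from a previous read."""
    ctxt.options = 0
    ctxt.last_url = None


def resolve(url: str) -> str:
    """Apply the catalog first, so remapped resources become local."""
    return CATALOG.get(url, url)


def may_load(ctxt: ModelCtxt, url: str) -> bool:
    """NONET rule: with the flag set, only local-filesystem access is allowed."""
    is_local = urlparse(resolve(url)).scheme in ("", "file")
    return is_local or not (ctxt.options & NONET)


def read_options_after_load(ctxt: ModelCtxt, url: str, options: int) -> bool:
    """Reported ordering: load using whatever options are stored, then store the new ones."""
    ok = may_load(ctxt, url)
    ctxt.options = options
    ctxt.last_url = url
    return ok


def read_reset_first(ctxt: ModelCtxt, url: str, options: int) -> bool:
    """Recommended ordering: reset the context, apply this call's options, then load."""
    reset(ctxt)
    ctxt.options = options
    ctxt.last_url = url
    return may_load(ctxt, url)


if __name__ == "__main__":
    url = "http://example.org/doc.xml"
    reused = ModelCtxt()
    print(read_options_after_load(reused, url, NONET))  # True: first call ignores NONET
    print(read_options_after_load(reused, url, NONET))  # False: stale options now apply
    fresh = ModelCtxt()
    print(read_reset_first(fresh, url, NONET))  # False on every call
    print(read_reset_first(fresh, "http://example.org/doc.dtd", NONET))  # True via catalog
```

With the reset-first ordering the result no longer depends on how often the context has been reused, and a catalog entry pointing at a local file still loads under NONET. In libxml2 itself the reset step corresponds to `xmlCtxtReset()` on a reused parser context; the model deliberately ignores everything else such a reset clears.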
